Hi Hunter


On Fri, Feb 10, 2017 at 8:57 AM, Hunter Somerville <hsomervi5790@gmail.com> wrote:
On Tue, Feb 7, 2017 at 3:44 PM, Charles Manning <cdhmanning@gmail.com> wrote:


On Tue, Feb 7, 2017 at 5:19 AM, Hunter Somerville <hsomervi5790@gmail.com> wrote:
Hello,

We are encountering an issue where we usually lose an entire partition of data if the flash device loses power during a write operation. When we bring the system back up and remount, all files and directories appear as long strings of question marks with incorrect filenames, and we end up having to flash erase the partition to recover. This only happens on the device with fairly large pages (4MB erase blocks, 32KB pages, 1KB OOB), and does not occur on the more typical device in the same system, which uses 4KB pages.

What kind of flash are you using? What part number?

The hardware is proprietary, and not designed by us. What I can tell you is that we interface with an FPGA - not the flash chips directly. The FPGA performs the writes.

Surely the flash parts are off the shelf.
 
 

Obviously these devices utilize different drivers and different hardware, so the cause might have nothing to do with YAFFS2. However, while we investigate other possible causes, we felt it might be worthwhile to ask whether anyone else has seen this problem with YAFFS2, or whether it could feasibly be related to the comparatively large page size.

People have used Yaffs with far larger page sizes than that.

Great. This was my biggest question mark on the Yaffs side of the problem.
 
This sounds to me like some sort of driver side issue.

What OS are you using?

Linux 3.12. Our driver uses DMA. Worth noting that we never see any problems when shutting down normally or even after an abrupt power loss when NOT actively writing. This problem only occurs when we cut power during a write operation.
 

Those question marks are probably 0xFF characters.
One way to debug this is to do binary dumps of the flash and check whether they make any sense. It is often better to work with a small partition for test purposes, just to reduce the amount of information you are working with (e.g., maybe 20 blocks).

I tried to do this today. Interestingly, after moving my testing to a smaller ~1.6GB partition lower in the device's address range, I was completely unable to make it fail. I can only cause this failure in the much larger partition. I decided instead to just use smaller files after a fresh flash erase and try to keep my work in the first 30 blocks of the large partition. This led me to discover that the question marks are indeed 0xFF characters. I did nanddumps before the power loss, after the power loss but before remounting, and after remounting. The dumps taken before and after the power loss were identical, but the nanddump taken after remounting showed that the first 30 blocks had all been erased. I'm investigating the cause of this.
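
For anyone repeating this comparison, here is a minimal sketch of how two dump files could be compared block by block to find blocks that became all 0xFF. It assumes the geometry mentioned earlier in this thread (4MB erase blocks, 32KB pages, 1KB OOB). The function names, the file names in the usage comment, and the with_oob switch are hypothetical and are not part of nanddump or Yaffs. Whether a dump contains OOB data depends on how it was taken, so that is left as a parameter.

```python
# Geometry as described in this thread (assumed, adjust for other parts).
PAGE_SIZE = 32 * 1024            # 32KB data area per page
OOB_SIZE = 1024                  # 1KB OOB per page
BLOCK_SIZE = 4 * 1024 * 1024     # 4MB erase block (data area only)
PAGES_PER_BLOCK = BLOCK_SIZE // PAGE_SIZE


def block_bytes(with_oob=True):
    """Number of bytes one erase block occupies in a dump file."""
    per_page = PAGE_SIZE + (OOB_SIZE if with_oob else 0)
    return PAGES_PER_BLOCK * per_page


def is_erased(chunk):
    """True if the chunk is non-empty and every byte is 0xFF."""
    return len(chunk) > 0 and chunk.count(0xFF) == len(chunk)


def compare_dumps(before_path, after_path, with_oob=True, block_size=None):
    """Return (block_index, status) for every block that differs.

    status is "erased" if the block in the second dump is all 0xFF,
    otherwise "changed". block_size overrides the computed size, which
    is mainly useful for testing with small files.
    """
    size = block_size or block_bytes(with_oob)
    results = []
    with open(before_path, "rb") as fb, open(after_path, "rb") as fa:
        index = 0
        while True:
            before = fb.read(size)
            after = fa.read(size)
            if not before and not after:
                break
            if before != after:
                status = "erased" if is_erased(after) else "changed"
                results.append((index, status))
            index += 1
    return results


# Hypothetical usage, comparing the dump taken after the power loss
# with the one taken after remounting:
#   for index, status in compare_dumps("after_powerloss.bin", "after_mount.bin"):
#       print(index, status)
```

A block that was already erased in both dumps is not reported, so the output lists only blocks that changed between the two snapshots.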

Thank you for your assistance,
Hunter Somerville