[Yaffs] cvs YAFFS + MTD cvs + 2.4.27-vrs1 problems

Aras Vaichas arasv@magellan-technology.com
Thu, 09 Dec 2004 17:55:03 +1100


Aras Vaichas wrote:
> Hello all,
> 
> I have managed to get the latest MTD code to compile with a 2.4.27-vrs1 
> kernel. I compiled the latest YAFFS code (from CVS) into the kernel. I 
> mounted a 16MB block of Smartmedia NAND and tried to create some files. 
> I got a load of errors, so I reset the machine because it was obviously 
> going crazy. I then noticed that my NAND suddenly had a load of "bad 
> blocks".

It's been a long journey, but I'm almost there.

I've finally nailed down a fix for the occasional bit flips that were causing
all the lost/bad pages on my NAND.

If I comment out the "USE_NANDECC = -DCONFIG_YAFFS_USE_NANDECC" line in my YAFFS
Makefile (so that it reads "#USE_NANDECC = -DCONFIG_YAFFS_USE_NANDECC"), then my
files are read back correctly. The only side effect is that I now get the
"Reading data from NAND FLASH without ECC is not recommended" warning message,
but my system appears to be working.

I am guessing that the ECC stuff is being handled twice or it is being 
mishandled somewhere when I define CONFIG_YAFFS_USE_NANDECC. Comments?

I am now able to copy, say, a 15MB file from NAND without any problems, and 
without losing massive numbers of blocks due to incorrectly labelled bad blocks.

Thank you very much to those who helped me get this far, especially Thomas and
Charles.

regards,

Aras Vaichas


------- test details ------

I designed a test to see if multiple readings of the same file from NAND would 
produce bit errors in different locations in the copied file. This was found to 
be true, and therefore something was wrong with the reading and processing of 
the data from the Flash.
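
The idea behind the test can be sketched in a few lines of Python. This is only
an illustration, not the readbacktest.sh script used for the runs in this mail;
the function names are made up for the example. It reads the same file several
times, records the byte offsets that differ from a known-good reference, and
checks whether those offsets are the same on every pass.

```python
import os
import tempfile

def diff_offsets(reference, copy):
    """Byte offsets at which two equal-length byte strings differ."""
    return [i for i, (a, b) in enumerate(zip(reference, copy)) if a != b]

def readback(path, reference, runs=2):
    """Read `path` `runs` times; return one list of differing offsets per run."""
    results = []
    for _ in range(runs):
        with open(path, "rb") as f:
            results.append(diff_offsets(reference, f.read()))
    return results

def same_every_time(results):
    """True if every run reported exactly the same differing offsets."""
    return all(r == results[0] for r in results)

# Demo on an ordinary temporary file (hypothetical stand-in for the NAND copy).
reference = os.urandom(4096)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(reference)
results = readback(tmp.name, reference, runs=3)
os.unlink(tmp.name)
print(results, same_every_time(results))
```

If the differing offsets were identical on every pass, the data stored on the
flash would be the suspect; offsets that move between passes point at the read
path instead. A real test also has to make sure each pass actually reads from
the flash rather than from a cache.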

For example, with CONFIG_YAFFS_USE_NANDECC defined, I ran the same test twice
and got two different results:

/root # ll /mnt/y1
drw-rw-rw-    1 root     root          512 Dec  9 14:57 ./
drwxrwxr-x   15 563      100          4096 Dec  6 11:51 ../
drw-rw-rw-    1 root     root          512 Dec  9 14:57 lost+found/
-rw-rw-r--    1 root     root     15728640 Dec  9 12:00 random.copy.bin

/root # ./readbacktest.sh /mnt/y1
copying random.copy.bin from FLASH to local ...
converting with hexdump ...
comparing copy to original ...
19096301  65  61
... 11 errors in total ...
44852526 146 142

/root # ./readbacktest.sh /mnt/y1
copying random.copy.bin from FLASH to local ...
converting with hexdump ...
comparing copy to original ...
13223325 142  63
... 14 errors in total ...
41558756  64  66

The first column is the byte offset of each difference between original and
copy, and the second and third columns are the octal values of the differing
bytes in the two files (I'm using cmp in Busybox). The files are compared after
conversion with hexdump, so the offsets refer to the hexdump text rather than
the raw file. This test shows that about 1 bit in 10 million is corrupt after a
read, and you can see that the locations of the bit flips are not the same
between reads. Therefore something in the hardware is causing this problem at
read time and it isn't being fixed correctly in software.
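
To make the numbers above easier to read, here is a small Python sketch that
decodes the four difference lines shown in the two runs. It assumes the octal
values are ASCII hex digits from the hexdump text (they all fall in the range
of '0'-'9' and 'a'-'f'), and the helper name decode_diff is made up for this
example.

```python
def decode_diff(line):
    """Return (offset, digit_a, digit_b, differing_bits) for one difference line."""
    offset_s, a_s, b_s = line.split()
    a = chr(int(a_s, 8))            # octal byte value -> ASCII character
    b = chr(int(b_s, 8))
    xor = int(a, 16) ^ int(b, 16)   # treat both characters as 4-bit hex digits
    return int(offset_s), a, b, bin(xor).count("1")

# The four difference lines visible in the two test runs above.
lines = [
    "19096301  65  61",
    "44852526 146 142",
    "13223325 142  63",
    "41558756  64  66",
]
for line in lines:
    offset, a, b, bits = decode_diff(line)
    print(f"{offset}: '{a}' vs '{b}', {bits} bit(s) differ")

# Rough error rate for the first run: 11 differences in a 15728640-byte file,
# assuming each difference is a single flipped bit.
bits_total = 15728640 * 8
print(f"about 1 bit in {bits_total // 11}")
```

Each of the four visible differences turns out to be a pair of hex digits that
differ in exactly one bit, which fits the picture of isolated single-bit flips.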

Without CONFIG_YAFFS_USE_NANDECC defined, I don't get any errors after reading
back a 15MB file, but I do get warnings from nand_base.c:

/root # ./readbacktest.sh /mnt/y2
copying random.copy.bin from FLASH to local ...
Reading data from NAND FLASH without ECC is not recommended
... SNIP ...
Reading data from NAND FLASH without ECC is not recommended
converting with hexdump ...
comparing copy to original ...
no differences found
/root #