From cdhmanning@gmail.com Mon Mar 05 19:44:54 2018
From: Charles Manning
Date: Tue, 6 Mar 2018 08:44:48 +1300
To: "de Brebisson, Cyrille (Calculator Division)"
Cc: "yaffs@stoneboat.aleph1.co.uk"
Subject: Re: [Yaffs] New to the mailing list. Have issues with bad flash reads
List-Id: Discussion of YAFFS NAND flash filesystem
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="f403045c144a0c9dde0566af8f4e"

--f403045c144a0c9dde0566af8f4e
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

The blocks are first marked in the RAM structures for retiring. Often this cannot be done immediately because there is still data on the blocks. Thus it is deferred and handled as part of garbage collection.

If the RAM data is lost (e.g. by a power cycle), then that is not a problem. The failure will be detected again on the next read.
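
A minimal sketch of that sequence, written as a toy model rather than as Yaffs source. The names `block_info`, `note_read_failure`, `gc_block` and `power_cycle` are hypothetical, and the sketch assumes that retirement ends with an on-flash bad block mark:

```c
#include <assert.h>
#include <stdbool.h>

#define N_BLOCKS 8   /* toy device size, chosen only for illustration */

/* Hypothetical per-block state; not the real Yaffs structures. */
struct block_info {
    int  live_chunks;     /* chunks that still hold current data */
    bool needs_retiring;  /* RAM-only flag, set after a failed read */
    bool marked_bad;      /* stands in for the on-flash bad block marker */
};

struct block_info blocks[N_BLOCKS];

/* Called when the flash driver reports a failed read on a block. */
void note_read_failure(int blk)
{
    assert(blk >= 0 && blk < N_BLOCKS);
    blocks[blk].needs_retiring = true;   /* nothing is written to flash yet */
}

/* Garbage collection of one block: move live data away, then retire if flagged. */
void gc_block(int blk)
{
    assert(blk >= 0 && blk < N_BLOCKS);
    blocks[blk].live_chunks = 0;         /* pretend the live data was copied elsewhere */
    if (blocks[blk].needs_retiring) {
        blocks[blk].marked_bad = true;   /* only now does the decision reach flash */
        blocks[blk].needs_retiring = false;
    }
}

/* Power cycle: RAM-only flags are lost, on-flash state survives. */
void power_cycle(void)
{
    for (int i = 0; i < N_BLOCKS; i++)
        blocks[i].needs_retiring = false;
}
```

In this model a power cycle between `note_read_failure` and `gc_block` simply clears the flag. The block is flagged again the next time a read on it fails, and it is retired at the following garbage collection.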

Have a look at the HowYaffsWorks document. That might help explain things a bit more.

-- Charles


On Mon, Mar 5, 2018 at 7:59 PM, de Brebisson, Cyrille (Calculator Division) <cyrille@hp.com> wrote:

Hello,

Yet another question. When and how will the block that caused a read/write error be retired?

The issue at hand here is the following:

A read error occurs. Bad data is returned to the application that was reading the data. This causes a crash. Because we are on a small, single-application RTOS, the whole system crashes.

If yaffs has “retired” the block “in its own private structures”, but not committed this to flash, then this “knowledge” will be lost and the block will not be retired.

Are we in this scenario?

Cyrille

From: de Brebisson, Cyrille (Calculator Division)
Sent: Thursday, March 01, 2018 7:09 AM
To: Charles Manning <cdhmanning@gmail.com>
Cc: yaffs@stoneboat.aleph1.co.uk
Subject: RE: [Yaffs] New to the mailing list. Have issues with bad flash reads

Hello,

I have some more questions that are popping up as I continue investigating this “flash read fails” issue.

You said that when bad reads are detected, the block will be marked for later “cleanup”…

  • Do you mean that the block will be reused, or retired (marked as bad)?
  • How is this done?
  • Is it done through internal YAFFS memory structures (which will not survive a “format through erase all blocks”)? Or is this done through a “bad block marking”?

In my latest test, I have 256 pages that return bad reads, in 4 different blocks

  • but I have not seen yaffs doing any bad block markings. Is this normal?
  • When will yaffs mark a block as bad? Under which conditions and at what time?

Thanks,

Cyrille

From: yaffs [mailto:yaffs-bounces@stoneboat.aleph1.co.uk] On behalf of de Brebisson, Cyrille (Calculator Division)
Sent: Thursday, February 22, 2018 7:37 AM
To: Charles Manning <cdhmanning@gmail.com>
Cc: yaffs@stoneboat.aleph1.co.uk
Subject: Re: [Yaffs] New to the mailing list. Have issues with bad flash reads

Hello,

Thank you for the info on yaffs’s handling of bad reads. I understand what you mean here.

About the bad flash drivers.

  • One area where people often make mistakes is in waiting until the data is properly ready before reading.

    I do not think that this is the issue, as we are using a NAND flash signal pin to know when data is ready to read.

    • Also, make sure the flash memory power supplies are good and low ripple.

    This sounds like a much more likely root cause in our case. I will investigate. Thanks for the pointer.

    Thanks for the info.

    Have a good day.

    Cyrille

    From: Charles Manning [mailto:cdhmanning@gmail.com]
    Sent: Wednesday, February 21, 2018 10:37 PM
    To: de Brebisson, Cyrille (Calculator Division) <cyrille@hp.com>
    Cc: yaffs@stoneboat.aleph1.co.uk
    Subject: Re: [Yaffs] New to the mailing list. Have issues with bad flash reads

    Hello

    You really have two independent, but related, issues going on here...

    On Thu, Feb 22, 2018 at 4:45 AM, de Brebisson, Cyrille (Calculator Division) <cyrille@hp.com> wrote:

    Hello,

    I am responsible for implementing yaffs in a small, bare-metal OS system with a 512 MB NAND flash (2 KB pages, 64 pages per block).

    Our flash ROM subsystem is sometimes failing reads. This is happening on blocks that are supposed to be good and contain file data! Furthermore, later reads of the same page might succeed (after a system reboot).

    Of course, this is wreaking havoc on our system’s reliability.

    So, a couple of questions:

    • have you ever seen that in your experience?

    Bad flash drivers are quite common. One area where people often make mistakes is in waiting until the data is properly ready before reading.

    Also, make sure the flash memory power supplies are good and low ripple. Remember that cells (and NAND flash cells in particular) are really analogue elements and too much ripple on the power rails can cause problems.
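
The point about waiting until the data is ready can be sketched as a bounded poll of the device's ready indication before any bytes are read. This is only an illustration: `ready_fn`, `wait_until_ready`, `fake_pin` and `busy_polls_left` are hypothetical names, and a real driver would sample its own ready/busy pin or status register and choose a timeout from the part's datasheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook: returns true when the NAND device signals ready. */
typedef bool (*ready_fn)(void);

/* Polls up to max_polls times; true once ready, false on timeout. */
bool wait_until_ready(ready_fn is_ready, unsigned max_polls)
{
    assert(is_ready != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        if (is_ready())
            return true;    /* safe to start reading the page data */
    }
    return false;           /* still busy: the caller must not read */
}

/* Stand-in for the hardware pin: busy for a set number of polls. */
unsigned busy_polls_left;

bool fake_pin(void)
{
    if (busy_polls_left == 0)
        return true;
    busy_polls_left--;
    return false;
}
```

The driver should treat a `false` result as a failed read rather than fetching data from a device that is still busy.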

    • When this happens, I make sure that the flash driver to yaffs interface returns YAFFS_FAIL. However, yaffs still seems to return data to the caller of the file read, incorrectly returning known bad bytes at this point. Is this normal?

    Yes, there is an issue here. Yaffs can either report an I/O error (EIO) or try to give back as much data as it can. Yaffs chooses the latter approach.

    I have some changes underway to allow Yaffs to return EIO instead.
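
To make the two options concrete, here is a small sketch of a read loop that can follow either policy. It is not Yaffs code: `read_chunks`, `chunk_reader`, `flaky_reader` and the `read_policy` values are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* The two behaviours described above; names are illustrative only. */
enum read_policy { RETURN_AVAILABLE_DATA, RETURN_EIO };

/* Hypothetical driver hook: fills buf, returns 0 on success, -1 on a failed read. */
typedef int (*chunk_reader)(int chunk, unsigned char *buf, size_t len);

/* Reads `count` chunks into `out`; returns bytes delivered, or -EIO. */
long read_chunks(chunk_reader rd, int first, int count,
                 unsigned char *out, size_t chunk_len,
                 enum read_policy policy)
{
    long total = 0;

    assert(rd != NULL && out != NULL);
    for (int i = 0; i < count; i++) {
        int rc = rd(first + i, out + total, chunk_len);

        if (rc != 0 && policy == RETURN_EIO)
            return -EIO;            /* the caller learns that the read failed */
        total += (long)chunk_len;   /* otherwise suspect bytes are delivered too */
    }
    return total;
}

/* Stand-in driver for experiments: chunk 1 always fails. */
int flaky_reader(int chunk, unsigned char *buf, size_t len)
{
    memset(buf, chunk, len);
    return chunk == 1 ? -1 : 0;
}
```

With `RETURN_AVAILABLE_DATA` the caller gets a full byte count and cannot tell from the return value that one chunk was bad, which matches the crash scenario described in this thread. With `RETURN_EIO` the same read fails visibly.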

    • After such a bad read, yaffs does not seem to do anything to the block, such as marking it as bad. Is this intended?

    It should be marked for future cleanup.

    I am working with a checkout of yaffs which is around 6 months old (taken in June 2017) in a bare-metal setup.

    I doubt there are significant changes in the last 6 months that impact this issue.

    The major thing to do is find out why the flash reads are failing so badly.

    Regards

    Charles


--f403045c144a0c9dde0566af8f4e--