Bonjour Scott, Le Fri, 20 Mar 2015 17:41:11 -0500, Scott Wood <scottw...@freescale.com> a écrit :
> On Fri, 2015-03-20 at 10:35 +0100, Albert ARIBAUD wrote: > > Hi Scott, > > > > Le Thu, 19 Mar 2015 16:39:42 -0500, Scott Wood > > <scottw...@freescale.com> a écrit : > > > > > On Wed, 2015-03-18 at 10:04 +0100, Albert ARIBAUD (3ADEV) wrote: > > > > +int nand_spl_load_image(uint32_t offs, unsigned int size, void *dst) > > > > +{ > > > > + int block_good; > > > > > > bool? > > > > > > > + struct lpc32xx_oob oob; > > > > + unsigned int page, left; > > > > + > > > > + /* if starting mid-block consider block good */ > > > > + block_good = 1; > > > > + > > > > + for (page = offs / BYTES_PER_PAGE, > > > > + left = DIV_ROUND_UP(size, BYTES_PER_PAGE); > > > > + left && (page < PAGES_PER_CHIP_MAX); page++) { > > > > + /* read inband and oob page data anyway */ > > > > + int res = read_single_page(dst, page, &oob); > > > > + /* return error if a good block's page was not readable > > > > */ > > > > + if ((res < 0) && block_good) > > > > + return -1; > > > > > > Why even try to read it if it's been marked bad? > > > > Actually, at this point, we may not know yet if the block is good or > > bad, since we might be reading a page of it for the first time. Will > > fix, see below. > > > > > > + /* if first page in a block, update 'block good' status > > > > */ > > > > > > The non-SPL driver appears to be checking the last two pages in the > > > block, not the first page. > > > > Thanks for pointing out this discrepancy. > > > > I took the first page option here after checking the NAND's datasheet, > > which says that for factory bad block marking at least, while all pages > > in a bad block /should/ bear the bad block marker, only the first one is > > /guaranteed/ to have it. The driver might have inherited the 'last page' > > option from the previous NAND chip used, or from a mistake of my own. > > Are there any plans for this controller to be used with different chips? Not that I know; but in the same vein, the current LPC32XX maintainer did not consider there were any plans for a few devices in it, and then came my patch series :) so I cannot rule out that someday a new LPC32XX board pops up in U-Boot with another NAND device. > > BTW, is there a standard way to ask a NAND chip which page(s) in a bad > > block should be checked? > > Yes and no: > http://lists.infradead.org/pipermail/linux-mtd/2011-November/038419.html ... right. How about this: the driver expects a driver-specific configuration option which tells it which page(s) in a block, and which byte in a page's OOB area, should be scanned for bad block markers, and "my" board provides a value for said option. This way, when the driver is used with a new NAND chip in another board, it can be configured for this board's specific case. > > > > + if ((page % PAGES_PER_BLOCK) == 0) { > > > > + /* assume block is good */ > > > > + block_good = 1; > > > > + /* if page could not be read, assume block is > > > > bad */ > > > > + if (res < 0) > > > > + block_good = 0; > > > > > > No. A block is to be skipped if and only if it has a bad block marker. > > > ECC errors should not be silently ignored just because they happen on > > > the first page of a block. If the writer of this data didn't skip the > > > block, then skipping it when reading will result in unusable data > > > regardless of the underlying ECC problem. > > > > You're correct of course. > > > > However, I do want the SPL chainloading to be as resilient as > > possible, and we have established that it will have to read some part > > of a bad block -- possibly resulting in a read failure -- before > > knowing that the block is bad, which means it may have to finally > > ignore the read failure. > > FWIW, the eLBC/IFC SPL functions have the same problem regarding halting > on an ECC error before checking the bad block status. Instead it should > return an error code, and let the caller halt after it's sure it's not > marked bad. Maybe we could generalize an SPL chainloading common/spl/spl.c routine and have boards /SoCs only provide the hardware-specific page/OOB read function(s) ? Honestly, that would be way beyond my scope for this patch series. > > I guess I could wait to declare the block good or bad until I could > > read at least one if its pages (actually, one of its pages' OOB data) > > properly, only bail out if the OOB says the block is supposed to be > > good. > > > > Now, if none of the block's pages could be read, this still prevents us > > from knowing whether these read failures are catastrophic. > > > > If the block was actually bad and skipped, then the resulting image > > might be intact, will pass the checksum test, and will run; if the > > block was actually good, SPL will detect it and not boot the corrupt > > image -- except if the completely unreadable good block was the first > > one, which holds the signature and checksum, in which case SPL will > > fall back to considering a raw, unsigned, unverifiable image, and will > > jump into corrupt code. > > > > This may happen; but I think the probability of having a completely > > unreadable sector is very low, and the probability of this sector being > > the first one of the image is very low too [1]. > > > > Which leads me to two possible courses of action: > > > > - consider the risk even though the probabilities are very low, thus > > consider any totally unreadable sector as good, and bail out in this > > case, or > > I was thinking instead to distinguish between a hard failure where you > got no data from the NAND (e.g. a timeout), versus an ECC failure. In > the later case you can still look in the OOB for a bad block marker. > > > - add an option to SPL that prevents common/spl/spl.c from falling back > > to running the image as raw if no signature is found, and consider > > any totally unreadable sector as bad and therefore ignore the read > > errors. Either the sector was actually good, and SPL will catch the > > image as corrupt or missing a signature, or the sector was actually > > bad, and the image will run as expected. > > ...but this seems reasonable as well (it wouldn't work for eLBC/IFC > since they need to support tiny SPLs that can only handle raw images). This leads me to a half-OT question: so those SPL, while too tiny to handle non-raw images, still do include the whole common/spl/spl.c and thus contain code which will never apply to them? Then maybe we should have /two/ options, one enabling raw images, and one enabling 'non-raw' images (and at least one enabled for any SPL), so that each SPL could choose what code it wants in. Again, I am not offering to do that in this patch series. > If you want to do this, just put a comment in explaining why you're > skipping in that situation. Ok -- I will go with the 'option' method then, and add a (lengthy, I'm afraid) comment in common/spl/spl.c explaining that skipping the raw image handling is required when chainloading from NAND and the NAND driver considers totally unreadable sectors bad, in order not to confuse a missing image header with a raw image. > -Scott Thanks for the feedback! Cordialement, Albert ARIBAUD 3ADEV _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot