>>>>> "n" == Nathan <nat...@passivekid.com> writes:
 n> http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery

This sounds silly.  Does it actually work for you?

Comparing the 7-second TLER limit to the normal 30 seconds misses the point.  What you really want to compare is (7 seconds * n levels of cargo-cult retry in the OS storage stack) against the 0.01 seconds it normally takes to read a sector.  The three orders of magnitude difference there is what makes slowly-failing drives useless, not the small difference between 7 and 30.

A smart feature would be ``mark unreadable blocks in the drive's onboard DRAM read cache and fail them instantly, without another attempt on the medium, to work around broken OS storage stacks that can't distinguish cabling errors from drive reports and keep uselessly banging away on dead sectors while errors slowly propagate up an `abstracted' stack,'' plus ``spend at most 30 seconds out of every 2000 seconds on degraded error-recovery gymnastics.  If the time budget is spent, toss up an error immediately, NO THINKING, as soon as the platter has rotated past twice while the head should have been over the data, no matter where the head actually was, what you got, or how certain you are the data is sitting there unharmed if you could just recover the head servo.''

But I doubt the EEs are smart enough to put that feature on the table.  Actually, it's probably not so much that EEs are dumb as that they assume OS designers can implement such policies in their drivers rather than needing them pushed down into the drive, which is, you know, a pretty reasonable (albeit wrong) assumption.

The most interesting thing on that wikipedia page is that FreeBSD geom is already using a 4-second timeout.  Once you've done that, I'm not sure it matters whether the drive signals the error by sending an error packet or by sending nothing for >4 seconds, so long as you HEAR the signal and REACT.

 n> Basically drives without particular TLER settings drop out of
 n> RAID randomly.

Well... I would guess they drop out whenever they hit a recoverable error. :)  Maybe modern drives are so crappy that this happens often enough to seem ``random''.  With these other cards, do the drives ``go back in'' to the RAID when they start responding to commands again?

 n> Does this happen in ZFS?

No.  Any timeouts in ZFS are, annoyingly, those of the ``desktop'' storage stack underneath it, which is unaware of redundancy and of the possibility of reading the data from elsewhere in a redundant stripe rather than waiting 7, 30, or 180 seconds for it.  ZFS will bang away on a slow drive for hours, bringing the whole system down with it, rather than read redundant data from elsewhere in the stripe, so you don't have to worry about drives dropping out randomly.  Every last bit will be squeezed from the first place ZFS tried to read it, even if this takes years.  You will, however, get all kinds of analysis and log data generated during those years (assuming the system stays up enough to write the logs, which it probably won't: http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSFailmodeProblem ).

Maybe it's getting better, but IMHO there's a fundamental philosophical disagreement about which piece of code is responsible for what sort of blocking behind all this.
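
To make the budgeted-recovery idea above concrete, here is a minimal sketch in Python.  Everything in it is hypothetical: RecoveryBudget, try_read_once and deep_recovery are invented names standing in for firmware internals, and no real drive exposes hooks like this; it only illustrates the ``at most 30 seconds of heroics per 2000 seconds'' policy.

    # Hypothetical sketch only: RecoveryBudget, try_read_once and
    # deep_recovery are invented names; real drive firmware exposes
    # nothing like this.  The point is just the policy: cap degraded
    # error recovery at budget_s seconds per window_s window, and
    # fail reads immediately once the budget is gone.

    import time


    class RecoveryBudget:
        def __init__(self, budget_s=30.0, window_s=2000.0):
            self.budget_s = budget_s        # max seconds of recovery per window
            self.window_s = window_s        # length of the accounting window
            self.spent = 0.0                # recovery time used so far
            self.window_start = time.monotonic()

        def _roll_window(self):
            now = time.monotonic()
            if now - self.window_start >= self.window_s:
                self.window_start = now     # new window, reset the meter
                self.spent = 0.0

        def may_recover(self):
            self._roll_window()
            return self.spent < self.budget_s

        def charge(self, seconds):
            self.spent += seconds


    def read_sector(lba, budget, try_read_once, deep_recovery):
        """try_read_once(lba): one pass over the medium, bytes or None.
        deep_recovery(lba): the slow heroics (re-reads, servo retries)."""
        data = try_read_once(lba)
        if data is not None:
            return data                     # clean read, ~0.01 s, no budget used
        if not budget.may_recover():
            # Budget exhausted: error immediately, NO THINKING.
            raise IOError("unrecovered read error (recovery budget spent)")
        start = time.monotonic()
        try:
            data = deep_recovery(lba)
        finally:
            budget.charge(time.monotonic() - start)
        if data is None:
            raise IOError("unrecovered read error")
        return data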
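
And a similarly hypothetical sketch of the redundancy-aware reaction argued for in the ZFS paragraph, using geom's 4-second figure as the deadline.  read_copy_a and read_copy_b stand in for ``read this block from one half of a mirror''; they are not a real ZFS or geom interface.

    # Hypothetical sketch only: read_copy_a / read_copy_b are stand-ins
    # for reading one half of a mirror; this is not how ZFS or geom are
    # actually structured.  The point: treat >4 seconds of silence as
    # the error signal and go read the redundant copy instead of waiting.

    import concurrent.futures

    DEADLINE_S = 4.0   # the FreeBSD geom-style per-I/O timeout

    # A shared pool, because shutting a pool down would itself wait for
    # the stuck read to finish.
    _pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)


    def mirrored_read(block, read_copy_a, read_copy_b):
        first = _pool.submit(read_copy_a, block)
        try:
            return first.result(timeout=DEADLINE_S)
        except (concurrent.futures.TimeoutError, IOError):
            # Four seconds of nothing (or an explicit error) is the signal:
            # HEAR it, REACT, and satisfy the read from the other copy.
            # (The stuck worker thread is simply abandoned here; a real
            # storage stack would have to cancel or fence that I/O.)
            return read_copy_b(block)

Whether that reaction belongs in the drive, the driver, or the filesystem is exactly the philosophical disagreement mentioned at the end.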