I'm in full overthink/overresearch mode on this issue, preparatory to ordering disks for my OS/zfs NAS build. So bear with me. I've been reading manuals and code, but it's hard for me to come up to speed on a new OS quickly.
The question(s) underlying this thread seem to be:

(1) Does zfs raidz/raidz2/etc have the same issue with long recovery times as RAID5? That is, dropping a drive from the array because it experiences an error and its internal recovery lasts longer than the controller (the zfs/OS/device-driver stack in this case) will wait for an error message?

(2) Can non-"raid edition" drives be set to have shorter error recovery for raid use?

On (1), I pick out the following answers:

==============================================
From Miles Nordin:

n> Does this happen in ZFS?

No. Any timeouts in ZFS are annoyingly based on the ``desktop'' storage stack underneath it, which is unaware of redundancy and of the possibility of reading data from elsewhere in a redundant stripe rather than waiting 7, 30, or 180 seconds for it. ZFS will bang away on a slow drive for hours, bringing the whole system down with it, rather than read redundant data from elsewhere in the stripe, so you don't have to worry about drives dropping out randomly. Every last bit will be squeezed from the first place ZFS tried to read it, even if this takes years.

==============================================
From Darren J Moffat:

A combination of ZFS and FMA on OpenSolaris means it will recover. How long the timeouts actually are will depend on many factors - not just the hard drive and its firmware.

==============================================
From Erik Trimble:

The issue is excessive error recovery times INTERNAL to the hard drive. So, worst case scenario is that ZFS marks the drive as "bad" during a write, causing the zpool to be degraded. It's not going to lose your data. It just may cause a "premature" marking of a drive as bad. None of this kills a RAID (ZFS, traditional SW RAID, or HW RAID). It doesn't cause data corruption. The issue is sub-optimal disk fault determination.

==============================================
From Richard Elling:

For the Solaris sd(7d) driver, the default timeout is 60 seconds with 3 or 5 retries, depending on the hardware. Whether you notice this at the application level depends on other factors: reads vs writes, etc. You can tune this, of course, and you have access to the source.

==============================================
From dubslick:

Are you sure about this? I thought these consumer level drives would try indefinitely to carry out their operations. Even Samsung's white paper on CCTL RAID error recovery says it could take a minute or longer.

==============================================
From Bob Friesenhahn:

> For a complete newbie, can someone simply answer the following: will
> using non-enterprise level drives affect ZFS like it affects
> hardware RAID?

Yes.

==============================================

So from a group of knowledgeable people I get answers ranging all the way from "no problem, it'll just work, may take a while though" to "...using non-enterprise raid drives will affect zfs just like it does hardware RAID" - that being to unnecessarily drop a disk out, and thereby expose the array to failure from a second read/write fault on another disk. Most of the votes seem to be in the "no problem" range. But beyond me trying to learn all the source code, is there any way to tell how it will really react?
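As a sanity check on why this matters to me, here is the back-of-the-envelope arithmetic I keep doing on Richard's sd(7d) numbers. The assumption that each retry waits out the full command timeout again is mine - I haven't verified it against the sd source - so treat it as a sketch of the worst case, not a statement of how the driver actually behaves:

# Rough worst-case stall for a single unreadable sector, using the sd(7d)
# defaults quoted above (60 s command timeout, 3 or 5 retries).  Assumes each
# retry waits out the full timeout again (my assumption, not verified).

def worst_case_stall(cmd_timeout_s, retries):
    """Seconds one I/O could hang before the error finally propagates up."""
    return cmd_timeout_s * (1 + retries)

for retries in (3, 5):
    print(f"sd defaults, {retries} retries: up to {worst_case_stall(60, retries)} s per bad I/O")

# A drive that gives up internally after ~7 s (one of the figures Miles
# mentioned) reports its error long before even the first 60 s command
# timeout expires, so the driver-level retries never pile up on a simple
# media error.
print("drive with ~7 s internal error recovery: error reported in about 7 s")

If that arithmetic is even roughly right, each bad sector could mean minutes of stall during a scrub or resilver with the defaults, which is exactly the window I'm worried about.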
My issue is this: I *want* the attributes of consumer-level drives other than the infinite retries. I want slow spin speed for low vibration and low power consumption, and I'm willing to accept the slower transfer/access speeds to get it. I can pay for (but resent being forced to!) raid-rated drives, but I don't like the extra power consumption needed to make them very fast in access and transfers. I'm fine with swapping in a new drive when one of the existing ones gets flaky. So I may be in the curious position of being forced to pay twice the price and expend twice the power for drives that have many features I don't want or need, and lack what I do need - except for the one issue which may (infrequently!) tear up whatever data I have built. ... maybe...

On question (2), I believe my research has led to the following: drives which support the SMART Command Transport (SCT) spec - which includes many newer disks - appear to allow setting timeouts on how long read/write operations may take to complete. However, this setting appears not to persist across a power cycle. Is there any good reason there couldn't be a driver (or service) added to the boot sequence that reads a file listing which drives need to be SCT-set, and gives them timeouts that are shorter than infinite (one of the issues above) and also short enough that errors are returned in a timely manner, so there isn't a huge window for a second fault to corrupt a zfs array? There's a rough sketch of what I mean at the end of this message.

Forgive me if I'm being too literal here. Think of me as the town idiot asking questions. 8-)
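P.S. Here is roughly the kind of boot-time hack I have in mind, sketched in Python rather than as a proper driver or SMF service. Everything in it is assumption on my part: that smartmontools is installed, that its "smartctl -l scterc,READ,WRITE" option (times in tenths of a second) works against these drives, and that the config file name, format, and device paths - all of which I invented for the sketch - are reasonable.

#!/usr/bin/env python3
# Re-apply SCT error-recovery timeouts at boot, since the setting does not
# appear to persist across a power cycle.

import subprocess
import sys

CONFIG = "/etc/sct-erc.conf"  # hypothetical file: "<device> <read_ds> <write_ds>" per line

def apply_erc(device, read_ds, write_ds):
    """Ask smartctl to cap the drive's internal error recovery time."""
    cmd = ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"warning: could not set SCT ERC on {device}: {result.stderr.strip()}",
              file=sys.stderr)

def main():
    with open(CONFIG) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            device, read_ds, write_ds = line.split()
            apply_erc(device, read_ds, write_ds)

if __name__ == "__main__":
    main()

With /etc/sct-erc.conf containing something like "/dev/rdsk/c0t1d0 70 70" (a made-up device path), that would ask each listed drive for a 7-second cap on read and write recovery at every boot. Is there a reason something along these lines couldn't be done properly, closer to the driver?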