I'm in full overthink/overresearch mode on this issue, preparatory to ordering 
disks for my OS/zfs NAS build. So bear with me. I've been reading manuals and 
code, but it's hard for me to come up to speed on a new OS quickly. 

The question(s) underlying this thread seem to be:
(1) Does ZFS raidz/raidz2/etc. have the same problem with long recovery times as 
RAID5? That is, a drive gets dropped from the array because its internal error 
recovery lasts longer than the controller (here the ZFS/OS/device-driver stack) 
is willing to wait for a response?
and 
(2) Can non-"raid edition" drives be set to use shorter error recovery for 
RAID use?

On (1), I pick out the following answers:
==============================================
From Miles Nordin:
n> Does this happen in ZFS?

No. Any timeouts in ZFS are annoyingly based on the ``desktop''
storage stack underneath it which is unaware of redundancy and of the
possibility of reading data from elsewhere in a redundant stripe
rather than waiting 7, 30, or 180 seconds for it. ZFS will bang away
on a slow drive for hours, bringing the whole system down with it,
rather than read redundant data from elsewhere in the stripe, so you
don't have to worry about drives dropping out randomly. Every last
bit will be squeezed from the first place ZFS tried to read it, even
if this takes years. 
==============================================
From Darren J Moffat:
A combination of ZFS and FMA on OpenSolaris means it will recover.
How long the timeouts actually are depends on many factors - not just the
hard drive and its firmware.
==============================================
From Erik Trimble:
The issue is excessive error recovery times INTERNAL to the hard drive.
So, worst case scenario is that ZFS marks the drive as "bad" during a
write, causing the zpool to be degraded. It's not going to lose your
data. It just may cause a "premature" marking of a drive as bad.

None of this kills a RAID (ZFS, traditional SW Raid, or HW Raid). It
doesn't cause data corruption. The issue is sub-optimal disk fault
determination.
==============================================
From Richard Elling:
For the Solaris sd(7d) driver, the default timeout is 60 seconds with
3 or 5 retries, depending on the hardware. Whether you notice this at the
application level depends on other factors: reads vs writes, etc. You can tune
this, of course, and you have access to the source.
==============================================
From dubslick:
Are you sure about this? I thought these consumer-level drives would try 
indefinitely to carry out their operation. Even Samsung's white paper on CCTL 
RAID error recovery says it could take a minute or longer.
==============================================
From Bob Friesenhahn:
> For a complete newbie, can someone simply answer the following: will
> using non-enterprise level drives affect ZFS like it affects
> hardware RAID?
Yes.
==============================================
So from a group of knowledgeable people I get answers ranging all the way from 
"no problem, it'll just work, though it may take a while" to "using 
non-enterprise raid drives will affect ZFS just like it does hardware RAID" - 
that is, a disk gets dropped unnecessarily, exposing the array to failure from 
a second read/write fault on another disk.

Most of the votes seem to be in the "no problem" range. But short of my 
learning all the source code, is there any way to tell how it will really react?
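
To put the sd(7d) defaults quoted above in perspective, here is a 
back-of-the-envelope sketch (my own arithmetic, and it assumes the worst case 
where every retry waits out the full timeout; the real behaviour depends on 
the driver and HBA):

    # Rough worst case for one unrecoverable sector with the sd(7d) defaults
    # quoted above: 60-second per-command timeout, 3 or 5 retries.
    TIMEOUT_S = 60

    for retries in (3, 5):
        attempts = 1 + retries              # original command plus its retries
        worst_case_s = attempts * TIMEOUT_S
        print(f"{retries} retries: up to {worst_case_s} s "
              f"({worst_case_s / 60:.0f} min) before the error surfaces")

So even without infinite drive-internal retries, a single bad sector could 
stall an I/O for several minutes before ZFS/FMA ever sees an error.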

My issue is this: I *want* the attributes of consumer-level drives other than 
the effectively infinite retries. I want slow spindle speed for low vibration 
and low power consumption, and I'm willing to accept the slower transfer and 
access speeds that come with it. I can pay for raid-rated drives (though I 
resent being forced to!), but I don't like the extra power they burn to deliver 
access and transfer speeds I don't need. I'm fine with swapping in a new drive 
when one of the existing ones gets flaky. So I may be in the curious position 
of paying twice the price and spending twice the power for drives that have 
many features I don't want or need, and lack what I do need - except that they 
do address the one issue which may (infrequently!) tear up whatever data I have 
built. ... maybe...

On question (2), I believe that my research has led to the following: 
drives that support the SMART Command Transport (SCT) spec - which includes 
many newer disks - appear to allow a time limit to be set on how long the 
drive spends recovering from a read/write error. However, this setting appears 
not to persist across a power cycle.
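
For what it's worth, smartmontools can be used to check whether a given drive 
honours this. A minimal probe, wrapped in Python (the device path is just a 
placeholder, and it assumes smartctl is installed and can talk to the drive):

    import subprocess

    # Query the drive's current SCT Error Recovery Control (ERC) settings.
    DEVICE = "/dev/rdsk/c0t1d0"  # hypothetical device path

    out = subprocess.run(["smartctl", "-l", "scterc", DEVICE],
                         capture_output=True, text=True)
    print(out.stdout)
    # A drive that supports SCT ERC reports its read/write limits in units
    # of 100 ms; "Disabled" or an unsupported-command error means the drive
    # retries on its own schedule.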

Is there any good reason a driver (or script) couldn't be added to the boot 
sequence that reads a file listing which drives need their SCT error-recovery 
timeouts set - shorter than infinite (one of the issues from above), and short 
enough that errors are returned in a timely manner - so that there is not a 
huge window for a second fault to corrupt a ZFS array?
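
Something along these lines is what I have in mind - a small script run at 
boot that re-applies a short SCT ERC limit to a listed set of drives, since 
the setting is lost on power cycle. Only a sketch: the config path and the 
7-second value are arbitrary, and it assumes smartmontools' smartctl with SCT 
ERC support on the drives in question:

    #!/usr/bin/env python
    """Re-apply short SCT ERC timeouts at boot (sketch)."""
    import subprocess
    import sys

    CONFIG = "/etc/sct-erc-drives.conf"  # hypothetical: one device path per line
    ERC_DECISECONDS = 70                 # 7.0 seconds, in units of 100 ms

    def set_erc(device, limit=ERC_DECISECONDS):
        # smartctl -l scterc,READ,WRITE sets the read/write recovery limits.
        cmd = ["smartctl", "-l", f"scterc,{limit},{limit}", device]
        return subprocess.run(cmd).returncode == 0

    def main():
        try:
            with open(CONFIG) as f:
                devices = [ln.strip() for ln in f
                           if ln.strip() and not ln.startswith("#")]
        except OSError as e:
            sys.exit(f"cannot read {CONFIG}: {e}")

        failed = [d for d in devices if not set_erc(d)]
        if failed:
            # Drives that reject SCT ERC (some consumer models) land here.
            print("SCT ERC not applied on:", ", ".join(failed), file=sys.stderr)

    if __name__ == "__main__":
        main()

On OpenSolaris this could presumably be wrapped in an SMF service so it runs 
before the pools are imported, but that part is beyond me at the moment.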

Forgive me if I'm being too literal here. Think of me as the town idiot asking 
questions. 8-)