Miles Nordin wrote:
>>>>>> "re" == Richard Elling <[EMAIL PROTECTED]> writes:
>>>>>>
>
>     re> I will submit that this failure mode is often best
>     re> solved by door locks, not software.
>
> First, not just door locks, but:
>
>  * redundant power supplies
>
>  * sleds and Maintain Me, Please lights
>
>  * high-strung extremely conservative sysadmins who take months to do
>    small jobs and demand high salaries
>
>  * racks, pedestals, separate rooms, mains wiring diversity
>
> in short, all the costly and cumbersome things ZFS is supposed to make
> optional.
>
:-) I don't think it is in the ZFS design scope to change diversity...

> Secondly, from skimming the article you posted, ``did not even make
> the Other category'' in this case seems to mean the study doesn't
> consider it, not that you captured some wholistic reliability data and
> found that it didn't occur.
>

You are correct that in the samples we collected, we had no records of
disks spontaneously falling out of the system. The failures we collected
for this study were those not caused by service actions.

> Thirdly, as people keep saying over and over in here, the reason they
> pull drives is to simulate the kind of fails-to-spin,
> fails-to-IDENTIFY, spews garbage onto the bus drive that many of us
> have seen cause lower-end systems to do weird things. If it didn't
> happen, we wouldn't have *SEEN* it, and wouldn't be trying to simulate
> it. You can't make me distrust my own easily-remembered experience
> from like two months ago by plotting some bar chart.
>

What happens when the device suddenly disappears is that the device
selection fails. This exercises a code path that is relatively short
and does the obvious. A failure to spin exercises a very different code
path because the host can often talk to the disk, but the disk itself
is sick.

> A month ago you were telling us these tiny boards with some $10
> chinese chip that split one SATA connector into two, built into Sun's
> latest JBOD drive sleds, are worth a 500% markup on 1TB drives because
> in the real world, cables fail, controllers fail, drives spew garbage
> onto busses, therefore simple fan-out port multipliers are not good
> enough---you need this newly-conceived ghetto-multipath. Now you're
> telling me failed controllers, cables, and drive firmware is allowed
> to lock a whole kernel because it ``doesn't even make the Other
> category.'' sorry, that does not compute.
>

I believe the record will show that there are known bugs in the Marvell
driver which have caused this problem for SATA drives. In the JBOD sled
case, this exact problem would not exist because you hot-plug to SAS
interfaces, not SATA interfaces -- different controller and driver.

> I think I'm going to want a ``simulate channel A failure'' button on
> this $700 sled. If only the sled weren't so expensive I could
> simulate it myself by sanding off the resist and scribbling over the
> traces with a pencil or something. I basically don't trust any of it
> any more, and I'll stop pulling drives when I have a
> drive-failure-simulator I trust more than that procedure. 'zpool
> offline' is not a drive-failure-simulator---I've already established
> on my own system it's very different, and there is at least one fix
> going into b94 trying to close that gap.
>
> I'm sorry, this is just ridiculous.
>

With parallel SCSI this was a lot easier -- we could just wire a switch
into the bus and cause stuck-at faults quite easily. With SAS and SATA
it is more difficult because they only share differential pairs in a
point-to-point link. There is link detection going on all of the time,
which precludes testing for stuck-at faults. Each packet has CRCs, so in
order to induce a known bad packet for testing you'll have to write some
code which makes intentionally bad packets. But this will only really
test the part of the controller chip which does CRC validation, which
is, again, probably not what you want. It actually works a lot more like
Ethernet, which also has differential signalling, link detection, and
CRCs.
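To make the bad-packet point concrete, here is a minimal user-space
sketch of what "intentionally bad packets" amounts to at the CRC level.
It is not the real SATA/SAS link layer -- the actual framing, scrambling,
and primitives are considerably more involved -- and the crc32_le()
helper and the frame layout are invented for illustration; only the
general idea carries over: compute a CRC, then corrupt the data so the
receiver's recomputed CRC disagrees.

/*
 * Illustrative only: a generic reflected CRC-32 (polynomial 0xEDB88320)
 * over a small payload; one payload bit is flipped after the CRC was
 * computed, so a receiver recomputing the CRC will reject the frame.
 * This is not the real SATA/SAS framing; names here are hypothetical.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t
crc32_le(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFU;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int b = 0; b < 8; b++)
			crc = (crc >> 1) ^ (0xEDB88320U & -(crc & 1));
	}
	return (~crc);
}

int
main(void)
{
	uint8_t payload[16];

	memset(payload, 0xA5, sizeof (payload));

	/* CRC the sender would append to the frame. */
	uint32_t sent_crc = crc32_le(payload, sizeof (payload));

	/* Corrupt one payload bit *after* the CRC was computed. */
	payload[3] ^= 0x01;

	/* CRC the receiver computes over what actually arrived. */
	uint32_t seen_crc = crc32_le(payload, sizeof (payload));

	printf("sent CRC 0x%08x, receiver computes 0x%08x -> %s\n",
	    sent_crc, seen_crc,
	    sent_crc == seen_crc ? "accepted" : "rejected (CRC error)");
	return (0);
}

And, as noted above, all a frame like this exercises is the controller's
CRC-validation logic, not the disappearing-device or sick-disk code paths
that pulling a drive is meant to simulate.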
But if you really just want to do fault injections, then you should look
at ztest,
    http://opensolaris.org/os/community/zfs/ztest/
though it is really a ZFS code-path exerciser and not a Marvell driver
path exerciser. If you want to test the Marvell code path, then you might
look at project COMSTAR, which will allow you to configure another host
to look like a disk; you can then make all sorts of simulated disk faults
by sending unexpected responses, broken packets, really slow responses,
etc.
    http://opensolaris.org/os/project/comstar/
 -- richard
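To make the COMSTAR suggestion a little more concrete, here is a rough,
hypothetical sketch of the kind of faults such an emulated target could
inject. It is not COMSTAR's actual interface; the wrapper name, the rate
constants, and the choice to wrap pread(2) in user space are all invented
for illustration, but the fault classes match the ones listed above:
unexpected responses, broken packets (silent corruption), and really
slow responses.

/*
 * Hypothetical fault-injecting read wrapper, for illustration only.
 * A COMSTAR-style emulated target could apply the same ideas on the
 * wire; here they are applied to ordinary pread(2) calls instead.
 */
#include <sys/types.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

#define	FAIL_RATE	0.01	/* return EIO instead of data */
#define	SLOW_RATE	0.01	/* respond very slowly */
#define	CORRUPT_RATE	0.01	/* silently flip bits in the data */

ssize_t
faulty_pread(int fd, void *buf, size_t nbytes, off_t offset)
{
	if (drand48() < FAIL_RATE) {
		/* Unexpected response: a hard error instead of data. */
		errno = EIO;
		return (-1);
	}
	if (drand48() < SLOW_RATE) {
		/* Really slow response. */
		sleep(30);
	}

	ssize_t n = pread(fd, buf, nbytes, offset);

	if (n > 0 && drand48() < CORRUPT_RATE) {
		/* "Broken packet": corrupt the data handed back. */
		((unsigned char *)buf)[0] ^= 0xFF;
	}
	return (n);
}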