>>>>> "re" == Richard Elling <[EMAIL PROTECTED]> writes:

    re> I will submit that this failure mode is often best
    re> solved by door locks, not software.

First, not just door locks, but:

 * redundant power supplies

 * sleds and Maintain Me, Please lights

 * high-strung, extremely conservative sysadmins who take months to do
   small jobs and demand high salaries

 * racks, pedestals, separate rooms, mains wiring diversity

in short, all the costly and cumbersome things ZFS is supposed to make
optional.

Secondly, from skimming the article you posted, ``did not even make
the Other category'' in this case seems to mean the study doesn't
consider it, not that you captured some holistic reliability data and
found that it didn't occur.

Thirdly, as people keep saying over and over on this list, the reason they
pull drives is to simulate the kind of drive that fails to spin, fails
to IDENTIFY, or spews garbage onto the bus, which many of us have seen
cause lower-end systems to do weird things.  If it didn't happen, we
wouldn't have *SEEN* it, and wouldn't be trying to simulate it.  You
can't make me distrust my own easily-remembered experience from about
two months ago by plotting some bar chart.

A month ago you were telling us these tiny boards with some $10
Chinese chip that splits one SATA connector into two, built into Sun's
latest JBOD drive sleds, are worth a 500% markup on 1TB drives because
in the real world, cables fail, controllers fail, and drives spew garbage
onto busses; therefore simple fan-out port multipliers are not good
enough, and you need this newly-conceived ghetto-multipath.  Now you're
telling me failed controllers, cables, and drive firmware are allowed
to lock a whole kernel because they ``don't even make the Other
category.''  Sorry, that does not compute.

I think I'm going to want a ``simulate channel A failure'' button on
this $700 sled.  If only the sled weren't so expensive, I could
simulate it myself by sanding off the resist and scribbling over the
traces with a pencil or something.  I basically don't trust any of it
anymore, and I'll stop pulling drives when I have a drive-failure
simulator I trust more than that procedure.  'zpool offline' is not a
drive-failure simulator; I've already established on my own system
that it behaves very differently, and there is at least one fix going
into b94 trying to close that gap.

I'm sorry, this is just ridiculous.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss