>>>>> "re" == Richard Elling <[EMAIL PROTECTED]> writes:
    re> I will submit that this failure mode is often best
    re> solved by door locks, not software.

First, not just door locks, but:

 * redundant power supplies
 * sleds and Maintain Me, Please lights
 * high-strung extremely conservative sysadmins who take months to do small jobs and demand high salaries
 * racks, pedestals, separate rooms, mains wiring diversity

in short, all the costly and cumbersome things ZFS is supposed to make optional.

Secondly, from skimming the article you posted, ``did not even make the Other category'' in this case seems to mean the study doesn't consider it, not that you captured some holistic reliability data and found that it didn't occur.

Thirdly, as people keep saying over and over in here, the reason they pull drives is to simulate the kind of fails-to-spin, fails-to-IDENTIFY, spews-garbage-onto-the-bus drive that many of us have seen cause lower-end systems to do weird things. If it didn't happen, we wouldn't have *SEEN* it, and wouldn't be trying to simulate it. You can't make me distrust my own easily-remembered experience from like two months ago by plotting some bar chart.

A month ago you were telling us these tiny boards with some $10 Chinese chip that split one SATA connector into two, built into Sun's latest JBOD drive sleds, are worth a 500% markup on 1TB drives because in the real world, cables fail, controllers fail, drives spew garbage onto busses, therefore simple fan-out port multipliers are not good enough---you need this newly-conceived ghetto-multipath. Now you're telling me failed controllers, cables, and drive firmware are allowed to lock a whole kernel because it ``doesn't even make the Other category.'' Sorry, that does not compute.

I think I'm going to want a ``simulate channel A failure'' button on this $700 sled. If only the sled weren't so expensive I could simulate it myself by sanding off the resist and scribbling over the traces with a pencil or something.

I basically don't trust any of it any more, and I'll stop pulling drives when I have a drive-failure-simulator I trust more than that procedure. 'zpool offline' is not a drive-failure-simulator---I've already established on my own system it's very different, and there is at least one fix going into b94 trying to close that gap. I'm sorry, this is just ridiculous.
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss