Miles Nordin wrote:
>>>>>> "jcm" == James C McPherson <[EMAIL PROTECTED]> writes:
>>>>>> "thp" == Todd H Poole <[EMAIL PROTECTED]> writes:
>>>>>> "mh" == Matt Harrison <[EMAIL PROTECTED]> writes:
>>>>>> "js" == John Sonnenschein <[EMAIL PROTECTED]> writes:
>>>>>> "re" == Richard Elling <[EMAIL PROTECTED]> writes:
>>>>>> "cg" == Carson Gaspar <[EMAIL PROTECTED]> writes:
>
>    jcm> Don't _ever_ try that sort of thing with IDE. As I mentioned
>    jcm> above, IDE is not designed to be able to cope with [unplugging
>    jcm> a cable]
>
> It shouldn't have to be designed for it, if there's controller
> redundancy. On Linux, one drive per IDE bus (not using any ``slave''
> drives) seems like it should be enough for any electrical issue, but
> in my experience it is not quite good enough when there are two PATA
> buses per chip. One hard drive per chip seems to be mostly okay. In
> this SATA-based case, not even that much separation was necessary for
> Linux to survive on the same hardware, but I agree with you, and
> haven't found that level of separation with PATA either.
>
> OTOH, if the IDE drivers are written such that a confusing interaction
> with one controller chip brings down the whole machine, then I expect
> the IDE drivers to do better. If they don't, why advise people to buy
> twice as much hardware ``because, you know, controllers can also fail,
> so you should have some controller redundancy''? The advice is worse
> than a waste of money; it's snake oil---a false sense of security.
No snake oil. Pulling cables only simulates pulling cables. If you are
having difficulty with cables falling out, then that problem cannot be
solved with software; it *must* be solved with hardware.

But the main problem with "simulating disk failures by pulling cables"
is that the code paths executed during that test are different from
those executed when the disk fails in other ways. It is not simply an
issue of whether the test passes or fails; it is an issue of what you
are actually testing.

Studies have shown that pulled cables are not the dominant failure mode
in disk populations. Bairavasundaram et al. [1] showed that data
checksum errors are much more common, and in some internal Sun studies
we also see unrecoverable reads as the dominant disk failure mode. ZFS
will do well for these errors, regardless of the underlying OS. AFAIK,
none of the traditional software logical volume managers nor the
popular open source file systems (other than ZFS :-) address this
problem.

[1] Bairavasundaram et al., "An Analysis of Data Corruption in the
    Storage Stack", FAST '08.
    http://www.usenix.org/event/fast08/tech/full_papers/bairavasundaram/bairavasundaram.pdf

-- 
richard
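
The difference in code paths is easy to see on a scratch pool. The
following is a minimal sketch, assuming OpenSolaris with two spare
disks; the pool name (tank) and device names (c1t0d0, c1t1d0) are
hypothetical, and the second dd step is destructive, so use throwaway
disks only. Scribbling over one side of a mirror exercises the checksum
and self-healing path that pulling a cable never touches:

  # Create a mirrored scratch pool from two spare disks (hypothetical
  # device names; this destroys whatever is on them).
  zpool create tank mirror c1t0d0 c1t1d0

  # Put some data in the pool so there is something to corrupt.
  dd if=/dev/urandom of=/tank/junk bs=1024k count=64

  # Simulate silent corruption (the failure mode from [1]) by
  # overwriting a generous chunk of one side of the mirror, starting
  # well past the front vdev labels so it is likely to hit allocated
  # blocks.
  dd if=/dev/urandom of=/dev/rdsk/c1t1d0s0 bs=1024k seek=10 count=100

  # A scrub reads and verifies every block; zpool status then reports
  # CKSUM errors on the damaged device while the data is repaired from
  # the intact side of the mirror.
  zpool scrub tank
  zpool status -v tank

A cable pull, by contrast, exercises only the device-removal and retry
paths in the driver stack; the scrub-and-repair behavior above is the
part of the problem that the traditional volume managers and file
systems do not address.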