Re: CCD on /

Nick Holland Thu, 16 Jun 2005 20:16:15 -0700

L. V. Lammert wrote:
> At 08:31 PM 6/16/2005 +0100, Niall O'Higgins wrote:
>>Controllers don't tend to like it. Sometimes with disk failure, the
>>controller will fail too!
> 
> The ASUS A7V880 runs just fine with one disk dead - infant mortality a few 
> months ago.
> 
>          Lee


One example does not make it always so.

Some people expect RAID (of either HW or SW kind) to keep them running
through a disk failure...  Some have more experience.

Designing systems that work through failures is not trivial.  The way
devices fail in the real world is very different from the way you expect
them to fail, and rarely can you get a device to fail while you are
watching everything you need to watch to fix a problem once
discovered.  If you do get a real-world failure which produces a
problem, you try to fix it, but you will probably never know how well
you fixed it, because it will never fail in exactly the same way again.
If you try to manufacture defective drives (i.e., spike 'em with a
powder-actuated nail gun while they are spinning), you will run up a
lot of expense rapidly (at least for a volunteer project), but it IS fun!

So, yes, I'm saying there are probably bugs in how HW failures are
handled in OpenBSD...and probably most other OSs.  It just isn't
something you can test effectively; you can only refine it over years
of (bitter) experience.

I've always told people RAID is part of a rapid-repair solution, not
part of a "never goes down" solution.  It *may* not go down.  Maybe, probably
won't go down.  But don't bet your career on it.  Plan for the worst
case, and things will always look better than expected.  And you look
like a genius. :)

Nick.
