I have been fighting the urge to reply to this because I know I will
get yelled at, but here goes.

The described scenario is reproducible on PERC cards when operating in
a noisy environment.  Bad signal integrity will always get you and
eventually knock some drives offline.  The described mechanism of
recreating the RAID set by entering the exact values as before and not
initializing it is the best trick to recover from multiple failures.  If
only one drive fails, let the rebuild take care of it, because you get
the added bonus of reading ALL disks and therefore remapping any sectors
that might be going bad.

One of the worst offenders I have seen with my own eyes was a Supermicro
that had a U320 backplane inside the chassis, connected from a riser
with a cable that was too short (yes, there is such a thing in the SCSI
spec).  So let's recap: the chip on the motherboard gets routed over
quite a distance to a riser card, then across the riser board (I didn't
get to look at the routing on that one, but I wonder how well it was
done considering the rest of the routing work that I saw), and then
through a plastic connector to a cable that was roughly 5cm long (at
least 5cm too short) to the backplane that the drives plugged into.
Again, eyeing the other pieces in the chain, I am not so sure about the
quality of that one either.  I was honestly horrified after seeing
that.  Just about every rule for obtaining clean signaling was broken.
U320 took a few months to define and then 3 years to get the
signaling.  U320 took a few months to define and then 3 years to get the
signaling right.  Most of the development time in the U320 stack was
spent on getting signaling to work right.  This is one of the reasons
U640 was abandoned; it simply would have been too expensive to go through
the same hoopla to get marginal speed gains on a parallel bus, and hence
SAS became the next SCSI version.  The good news here is that the
industry did seem to have learned a lesson.  No more plastic cables and
connectors that wiggle, no more cables that can't be bent at a normal
radius, fully Faraday caged cables from connector to connector etc.  SAS
has a very good signaling package and is superior to parallel SCSI.

And then they decided to plug SATA drives into SAS cards...

Which means plastic and wiggles for everyone!  History is repeating
itself and manufacturers have been scrambling to create interposers that
remove the SATA signaling unknowns.  Interposers are those little boards
that plug directly into a SATA drive and then are plugged into a
backplane of sorts.  This is done to extend the distance a SATA drive
can talk (driver strength isn't spec'd right, resulting in a loose
interpretation by vendors), filter the noise out of the connection, use
a quality shielded SAS cable instead of a crappy SATA one, provide 2
paths to the same disk (SAS disks have 2 paths by default) and a few
more minor reasons.  The idea is to remove most of the SATA badness
(read: cheapness) and stabilize the whole environment.  Over time this
has become better and better; however, there is only so much one can
graft onto a deliberately cheap device.
