This reply is purely to assist any other poor souls like myself who have to
suffer the fallout from this problem, with a possible workaround (no code
hacks required) for *SOME* configurations.

We have two mptsas desktop systems, one of which failed to boot after a
Jessie FAI install.  The other had been running for weeks without incident.

Working system: Dell Precision T5400, SAS 6/iR in RAID1 with 2 Seagate
SATA disks (ST3500641AS & ST3500418AS).
Non-working system: Dell Precision T3500, SAS 6/iR in RAID1 with 2
Fujitsu SAS drives (MBA3147RC) and one pass-through Hitachi SAS drive
(HUS154545VLS300).

It turned out that the working system had its drives connected to the
high-port connector (ports 4-7), while the non-working system had them,
more obviously, connected to the low-port connector (ports 0-3).  After
moving the cabling to the high ports on the non-working system, it now
boots.  It still takes a LONG time to probe the disks (maybe 20+ secs), but
that's apparently within limits.  The working system doesn't pause during
probe at all, so this may be a SATA-vs-SAS thing, or a drive-vendor/model
thing, too.  The only fallout (possibly unrelated) is that a shutdown now
hangs after the final shutdown message and requires a manual power-off (I
haven't done other tests; it could be a fluke).  We used to have that issue
on Precision 390/490/690 systems and had some success with the kernel
bootline option 'reboot=bios' (IIRC).  We haven't installed any other
T3500s yet, so maybe this is common to them?
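For anyone who hasn't set a bootline option like that before: on a stock
Debian/GRUB2 install it goes in /etc/default/grub, followed by running
update-grub.  A sketch (the "quiet" is just whatever was already in the
variable on your system):

```
# /etc/default/grub -- append reboot=bios to the kernel command line,
# then run 'update-grub' to regenerate /boot/grub/grub.cfg
GRUB_CMDLINE_LINUX_DEFAULT="quiet reboot=bios"
```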

This, unfortunately, isn't a solution for the integrated mptsas cards in
Dell's 1U 9th-gen servers (e.g. the PE 1950).  While the card guts appear
the same (just without the L-bracket), those systems' BIOS throws an error
(something like "Invalid Disk Configuration") if you attach the internal
drive cabling to the high-port connector.  That seems like some arbitrary
constraint Dell put in the BIOS.  (This is from memory of trying this
configuration in the past, when I had a lead cable to the drives that was
too short to reach the low-port connector.)

It's also probably not a solution for folks who have > 4 drives.  I'm also
not sure whether adding another drive would suddenly make it stop working
again.  But for anyone else out there with a similar configuration, this
may save you.

IMHO -- this problem is going to hit a lot of people -- it affects at least
50 of my machines (I actually have several hundred PowerEdge server systems
with SAS 5/6 controllers, but most of those are running RHEL5/6).  I think
it is truly absurd that the systemd devs refuse to provide a bootline
option to increase the event timeout, or even to increase the hardcoded
value for a limited time.  The decree that "30 seconds is absolute and
long enough" reminds me of such lack of insight as "no one will ever need
more than 640K of RAM".  It is simple arrogance to pull a random number
and refuse to budge on it, knowing full well that changing that number ever
so slightly would enable a lot more systems to run, regardless of the bugs
in the mptsas code.  The "problem" can then be revisited after the mptsas
code gets fixed (and hopefully that actually happens soon).  Really, how
much difference is 30 vs 60 vs 180 seconds going to make over the entire
planet anyway?  If something is truly busted, you'll find out about it.
An arbitrary 30-second "limitation" isn't going to help people with really
busted hardware/kernel modules -- it just hurts the audience of people
stuck with non-optimal drivers/hardware that were working perfectly well
(enough) prior to these changes.
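For completeness: if you're in a position to run a systemd newer than what
Jessie ships, my understanding is that v216 and later do accept an event
timeout from the kernel command line.  A sketch (assumes systemd >= 216;
verify against the systemd-udevd man page on your system):

```
# Kernel command line option to raise the udev worker event timeout
# (only honored by systemd >= 216):
udev.event-timeout=180
```

That doesn't help on stock Jessie, but it may be an escape hatch for
backport/pinning setups.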

thanks
--stephen
-- 
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdo...@ucar.edu        -  http://www.ral.ucar.edu/~sdowdy/
