On Thu, Jan 18, 2007 at 03:08:46PM +0100, Olivier Galibert wrote: > On Mon, Jan 15, 2007 at 11:14:52PM +0000, Alan wrote: > > > Both smart and the internal blade diagnostics say "everything is a-ok > > > with the drive, there hasn't been any error ever except a bunch of > > > corrected ECC ones, and no more than with a similar drive in another > > > working blade". Hence my initial post. "Hardware error" is kinda > > > imprecise, so I was wondering whether it was unexpected controller > > > answer, detected transmission error, block write error, sector not > > > found... Is there a way to have more information? > > > > Well the right place to look would indeed have been the SMART data > > providing the drive didn't get into a state it couldn't update it. > > Hardware error comes from the drive deciding something is wrong (or a > > raid card faking it I guess). That covers everything from power > > fluctuations and overheating through firmware consistency failures and > > more. > > > > If you pull the drive and test it in another box does it show the same ? > > Ok, inverted the disks, got a crash of the same blade with the new > disk, so the problem is not the drive itself. Gonna try inverting two > blades to check if it's the power supply connector/rail.
...and it is the power supply/connector. Failure is linked to the position of the blade in the box (as in the blade in the first position always fails). Now that's a cute failure. Having the support act on it is going to be fun. OG. PS: Yes, I did forget to send that email :-) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/