On Mon, 15 Jan 2007, Olivier Galibert wrote:

> On Mon, Jan 15, 2007 at 06:45:40PM +0000, Alan wrote:
>> On Mon, 15 Jan 2007 18:16:02 +0100
>> Olivier Galibert <[EMAIL PROTECTED]> wrote:
>>
>>> sd 0:0:0:0: SCSI error: return code = 0x08000002
>>> sda: Current: sense key: Hardware Error
>>>     ASC=0x42 ASCQ=0x0
>>
>> I'll give you a clue: The words "Hardware Error".
>>
>> Run a SCSI verify pass on the drive with some drive utilities and see
>> what happens. If you are lucky it'll just reallocate blocks and decide
>> the drive is ok, if not well see what the smart data thinks.
>
> Both smart and the internal blade diagnostics say "everything is a-ok
> with the drive, there hasn't been any error ever except a bunch of
> corrected ECC ones, and no more than with a similar drive in another
> working blade". Hence my initial post. "Hardware error" is kinda
> imprecise, so I was wondering whether it was unexpected controller
> answer, detected transmission error, block write error, sector not
> found... Is there a way to have more information?
>
> OG.
Correctable SCSI errors mean that the data in a sector was not read
cleanly, but the drive was able to repair it using the redundancy in
the error-correcting code (ECC). The error could be fixed permanently
if you rewrote the sector. You probably won't know where the bad sector
is without adding a printk() to the driver code.

Some BIOS SCSI utilities (Adaptec) can read an entire drive and fix bad
sectors either by rewrite or by relocation. Since drives can be accessed
as files, you could write a utility that opens the raw device with it
NOT mounted, reads a bunch of sectors, then writes them back. To do
this, you need to verify that lseek() works on your particular drive,
because you must write the data back to the same offset you read it
from. I mention this because the raw r/w path of an early Adaptec
(aha1542) driver didn't implement lseek; it just returned 'okay'. You
can imagine the mess I made of a drive with that controller!

Once you have verified that lseek works, the rest of the code is
trivial. I suggest reading then writing 64 kilobytes at a time. It will
seem to take 'forever', but the retries on these relatively short
groups of sectors (128 sectors) will be short when errors are
encountered. Make sure the drive is either not mounted or mounted r/o.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 on an i686 machine (5592.67 BogoMips).
New book: http://www.AbominableFirebug.com/
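For illustration, here is a minimal sketch of the read-then-rewrite pass described above. It is not the code from the original post: the names seek_works(), rewrite_pass() and CHUNK_BYTES are made up for this example, it assumes 512-byte sectors (so 64 KB is 128 sectors), and it assumes that an fsync() at the end is enough to push the rewritten data out to the drive. The seek check is crude: it only compares data read sequentially with data read after a seek, so it proves nothing on a drive whose first chunks hold identical bytes.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64            /* 64-bit off_t on 32-bit machines */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_BYTES (64 * 1024)         /* assumed: 128 sectors of 512 bytes */

/* Crude check that lseek() really repositions fd: read the first two chunks
 * sequentially, seek back to chunk 1, read it again and compare.
 * Returns 0 if the data matches, -1 otherwise. Inconclusive if chunks 1
 * and 2 happen to hold the same bytes (e.g. a blank drive). */
int seek_works(int fd, size_t chunk)
{
    unsigned char *a = malloc(2 * chunk);
    unsigned char *b = malloc(chunk);
    int ok = -1;

    if (a != NULL && b != NULL
        && lseek(fd, 0, SEEK_SET) == 0
        && read(fd, a, 2 * chunk) == (ssize_t)(2 * chunk)
        && lseek(fd, (off_t)chunk, SEEK_SET) == (off_t)chunk
        && read(fd, b, chunk) == (ssize_t)chunk
        && memcmp(a + chunk, b, chunk) == 0)
        ok = 0;
    free(a);
    free(b);
    return ok;
}

/* Read fd chunk by chunk and write each chunk back to the offset it came
 * from. Returns the number of bytes rewritten, or -1 on the first error. */
long long rewrite_pass(int fd, size_t chunk)
{
    assert(chunk > 0);
    unsigned char *buf = malloc(chunk);
    off_t pos = 0;
    int failed = (buf == NULL) || lseek(fd, 0, SEEK_SET) != 0;

    while (!failed) {
        ssize_t got = read(fd, buf, chunk);
        if (got == 0)
            break;                      /* end of device */
        if (got < 0 || lseek(fd, pos, SEEK_SET) != pos) {
            perror("read/lseek");
            failed = 1;
            break;
        }
        for (ssize_t done = 0; done < got; ) {
            ssize_t put = write(fd, buf + done, (size_t)(got - done));
            if (put <= 0) {
                perror("write");
                failed = 1;
                break;
            }
            done += put;
        }
        pos += got;                     /* the write left the offset here */
    }
    if (!failed && fsync(fd) != 0)      /* flush the rewritten data */
        failed = 1;
    free(buf);
    return failed ? -1 : (long long)pos;
}
```

To use it, open the raw device read-write with open() (which device node that is depends on your system), call seek_works(fd, CHUNK_BYTES) first, and only if it returns 0 go on to rewrite_pass(fd, CHUNK_BYTES). Try it on a scratch file or a drive you can afford to lose before pointing it at anything that matters.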