Lots of good info! Thanks. I have installed sg3_utils, cool stuff. I knew about AWRE and ARRE. AWRE is on, ARRE is off. I do plan to turn on ARRE for all of my disks. I can't reproduce these errors, so I guess they were write errors that were relocated. I was hoping to find a reproducible error, then turn ARRE on and "see" the error get corrected. You had the same idea using 'sginfo -G /dev/sdl' to verify the error was corrected.
I do a read test of all my disks, every night. It is required, IMO, since md kicks disks out for having 1 bad block. I want to find the bad blocks before md finds them. But since I started my nightly disk tests, I have had no bad blocks. It is as if ARRE were on, but it is not. Anyway, thanks for the good info.

Guy

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Douglas Gilbert
Sent: Tuesday, January 18, 2005 8:23 PM
To: Guy
Cc: 'Matthias Andree'; 'SCSI Mailing List'
Subject: Re: Help decoding: Info fld=0x25e6e3, Current sd08:b1: sense key Recovered Error

Guy wrote:
> Good info. Thanks!
> I could not find the answer with google. Too much noise!
>
> Is 0x25e6e3 the block number?

Yes (logical block number expressed in hex)

> If it is, is it relative to the beginning of sdl1, or sdl?

/dev/sdl

> If not, what is it?

Looking at the settings of the "read write error recovery" mode page on /dev/sdl may be instructive. ['sginfo -e /dev/sdl' from sg3_utils.] The PER bit seems to be set (otherwise a recovered error should not have been reported) but the ARRE and AWRE bits are probably clear. Those bits control the automatic reassignment of a block when a recovered error occurs as reported in your case.

Assuming the problem occurred on a read and that the ARRE bit is clear then you may want to reassign that block. To check its current state you might try:

  sg_dd if=/dev/sdl skip=0x25e6e3 of=. bs=512 count=1 blk_sgio=1

If that recovered error persists (or worse) rather than formatting the disk, reassigning that block is more surgical. sg_reassign has been added to sg3_utils recently (v1.12 beta at www.torque.net/sg) to do this. In your case:

  sg_reassign -a 0x25e6e3 /dev/sdl

If successful the replaced sector should go into the "grown" defect list ('sginfo -G /dev/sdl'). This utility may be worth trying before and after the sg_reassign.
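
[Editorial sketch, not part of the original message.] The answers above say that the "Info fld" value is a hexadecimal logical block number counted from the start of the whole disk (/dev/sdl), not from the partition. A minimal Python sketch of that arithmetic follows. The 512-byte block size is taken from the bs=512 in the sg_dd command; the function names and the partition start sector of 63 are hypothetical, chosen only for illustration, and the real start sector would have to be read from the partition table.

```python
# Sketch: interpret the kernel's "Info fld" value as a whole-disk LBA.
# All function names here are illustrative, not part of any tool.

BLOCK_SIZE = 512  # bytes per block, matching bs=512 in the sg_dd command


def lba_from_info_field(info_field: str) -> int:
    """Parse a hex string such as '0x25e6e3' into an integer block number."""
    return int(info_field, 16)


def byte_offset(lba: int, block_size: int = BLOCK_SIZE) -> int:
    """Byte offset of the block from the start of the whole disk."""
    return lba * block_size


def partition_relative_block(lba: int, partition_start: int) -> int:
    """Block number relative to a partition that begins at partition_start.

    partition_start is an assumed input; the thread does not state it.
    """
    if lba < partition_start:
        raise ValueError("block lies before the start of the partition")
    return lba - partition_start


if __name__ == "__main__":
    lba = lba_from_info_field("0x25e6e3")
    print(lba)                                # 2483939
    print(byte_offset(lba))                   # 1271776768
    print(partition_relative_block(lba, 63))  # 2483876, for an assumed start of 63
```

The decimal value is what tools that do not accept hex would need; the partition-relative figure only matters when reading through /dev/sdl1 rather than /dev/sdl.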
Another way to accomplish the same thing is to set the ARRE bit (and the AWRE while you are at it) and do another read of that block. The reported additional sense message should change to something like "Recovered data: data auto-reallocated". Reading the whole disk might be wise (to see if that lba was a lone case).

More generally this is not a good sign concerning the health of that disk. No data has been lost _yet_ but it had to work hard to recover it. Any entries in the "grown" defect list are not a good sign. Also with smartmontools you might like to try 'smartctl -a /dev/sdl' and examine the "Error counter log" and compare it with that of some of your other drives that are not reporting problems. A long self test may also be appropriate: 'smartctl -t long /dev/sdl'.

Doug Gilbert

> -----Original Message-----
> From: Matthias Andree [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 18, 2005 4:09 PM
> To: Guy
> Cc: unlisted-recipients:; no To-header on input; 'SCSI Mailing List'
> Subject: Re: Help decoding: Info fld=0x25e6e3, Current sd08:b1: sense key Recovered Error
>
> "Guy" <[EMAIL PROTECTED]> writes:
>
>>Can anyone help decode this info?
>>
>>What is 0x25e6e3?
>>What disk is sd08:b1?
>
> /dev/sdl1 (ess dee ell one) - that's sedecimal notation for a device
> with major 8 minor 0xb1 = 177;
>
> $ ls -l /dev/sd* |grep " 8, 177"
> brw-rw---- 1 root disk 8, 177 2004-10-02 10:38 /dev/sdl1
>
>>kernel: Info fld=0x25e6e3, Current sd08:b1: sense key Recovered Error
>>kernel: Additional sense indicates Recovered data with error corr. & retries applied
>
> Time to check and possibly replace the drive, or at least refresh the
> block.
>
> smartmontools (on sourceforge) and perhaps badblocks or Jörg Schilling's
> sformat (careful!) may help you with that.
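
[Editorial sketch, not part of the original messages.] The decoding of "sd08:b1" quoted above (major 8, minor 0xb1 = 177, hence /dev/sdl1) can be checked without listing /dev. The Python sketch below assumes the tag has the form "sd" + hex major + ":" + hex minor, as the quoted explanation describes, and relies on the Linux convention that SCSI disk major 8 gives each disk 16 minor numbers (the whole disk plus up to 15 partitions). The function name is hypothetical, and only major 8 (sda through sdp) is handled.

```python
# Sketch: decode a kernel device tag such as "sd08:b1" into a /dev name.
# decode_sd_tag is an illustrative name, not an existing API.
import string

SD_MAJOR = 8          # first SCSI disk major number on Linux
MINORS_PER_DISK = 16  # minor 0 of each group is the whole disk


def decode_sd_tag(tag: str) -> str:
    """Turn 'sd08:b1' (hex major:minor) into '/dev/sdl1'."""
    if not tag.startswith("sd") or ":" not in tag:
        raise ValueError("expected a tag like 'sd08:b1'")
    major_hex, minor_hex = tag[2:].split(":", 1)
    major = int(major_hex, 16)
    minor = int(minor_hex, 16)
    if major != SD_MAJOR or not 0 <= minor <= 255:
        raise ValueError("this sketch only handles major 8, minors 0-255")
    # The quotient selects the disk letter, the remainder the partition.
    disk_index, partition = divmod(minor, MINORS_PER_DISK)
    name = "/dev/sd" + string.ascii_lowercase[disk_index]
    return name + (str(partition) if partition else "")


if __name__ == "__main__":
    print(decode_sd_tag("sd08:b1"))  # /dev/sdl1
```

Here 177 divided by 16 gives 11 remainder 1: the twelfth disk letter is "l" and the partition is 1, which agrees with the ls output in the quoted message.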
> - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html