On Thu, 2006-03-30 at 13:38 +0200, Ramiro Aceves wrote:
> Hello Debian friends,
>
> In September 2005 I bought a new Seagate 160 GB hard disk, type
> ST3160021A UDMA (not SATA), and after working well for some time it is
> now giving me errors, mainly at Debian Sarge startup.
>
> Sometimes my system does not boot because it says something like
> "readonly filesystem".
>
> The errors occur frequently now, and they often happen on a "cold"
> boot, I mean, the first time I switch the machine on.
>
> I cannot tell you the exact messages because I am not the normal user
> of this computer. My mother, who uses the computer, wrote down the
> following message, so it may not be accurate:
>
> "ext3 error device hda1 in start transation: readonly filesystem."
>
> I also have some errors in /var/log/messages:
>
> Mar 26 10:49:23 debian-remix kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> Mar 26 10:49:23 debian-remix kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=43778543, high=2, low=10224111, sector=43778543
> Mar 26 10:49:23 debian-remix kernel: end_request: I/O error, dev hda, sector 43778543
>
> I have also run smartctl tests, with the following results:
>
> # smartctl -a /dev/hda
>
> From which I have captured the last 5 errors:
<smart stuff>
>
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   058   056   006    Pre-fail  Always       -       129227943
>   3 Spin_Up_Time            0x0003   097   096   000    Pre-fail  Always       -       0
>   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       1
>   5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail  Always       -       80
>   7 Seek_Error_Rate         0x000f   073   060   030    Pre-fail  Always       -       22255207
>   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       795
>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       559
> 194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always       -       33
> 195 Hardware_ECC_Recovered  0x001a   058   056   000    Old_age   Always       -       129227943
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
> 202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

<smart error stuff>
>
> What do you think I should do?
>
> 1- Does it make sense to check the disk cable? Or is it an "internal"
> disk drive error?
> 2- Should I return the disk to my seller?
>
> Normally, restarting the computer solves the problem after a fsck.
> Sometimes I have also run a "manual" fsck with no apparent data loss. I
> am concerned about a more serious hard disk failure with real data
> loss. (I have done backups, no problem ;-) )
>
> Many thanks in advance:
>
> Ramiro
>

Those errors are bad indeed! I saw the same kernel messages on one of my
machines a few years ago, and there the cause was a faulty cable. I ran
into some hard drive issues of my own last week (still dealing with them,
actually :)) and started looking into SMART.
I found this article, which explains it quite well:
http://www.linuxjournal.com/article/6983
According to the article, high numbers in the VALUE column of the
attribute table are good. You have some pretty low values
(Raw_Read_Error_Rate and Hardware_ECC_Recovered are at 058, for
example), although none of the attributes in the table you posted is at
or below its THRESH value, and WHEN_FAILED is empty for all of them.

You can also boot Knoppix or some other live distro and run badblocks on
the drive. This scans the entire drive for bad blocks. Maybe Seagate
provides a tool (on their site, for instance) to examine the drive. That
could give more specific information.

I don't know if you can return a drive by saying that it is dying. I
think they will send you home with the message: "come back when it's
dead".

For information, here is the attribute table from my Maxtor 6Y120M0
(SATA):

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   138   128   063    Pre-fail  Always       -       24509
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       455
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   252   251   187    Pre-fail  Always       -       64510
  9 Power_On_Minutes        0x0032   218   218   000    Old_age   Always       -       103h+20m
 10 Spin_Retry_Count        0x002b   213   205   157    Pre-fail  Always       -       21
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       308
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       30
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       1435
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   164   010   000    Old_age   Offline      -       190
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       2
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   213   205   000    Old_age   Always       -       21
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   193   192   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0

Good luck. Hope I helped (a little).

Philippe De Ryck
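
P.S. For anyone who wants to check a SMART attribute table quickly, here
is a rough Python sketch of the VALUE versus THRESH comparison discussed
above. It assumes each attribute row sits on a single line in the
smartctl column order (ID#, name, FLAG, VALUE, WORST, THRESH, ...), and it
treats an attribute as failing when THRESH is non-zero and VALUE is at or
below it, which is how I understand the smartctl documentation. The
function name failing_attributes is made up for illustration, and the
sample rows are copied from the Seagate table quoted in this thread.

```python
def failing_attributes(table_text):
    """Return (name, value, thresh) for rows where THRESH > 0 and VALUE <= THRESH.

    Hypothetical helper, not part of smartctl. Lines that do not look like
    attribute rows (headers, blank lines, prose) are skipped.
    """
    failing = []
    for line in table_text.splitlines():
        # Tolerate a mail-style "> " quote prefix in front of the row.
        fields = line.lstrip("> ").split()
        # An attribute row has at least 10 columns and starts with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name = fields[1]
        value = int(fields[3])   # normalized VALUE column
        thresh = int(fields[5])  # THRESH column
        if thresh > 0 and value <= thresh:
            failing.append((name, value, thresh))
    return failing


# Rows taken from the Seagate ST3160021A table quoted in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate 0x000f 058 056 006 Pre-fail Always - 129227943
  5 Reallocated_Sector_Ct 0x0033 098 098 036 Pre-fail Always - 80
  7 Seek_Error_Rate 0x000f 073 060 030 Pre-fail Always - 22255207
 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
194 Temperature_Celsius 0x0022 033 040 000 Old_age Always - 33
195 Hardware_ECC_Recovered 0x001a 058 056 000 Old_age Always - 129227943
"""

if __name__ == "__main__":
    print(failing_attributes(SAMPLE))
```

On these rows nothing is flagged, so the normalized values alone do not
say the drive has failed. The raw count of 80 reallocated sectors and the
UncorrectableError lines in the kernel log are still worth taking
seriously, and this check does not look at either of them.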