On Wed, Feb 03, 2010 at 11:22:06AM +0100, Cesare Leonardi wrote:
> M. Dietrich wrote:
> > my system had serious filesystem corruption with several -bigmem
> > kernel in the past (from 2.6.28 to 2.6.32).
>
> Does this mean that with normal 686 or 486 kernel the corruption
> doesn't happen?

yes.

> However many years ago i've experienced frequent filesystem
> corruption but i couldn't figure out why. Eventually i discovered
> was some hdparm settings...
> Was a lot hard to find, so i hope this could help you. ;-)

there are no special settings installed using hdparm:

/dev/sda:
 multcount     =  0 (off)
 IO_support    =  1 (32-bit)
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 30401/255/63, sectors = 488397168, start = 0

> > for sure i can't guarantee that this isn't related to some hardware
> > fault like broken ram or the like but i checked ram with memtest86+.
>
> If i were you, i would also install smartmontools and try something
> like: smartctl -a /dev/yourdisk I'd put particular attention in the
> "Vendor Specific SMART Attributes with Thresholds" table to find
> something strange.

it's already installed, this is the output:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   085   069   034    Pre-fail  Always       -       98867399
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   001   001   020    Old_age   Always   FAILING_NOW 248712
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       40211526
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       269350284038985
 10 Spin_Retry_Count        0x0013   100   100   034    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       448
184 End-to-End_Error        0x0032   100   253   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x003a   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x0022   100   100   045    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   071   052   000    Old_age   Always       -       29 (Lifetime Min/Max 10/48)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       19
192 Power-Off_Retract_Count 0x0022   062   062   000    Old_age   Always       -       77434
193 Load_Cycle_Count        0x001a   001   001   000    Old_age   Always       -       320283
194 Temperature_Celsius     0x0012   029   048   000    Old_age   Always       -       29 (0 10 0 0)
195 Hardware_ECC_Recovered  0x0010   070   061   000    Old_age   Offline      -       98881899
196 Reallocated_Event_Count 0x003e   096   096   000    Old_age   Always       -       3645 (28548, 0)
197 Current_Pending_Sector  0x0000   100   100   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0032   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0000   200   200   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0000   100   253   000    Old_age   Offline      -       0

i wonder how to interpret that. Start_Stop_Count has FAILING_NOW, maybe
because hdaps is stopping the device often? why is that bad? hm. but
everything else looks fine, right?

> And try to hear if the disk make suspicious noise.

it doesn't - silent as a sleeping baby.

> If you have a minimum suspect for the ram, try to temporarily remove
> some bank, if you have more than one, or replace completely if you
> can. In the past i've seen at least two cases where memtest run ok
> for about a day but the system had sporadic system freeze and BSOD
> (Windows PCs). When i've replaced the ram the problems disappeared.

removing a bank would reduce the mem size and make the bigmem kernel
unnecessary. replacing isn't possible right now. point is: i never had
strange behaviour related to mem like kernel freezes or program core
dumps, and i use the system quite a lot with big (cross-)compiles and
everything else that uses a lot of mem...

thing is that i discovered the fs corruption by accident - git
complained about a defective repo. then i forced a fsck run at boot and
that failed. maybe all bigmem users should force a fsck and see if they
already suffer from a similar corruption. if not, this bug should be
closed because it seems to be hw related. but i don't know how & where
to search, especially because this computer is a tool to do my work on.

best regards,
michael
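
on the question of how to interpret the smartctl table: as far as i
understand the smartctl man page, an attribute counts as failed when its
normalized VALUE is less than or equal to THRESH, and the TYPE column
says whether that hints at imminent failure (Pre-fail) or only at wear
beyond the design life (Old_age). a minimal python sketch of that check
follows. the names ROW, parse_attributes and at_or_below_threshold are
made up for illustration, the sample rows are copied from the output in
this mail, and the column layout is assumed to match what smartctl
printed here.

```python
import re

# A few rows copied from the smartctl output quoted in this thread.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000f   085   069   034    Pre-fail  Always       -       98867399
  4 Start_Stop_Count        0x0032   001   001   020    Old_age   Always   FAILING_NOW 248712
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
193 Load_Cycle_Count        0x001a   001   001   000    Old_age   Always       -       320283
"""

# Hypothetical pattern for one attribute row; it assumes the ten-column
# layout shown above (ID, name, flag, value, worst, thresh, type,
# updated, when_failed, raw value).
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>.*)$"
)


def parse_attributes(text):
    """Return one dict per attribute row; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            row = match.groupdict()
            for key in ("id", "value", "worst", "thresh"):
                row[key] = int(row[key])
            rows.append(row)
    return rows


def at_or_below_threshold(rows):
    """Keep the rows whose normalized VALUE is <= THRESH."""
    return [row for row in rows if row["value"] <= row["thresh"]]


if __name__ == "__main__":
    for row in at_or_below_threshold(parse_attributes(SAMPLE)):
        print(
            f'{row["id"]:>3} {row["name"]} ({row["type"]}): '
            f'value {row["value"]} <= thresh {row["thresh"]}, raw {row["raw"]}'
        )
```

with these sample rows only Start_Stop_Count is reported, and it is an
Old_age attribute. Load_Cycle_Count also sits at 001 but its threshold
is 000, so it is not counted as failed. the raw values are vendor
specific and the sketch does not try to interpret them.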