On 1/20/24 08:25, Tim Woodall wrote:
> Some time ago I wrote about a data corruption issue. I've still not
> managed to track it down ...
Please post a console session that demonstrates, or at least documents,
the data corruption.
Please cut and paste complete console sessions into your posts --
prompt, command entered, output displayed. Redact sensitive information.
It helps if your prompt contains useful information. I set PS1 in
$HOME/.profile as follows:
2024-01-20 11:31:58 dpchrist@laalaa ~
$ grep PS1 .profile | grep -v '#'
export PS1='\n\D{%Y-%m-%d %H:%M:%S} ${USER}@\h \w\n\$ '
> On the server that has no issues:
> sda: Sector size (logical/physical): 512 bytes / 512 bytes
> sdb: Sector size (logical/physical): 512 bytes / 512 bytes
Attempting to diagnose issues without all the facts is an exercise in
futility.
Please post console sessions that document the make and model of your
disks, their partition tables, your md RAID configurations, and your LVM
configurations.
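For example (an untested sketch; device and array names are taken from your
post, so adjust as needed, and run as root):

  smartctl -i /dev/sda
  smartctl -i /dev/sdb
  lsblk -o NAME,MODEL,SIZE,LOG-SEC,PHY-SEC
  fdisk -l /dev/sda /dev/sdb
  cat /proc/mdstat
  mdadm --detail /dev/md0
  pvs ; vgs ; lvs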
> These are then gpt partitioned, a small BIOS boot and EFI partition and
> then a big "Linux filesystem" partition that is part of a mdadm raid
>
> md0 : active raid1 sda3[3] sdb3[2]
>
> On the server that has performance issues and I get occasional data
> corruption (both reading and writing) under heavy (disk) load:
>
> sda: Sector size (logical/physical): 512 bytes / 512 bytes
> sdb: Sector size (logical/physical): 512 bytes / 4096 bytes
Putting a sector size 512/512 disk and a sector size 512/4096 disk into
the same mirror is unconventional. I suppose there are kernel
developers who could definitively explain the consequences, but I am not
one of them. The KISS solution is to use matching disks in RAID.
> All the
> partitions start on a 4k boundary but the big partition is not an exact
> multiple of 4k.
I align my partitions to 1 MiB boundaries and suggest that you do the same.
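parted can verify alignment; for example (untested here, and the partition
number is an example based on your sda3):

  parted /dev/sda unit s print
  parted /dev/sda align-check optimal 3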
> ... the "heavy load" filesystem that triggered the issue ...
Please post a console session that demonstrates how data corruption is
related to I/O throughput.
> There are a LOT of
> partitions and filesystems in a complicated layered LVM setup ...
Complexity is the enemy of data integrity and system reliability. I
suggest simplifying where it makes sense, but do not over-simplify.
> Booted on the problem machine but physical disk still on the OK machine:
> real 0m35.731s
> user 0m5.291s
> sys 0m4.677s
>
> Booted on the good machine but physical disk still on the problem
> machine:
> real 0m57.721s
> user 0m5.446s
> sys 0m4.783s
Please provide host names.
Please post a console session that demonstrates how data corruption
affects VM boot time.
> The SMART attributes from the problem machine:
> sda:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE
> UPDATED WHEN_FAILED RAW_VALUE
> 5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail
> Always - 0
> 12 Power_Cycle_Count 0x0032 099 099 000 Old_age
> Always - 54
> 179 Used_Rsvd_Blk_Cnt_Tot 0x0013 100 100 010 Pre-fail
> Always - 0
> 181 Program_Fail_Cnt_Total 0x0032 100 100 010 Old_age
> Always - 0
> 182 Erase_Fail_Count_Total 0x0032 100 100 010 Old_age
> Always - 0
> 183 Runtime_Bad_Block 0x0013 100 100 010 Pre-fail
> Always - 0
> 187 Uncorrectable_Error_Cnt 0x0032 100 100 000 Old_age
> Always - 0
> 190 Airflow_Temperature_Cel 0x0032 067 049 000 Old_age
> Always - 33
> 195 ECC_Error_Rate 0x001a 200 200 000 Old_age
> Always - 0
> 199 CRC_Error_Count 0x003e 100 100 000 Old_age
> Always - 0
Those look good.
> 9 Power_On_Hours 0x0032 096 096 000 Old_age
> Always - 18280
> 177 Wear_Leveling_Count 0x0013 087 087 000 Pre-fail
> Always - 129
> 241 Total_LBAs_Written 0x0032 099 099 000 Old_age
> Always - 62154466086
Please compare those to the SSD specifications.
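If Total_LBAs_Written counts 512-byte units, which is the usual
convention, then 62154466086 * 512 bytes is roughly 31.8 TB written.
Compare that, and the Wear_Leveling_Count, against the rated endurance
(TBW) in the drive's data sheet.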
> 235 POR_Recovery_Count 0x0012 099 099 000 Old_age
> Always - 39
https://www.overclock.net/threads/what-does-por-recovery-count-mean-in-samsung-magician.1491466/
I see a similar statistic on my Intel SSD 520 Series drives:
12 Power_Cycle_Count -O--CK 099 099 000 - 1996
174 Unexpect_Power_Loss_Ct -O--CK 100 100 000 - 1994
Linux does not seem to shut down the drives the way they want to be shut
down.
> sdb:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE
> UPDATED WHEN_FAILED RAW_VALUE
> 1 Raw_Read_Error_Rate 0x002f 100 100 000 Pre-fail
> Always - 0
> 5 Reallocate_NAND_Blk_Cnt 0x0032 100 100 010 Old_age
> Always - 0
> 12 Power_Cycle_Count 0x0032 100 100 000 Old_age
> Always - 50
> 171 Program_Fail_Count 0x0032 100 100 000 Old_age
> Always - 0
> 172 Erase_Fail_Count 0x0032 100 100 000 Old_age
> Always - 0
> 183 SATA_Interfac_Downshift 0x0032 100 100 000 Old_age
> Always - 0
> 184 Error_Correction_Count 0x0032 100 100 000 Old_age
> Always - 0
> 187 Reported_Uncorrect 0x0032 100 100 000 Old_age
> Always - 0
> 194 Temperature_Celsius 0x0022 074 052 000 Old_age
> Always - 26 (Min/Max 0/48)
> 196 Reallocated_Event_Count 0x0032 100 100 000 Old_age
> Always - 0
> 197 Current_Pending_ECC_Cnt 0x0032 100 100 000 Old_age
> Always - 0
> 198 Offline_Uncorrectable 0x0030 100 100 000 Old_age
> Offline - 0
> 206 Write_Error_Rate 0x000e 100 100 000 Old_age
> Always - 0
> 210 Success_RAIN_Recov_Cnt 0x0032 100 100 000 Old_age
> Always - 0
Those look good.
> 199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age
> Always - 1
I believe that indicates a SATA communication error between the drive
and the controller. I suggest using SATA cables that are rated for
SATA III 6 Gbps and have locking connectors. If you are in doubt, buy
new cables that are clearly labeled as SATA III.
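After swapping cables, you can check whether the counter keeps
climbing; for example:

  smartctl -A /dev/sdb | grep -i crc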
> 9 Power_On_Hours 0x0032 100 100 000 Old_age
> Always - 18697
> 173 Ave_Block-Erase_Count 0x0032 067 067 000 Old_age
> Always - 433
> 180 Unused_Reserve_NAND_Blk 0x0033 000 000 000 Pre-fail
> Always - 45
> 246 Total_LBAs_Written 0x0032 100 100 000 Old_age
> Always - 63148678276
> 247 Host_Program_Page_Count 0x0032 100 100 000 Old_age
> Always - 1879223820
> 248 FTL_Program_Page_Count 0x0032 100 100 000 Old_age
> Always - 1922002147
Please compare those to SSD specifications.
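By the same 512-byte assumption, 63148678276 * 512 bytes is roughly
32.3 TB written on sdb.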
> 202 Percent_Lifetime_Remain 0x0030 067 067 001 Old_age
> Offline - 33
That value is not encouraging, but it is an estimate, not a hard error
count. I would monitor it over time.
> 174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age
> Always - 12
Same comments as above.
An underlying theme is "data integrity". AIUI only btrfs and ZFS have
integrity checking built-in; AIUI md, LVM, and ext[234] do not. Linux
dm-integrity has not reached Debian stable yet. I suggest that you
implement periodic runs of BSD mtree(8) to monitor your file systems
for corruption:
https://manpages.debian.org/bullseye/mtree-netbsd/mtree.8.en.html
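A minimal sketch, assuming you baseline /home with SHA-256 digests and
keep the specification outside the file system being monitored (paths
and keywords are examples; see the man page):

  # build a specification of the current state
  mtree -c -K sha256 -p /home > /var/local/home.mtree
  # later, report any files that no longer match
  mtree -p /home -f /var/local/home.mtree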
Another underlying theme is system monitoring and failure prediction.
It is good to run SMART self-tests and review SMART reports on a regular basis. I
do this manually, have too many disks, and am doing a lousy job. I need
to learn smartd(8).
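A minimal /etc/smartd.conf sketch, based on the examples in
smartd.conf(5) (device names and schedule are illustrative):

  # -a: monitor all attributes; -o/-S: enable offline testing and
  # attribute autosave; -s: short self-test daily at 02:00, long
  # self-test Saturdays at 03:00; -m: mail root on trouble
  /dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m root
  /dev/sdb -a -o on -S on -s (S/../.././02|L/../../6/03) -m root

Then restart the smartmontools/smartd service so it rereads the
configuration.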
There have been a few posts recently by people who are running consumer
SSDs in RAID 24x7. After 2+ years, the SSDs start having problems and
produce scary SMART reports. AIUI consumer drives are rated for 40
hours/week. Running them 24x7 is like "dog years" -- multiply wall
clock time by 24 * 7 / 40 to get equivalent usage time. In this case, 2
years at 24x7 is equivalent to 8.4 years of 40 hours/week usage. If you
want to run disks 24x7 and have them last 5 years with a certain I/O
load, get disks rated for that.
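For example, applying that to the sda figures above: 18280
Power_On_Hours is only about 2.1 years of continuous operation, but at
the 40 hours/week rating it is 18280 / 40 = 457 weeks, or roughly 8.8
years of rated usage.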
David