Thanks! So are there two bad drives? Sorry, I am confused. The drive is likely no longer in warranty. The one with /home is new (I think), and so is the /mnt/backup drive (an rsync-based backup I keep as a more reliable copy whose files I can actually see). Besides these, I have a / drive that is a smaller SSD. That one also used to be in a RAID, but the other / drive died and I never got around to replacing it.
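The backup mentioned above is the sort of thing a plain rsync mirror gives you; as a minimal sketch, with the exact flags and paths assumed rather than taken from this thread:

    # mirror /home onto the backup drive, preserving permissions, ownership,
    # timestamps, hard links, ACLs, and extended attributes; --delete keeps
    # the copy in step with deletions at the source
    rsync -aHAX --delete /home/ /mnt/backup/home/

The trailing slashes matter: "/home/" copies the contents of /home into /mnt/backup/home/ instead of creating a nested home/home.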
So my question is: is it only the RAID drive /dev/sda that is bad, or is there something else that you can see in the report? Many thanks, and best wishes,
Ranjan

On Fri Aug18'23 02:58:30PM, Roger Heflin wrote:
> From: Roger Heflin <rogerhef...@gmail.com>
> Date: Fri, 18 Aug 2023 14:58:30 -0500
> To: Community support for Fedora users <users@lists.fedoraproject.org>
> Reply-To: Community support for Fedora users <users@lists.fedoraproject.org>
> Subject: Re: slowness with kernel 6.4.10 and software raid
>
> OK. You have around 4000 sectors that are bad and have been reallocated.
>
> You have around 1000 that are offline uncorrectable (reads failed).
>
> And you have a desktop drive whose bad-sector timeout is who knows
> exactly what. I would guess at least 30 seconds; it could be higher,
> but it must be lower than the SCSI timeout of the device.
>
> Given the power-on hours, the disk is out of warranty (I think). If
> the disk were in warranty you could get the disk vendor to replace it.
>
> So, whatever that timeout is, when you hit a single bad sector the disk
> is going to keep re-reading it for that long and then report that the
> sector cannot be read, and mdraid will then read it from the other
> mirror and re-write it.
>
> This disk could eventually fail to read each bad sector, and mdraid
> could re-write them all, and that may fix it. Or it could fix some of
> them on this pass, and some on the next pass, and never fix all of
> them, because sda simply sucks.
>
> The best idea would be to buy a new disk, but this time do not buy a
> desktop drive or an SMR drive. There is a webpage someplace that lists
> which disks are not SMR, and other webpages list which disks have a
> settable timeout (WD Red Plus and/or Seagate IronWolf, and likely
> others).
>
> Such disks will likely be classified as enterprise and/or NAS disks,
> but whatever you look at, check the vendor's list to see whether the
> disk is SMR or not. Note that WD Red is SMR while WD Red Plus is not,
> and SMR sometimes does not play nice with RAID.
>
> On Fri, Aug 18, 2023 at 2:05 PM Ranjan Maitra <mlmai...@gmx.com> wrote:
> >
> > On Fri Aug18'23 01:39:08PM, Roger Heflin wrote:
> > > From: Roger Heflin <rogerhef...@gmail.com>
> > > Date: Fri, 18 Aug 2023 13:39:08 -0500
> > > To: Community support for Fedora users <users@lists.fedoraproject.org>
> > > Reply-To: Community support for Fedora users <users@lists.fedoraproject.org>
> > > Subject: Re: slowness with kernel 6.4.10 and software raid
> > >
> > > The above makes it very clear what is happening. What kind of disks
> > > are these? And did you set the scterc timeout? You can see it via
> > > smartctl -l scterc /dev/sda, and then repeat on the other disk.
> > >
> > > Setting the timeout as low as you can will improve this situation
> > > some, but it appears that sda has a number of bad sectors on it.
> > >
> > > A full output of "smartctl --xall /dev/sda" would also be useful, to
> > > see how bad it is.
> > >
> > > Short answer: you probably need a new device for sda.
> >
> > Thanks!
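An aside on the timeout dance described above, as a sketch; /dev/sda and the 180-second figure here are illustrative assumptions, not values from this exchange:

    # report the drive's SCT error recovery control setting
    smartctl -l scterc /dev/sda
    # on drives that support it, cap read/write error recovery at 7 seconds
    # (the arguments are in tenths of a second)
    smartctl -l scterc,70,70 /dev/sda
    # where SCT ERC is unsupported (as on this drive), the usual workaround
    # is to raise the kernel's per-command timeout above the drive's
    # worst-case internal retry time, e.g. to 180 seconds
    echo 180 > /sys/block/sda/device/timeout

Note that scterc settings generally do not survive a power cycle, so they are usually reapplied from a boot script or udev rule.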
> >
> > I tried:
> >
> > # smartctl -l scterc /dev/sda
> > smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.10-200.fc38.x86_64] (local build)
> > Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
> >
> > SCT Error Recovery Control command not supported
> >
> > # smartctl --xall /dev/sda
> >
> > smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.10-200.fc38.x86_64] (local build)
> > Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
> >
> > === START OF INFORMATION SECTION ===
> > Model Family:     Seagate Barracuda 7200.14 (AF)
> > Device Model:     ST2000DM001-1ER164
> > Serial Number:    Z4Z5F3LE
> > LU WWN Device Id: 5 000c50 091167f04
> > Firmware Version: CC27
> > User Capacity:    2,000,398,934,016 bytes [2.00 TB]
> > Sector Sizes:     512 bytes logical, 4096 bytes physical
> > Rotation Rate:    7200 rpm
> > Form Factor:      3.5 inches
> > Device is:        In smartctl database 7.3/5528
> > ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
> > SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
> > Local Time is:    Fri Aug 18 14:01:28 2023 CDT
> > SMART support is: Available - device has SMART capability.
> > SMART support is: Enabled
> > AAM feature is:   Unavailable
> > APM level is:     128 (minimum power consumption without standby)
> > Rd look-ahead is: Enabled
> > Write cache is:   Enabled
> > DSN feature is:   Unavailable
> > ATA Security is:  Disabled, NOT FROZEN [SEC1]
> > Wt Cache Reorder: Unavailable
> >
> > === START OF READ SMART DATA SECTION ===
> > SMART overall-health self-assessment test result: PASSED
> >
> > General SMART Values:
> > Offline data collection status:  (0x00) Offline data collection activity
> >                                         was never started.
> >                                         Auto Offline Data Collection: Disabled.
> > Self-test execution status:      (   0) The previous self-test routine completed
> >                                         without error or no self-test has ever
> >                                         been run.
> > Total time to complete Offline
> > data collection:                 (  80) seconds.
> > Offline data collection
> > capabilities:                    (0x73) SMART execute Offline immediate.
> >                                         Auto Offline data collection on/off support.
> >                                         Suspend Offline collection upon new command.
> >                                         No Offline surface scan supported.
> >                                         Self-test supported.
> >                                         Conveyance Self-test supported.
> >                                         Selective Self-test supported.
> > SMART capabilities:            (0x0003) Saves SMART data before entering
> >                                         power-saving mode.
> >                                         Supports SMART auto save timer.
> > Error logging capability:        (0x01) Error logging supported.
> >                                         General Purpose Logging supported.
> > Short self-test routine
> > recommended polling time:        (   1) minutes.
> > Extended self-test routine
> > recommended polling time:        ( 212) minutes.
> > Conveyance self-test routine
> > recommended polling time:        (   2) minutes.
> > SCT capabilities:              (0x1085) SCT Status supported.
> >
> > SMART Attributes Data Structure revision number: 10
> > Vendor Specific SMART Attributes with Thresholds:
> > ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
> >   1 Raw_Read_Error_Rate     POSR--   116   092   006    -    106200704
> >   3 Spin_Up_Time            PO----   096   096   000    -    0
> >   4 Start_Stop_Count        -O--CK   100   100   020    -    97
> >   5 Reallocated_Sector_Ct   PO--CK   097   097   010    -    3960
> >   7 Seek_Error_Rate         POSR--   084   060   030    -    333268033
> >   9 Power_On_Hours          -O--CK   062   062   000    -    34085
> >  10 Spin_Retry_Count        PO--C-   100   100   097    -    0
> >  12 Power_Cycle_Count       -O--CK   100   100   020    -    96
> > 183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
> > 184 End-to-End_Error        -O--CK   100   100   099    -    0
> > 187 Reported_Uncorrect      -O--CK   001   001   000    -    384
> > 188 Command_Timeout         -O--CK   100   098   000    -    3 71 72
> > 189 High_Fly_Writes         -O-RCK   065   065   000    -    35
> > 190 Airflow_Temperature_Cel -O---K   063   055   045    -    37 (Min/Max 37/42)
> > 191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
> > 192 Power-Off_Retract_Count -O--CK   100   100   000    -    19
> > 193 Load_Cycle_Count        -O--CK   001   001   000    -    294513
> > 194 Temperature_Celsius     -O---K   037   045   000    -    37 (0 18 0 0 0)
> > 197 Current_Pending_Sector  -O--C-   094   080   000    -    1064
> > 198 Offline_Uncorrectable   ----C-   094   080   000    -    1064
> > 199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
> > 240 Head_Flying_Hours       ------   100   253   000    -    31366h+32m+19.252s
> > 241 Total_LBAs_Written      ------   100   253   000    -    22394883074
> > 242 Total_LBAs_Read         ------   100   253   000    -    258335971674
> >                             ||||||_ K auto-keep
> >                             |||||__ C event count
> >                             ||||___ R error rate
> >                             |||____ S speed/performance
> >                             ||_____ O updated online
> >                             |______ P prefailure warning
> >
> > General Purpose Log Directory Version 1
> > SMART           Log Directory Version 1 [multi-sector log support]
> > Address    Access  R/W   Size  Description
> > 0x00       GPL,SL  R/O      1  Log Directory
> > 0x01           SL  R/O      1  Summary SMART error log
> > 0x02           SL  R/O      5  Comprehensive SMART error log
> > 0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
> > 0x06           SL  R/O      1  SMART self-test log
> > 0x07       GPL     R/O      1  Extended self-test log
> > 0x09           SL  R/W      1  Selective self-test log
> > 0x10       GPL     R/O      1  NCQ Command Error log
> > 0x11       GPL     R/O      1  SATA Phy Event Counters log
> > 0x21       GPL     R/O      1  Write stream error log
> > 0x22       GPL     R/O      1  Read stream error log
> > 0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
> > 0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
> > 0xa1       GPL,SL  VS      20  Device vendor specific log
> > 0xa2       GPL     VS    4496  Device vendor specific log
> > 0xa8       GPL,SL  VS     129  Device vendor specific log
> > 0xa9       GPL,SL  VS       1  Device vendor specific log
> > 0xab       GPL     VS       1  Device vendor specific log
> > 0xb0       GPL     VS    5176  Device vendor specific log
> > 0xbe-0xbf  GPL     VS   65535  Device vendor specific log
> > 0xc0       GPL,SL  VS       1  Device vendor specific log
> > 0xc1       GPL,SL  VS      10  Device vendor specific log
> > 0xc3       GPL,SL  VS       8  Device vendor specific log
> > 0xe0       GPL,SL  R/W      1  SCT Command/Status
> > 0xe1       GPL,SL  R/W      1  SCT Data Transfer
> >
> > SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
> > Device Error Count: 384 (device log contains only the most recent 20 errors)
> >     CR     = Command Register
> >     FEATR  = Features Register
> >     COUNT  = Count (was: Sector Count) Register
> >     LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
> >     LH     = LBA High (was: Cylinder High) Register    ]   LBA
> >     LM     = LBA Mid (was: Cylinder Low) Register      ] Register
> >     LL     = LBA Low (was: Sector Number) Register     ]
> >     DV     = Device (was: Device/Head) Register
> >     DC     = Device Control Register
> >     ER     = Error register
> >     ST     = Status register
> > Powered_Up_Time is measured from power on, and printed as
> > DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
> > SS=sec, and sss=millisec. It "wraps" after 49.710 days.
> >
> > Error 384 [3] occurred at disk power-on lifetime: 34042 hours (1418 days + 10 hours)
> >   When the command that caused the error occurred, the device was active or idle.
> >
> >   After command completion occurred, registers were:
> >   ER -- ST COUNT  LBA_48  LH LM LL DV DC
> >   -- -- -- == -- == == == -- -- -- -- --
> >   40 -- 53 00 00 00 00 a3 12 b9 20 00 00  Error: UNC at LBA = 0xa312b920 = 2735913248
> >
> >   Commands leading to the command that caused the error were:
> >   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
> >   -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
> >   60 00 00 00 08 00 00 a3 12 b9 20 40 00  16d+06:35:59.162  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b9 18 40 00  16d+06:35:59.154  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b9 10 40 00  16d+06:35:59.154  READ FPDMA QUEUED
> >   61 00 00 00 08 00 00 a3 12 b9 10 40 00  16d+06:35:59.154  WRITE FPDMA QUEUED
> >   ef 00 10 00 02 00 00 00 00 00 00 a0 00  16d+06:35:59.154  SET FEATURES [Enable SATA feature]
> >
> > Error 383 [2] occurred at disk power-on lifetime: 34042 hours (1418 days + 10 hours)
> >   When the command that caused the error occurred, the device was active or idle.
> >
> >   After command completion occurred, registers were:
> >   ER -- ST COUNT  LBA_48  LH LM LL DV DC
> >   -- -- -- == -- == == == -- -- -- -- --
> >   40 -- 53 00 00 00 00 a3 12 b9 10 00 00  Error: UNC at LBA = 0xa312b910 = 2735913232
> >
> >   Commands leading to the command that caused the error were:
> >   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
> >   -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
> >   60 00 00 00 08 00 00 a3 12 b9 10 40 00  16d+06:35:53.336  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b9 08 40 00  16d+06:35:53.335  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b9 00 40 00  16d+06:35:53.335  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b8 f8 40 00  16d+06:35:53.335  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b8 f0 40 00  16d+06:35:53.331  READ FPDMA QUEUED
> >
> > Error 382 [1] occurred at disk power-on lifetime: 34042 hours (1418 days + 10 hours)
> >   When the command that caused the error occurred, the device was active or idle.
> >
> >   After command completion occurred, registers were:
> >   ER -- ST COUNT  LBA_48  LH LM LL DV DC
> >   -- -- -- == -- == == == -- -- -- -- --
> >   40 -- 53 00 00 00 00 a3 12 b8 e8 00 00  Error: UNC at LBA = 0xa312b8e8 = 2735913192
> >
> >   Commands leading to the command that caused the error were:
> >   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
> >   -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
> >   60 00 00 00 08 00 00 a3 12 b8 e8 40 00  16d+06:35:49.468  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b8 e0 40 00  16d+06:35:49.460  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b8 d8 40 00  16d+06:35:49.460  READ FPDMA QUEUED
> >   61 00 00 00 08 00 00 a3 12 b8 d8 40 00  16d+06:35:49.460  WRITE FPDMA QUEUED
> >   ef 00 10 00 02 00 00 00 00 00 00 a0 00  16d+06:35:49.459  SET FEATURES [Enable SATA feature]
> >
> > Error 381 [0] occurred at disk power-on lifetime: 34042 hours (1418 days + 10 hours)
> >   When the command that caused the error occurred, the device was active or idle.
> >
> >   After command completion occurred, registers were:
> >   ER -- ST COUNT  LBA_48  LH LM LL DV DC
> >   -- -- -- == -- == == == -- -- -- -- --
> >   40 -- 53 00 00 00 00 a3 12 b8 d8 00 00  Error: UNC at LBA = 0xa312b8d8 = 2735913176
> >
> >   Commands leading to the command that caused the error were:
> >   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
> >   -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
> >   60 00 00 00 08 00 00 a3 12 b8 d8 40 00  16d+06:35:45.676  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b8 d0 40 00  16d+06:35:45.673  READ FPDMA QUEUED
> >   ef 00 10 00 02 00 00 00 00 00 00 a0 00  16d+06:35:45.673  SET FEATURES [Enable SATA feature]
> >   27 00 00 00 00 00 00 00 00 00 00 e0 00  16d+06:35:45.673  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
> >   ec 00 00 00 00 00 00 00 00 00 00 a0 00  16d+06:35:45.672  IDENTIFY DEVICE
> >
> > Error 380 [19] occurred at disk power-on lifetime: 34042 hours (1418 days + 10 hours)
> >   When the command that caused the error occurred, the device was active or idle.
> >
> >   After command completion occurred, registers were:
> >   ER -- ST COUNT  LBA_48  LH LM LL DV DC
> >   -- -- -- == -- == == == -- -- -- -- --
> >   40 -- 53 00 00 00 00 a3 12 b8 c8 00 00  Error: UNC at LBA = 0xa312b8c8 = 2735913160
> >
> >   Commands leading to the command that caused the error were:
> >   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
> >   -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
> >   60 00 00 00 08 00 00 a3 12 b8 c8 40 00  16d+06:35:39.283  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b8 c0 40 00  16d+06:35:39.282  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b8 b8 40 00  16d+06:35:39.282  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b8 b0 40 00  16d+06:35:39.270  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b8 a8 40 00  16d+06:35:39.270  READ FPDMA QUEUED
> >
> > Error 379 [18] occurred at disk power-on lifetime: 34042 hours (1418 days + 10 hours)
> >   When the command that caused the error occurred, the device was active or idle.
> >
> >   After command completion occurred, registers were:
> >   ER -- ST COUNT  LBA_48  LH LM LL DV DC
> >   -- -- -- == -- == == == -- -- -- -- --
> >   40 -- 53 00 00 00 00 a3 12 b8 a8 00 00  Error: UNC at LBA = 0xa312b8a8 = 2735913128
> >
> >   Commands leading to the command that caused the error were:
> >   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
> >   -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
> >   60 00 00 00 08 00 00 a3 12 b8 a8 40 00  16d+06:35:35.558  READ FPDMA QUEUED
> >   61 00 00 05 78 00 00 65 ac 20 00 40 00  16d+06:35:35.557  WRITE FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b8 a0 40 00  16d+06:35:35.540  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b8 98 40 00  16d+06:35:35.532  READ FPDMA QUEUED
> >   ef 00 10 00 02 00 00 00 00 00 00 a0 00  16d+06:35:35.532  SET FEATURES [Enable SATA feature]
> >
> > Error 378 [17] occurred at disk power-on lifetime: 34042 hours (1418 days + 10 hours)
> >   When the command that caused the error occurred, the device was active or idle.
> >
> >   After command completion occurred, registers were:
> >   ER -- ST COUNT  LBA_48  LH LM LL DV DC
> >   -- -- -- == -- == == == -- -- -- -- --
> >   40 -- 53 00 00 00 00 a3 12 b8 90 00 00  Error: UNC at LBA = 0xa312b890 = 2735913104
> >
> >   Commands leading to the command that caused the error were:
> >   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
> >   -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
> >   60 00 00 00 08 00 00 a3 12 b8 90 40 00  16d+06:35:31.406  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b8 88 40 00  16d+06:35:31.406  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b8 80 40 00  16d+06:35:31.405  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b8 78 40 00  16d+06:35:31.398  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b8 70 40 00  16d+06:35:31.397  READ FPDMA QUEUED
> >
> > Error 377 [16] occurred at disk power-on lifetime: 34042 hours (1418 days + 10 hours)
> >   When the command that caused the error occurred, the device was active or idle.
> >
> >   After command completion occurred, registers were:
> >   ER -- ST COUNT  LBA_48  LH LM LL DV DC
> >   -- -- -- == -- == == == -- -- -- -- --
> >   40 -- 53 00 00 00 00 a3 12 b8 70 00 00  Error: UNC at LBA = 0xa312b870 = 2735913072
> >
> >   Commands leading to the command that caused the error were:
> >   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
> >   -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
> >   60 00 00 00 08 00 00 a3 12 b8 70 40 00  16d+06:35:27.414  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b8 68 40 00  16d+06:35:27.413  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b8 60 40 00  16d+06:35:27.402  READ FPDMA QUEUED
> >   60 00 00 00 08 00 00 a3 12 b8 58 40 00  16d+06:35:27.401  READ FPDMA QUEUED
> >   61 00 00 00 08 00 00 a3 12 b8 58 40 00  16d+06:35:27.401  WRITE FPDMA QUEUED
> >
> > SMART Extended Self-test Log Version: 1 (1 sectors)
> > Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
> > # 1  Short offline     Completed: read failure        90%  29204            771754056
> > # 2  Short offline     Completed without error        00%  19               -
> > # 3  Short offline     Completed without error        00%  0                -
> >
> > SMART Selective self-test log data structure revision number 1
> >  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
> >     1        0        0  Not_testing
> >     2        0        0  Not_testing
> >     3        0        0  Not_testing
> >     4        0        0  Not_testing
> >     5        0        0  Not_testing
> > Selective self-test flags (0x0):
> >   After scanning selected spans, do NOT read-scan remainder of disk.
> > If Selective self-test is pending on power-up, resume after 0 minute delay.
> >
> > SCT Status Version:                  3
> > SCT Version (vendor specific):       522 (0x020a)
> > Device State:                        Active (0)
> > Current Temperature:                    37 Celsius
> > Power Cycle Min/Max Temperature:     37/41 Celsius
> > Lifetime    Min/Max Temperature:     18/45 Celsius
> > Under/Over Temperature Limit Count:   0/0
> >
> > SCT Data Table command not supported
> >
> > SCT Error Recovery Control command not supported
> >
> > Device Statistics (GP/SMART Log 0x04) not supported
> >
> > Pending Defects log (GP Log 0x0c) not supported
> >
> > SATA Phy Event Counters (GP Log 0x11)
> > ID      Size     Value  Description
> > 0x000a  2          102  Device-to-host register FISes sent due to a COMRESET
> > 0x0001  2            0  Command failed due to ICRC error
> > 0x0003  2            0  R_ERR response for device-to-host data FIS
> > 0x0004  2            0  R_ERR response for host-to-device data FIS
> > 0x0006  2            0  R_ERR response for device-to-host non-data FIS
> > 0x0007  2            0  R_ERR response for host-to-device non-data FIS
> >
> > Many thanks,
> > Ranjan
> >
> > > On Fri, Aug 18, 2023 at 1:30 PM Ranjan Maitra <mlmai...@gmx.com> wrote:
> > > >
> > > > Thanks, Roger!
> > > >
> > > > On Fri Aug18'23 12:23:23PM, Roger Heflin wrote:
> > > > > From: Roger Heflin <rogerhef...@gmail.com>
> > > > > Date: Fri, 18 Aug 2023 12:23:23 -0500
> > > > > To: Community support for Fedora users <users@lists.fedoraproject.org>
> > > > > Reply-To: Community support for Fedora users <users@lists.fedoraproject.org>
> > > > > Subject: Re: slowness with kernel 6.4.10 and software raid
> > > > >
> > > > > Is it moving at all or just stopped? If just stopped, it appears that
> > > > > md126 is using external:/md127 for something, and md127 looks wrong
> > > > > (both disks are spare), but I don't know what md127 should look like
> > > > > in this external case.
> > > >
> > > > It is moving, slowly. It is a 2 TB drive, but this is weird.
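The check's progress and the kernel's resync throttle can be inspected as below; this is a sketch, and the 50000 KiB/s figure is an arbitrary example. On a drive that is retrying thousands of bad sectors, the retries themselves, not these limits, are usually what makes the pass slow.

    # watch the data-check progress
    watch -n 60 cat /proc/mdstat
    # per-device floor and ceiling for resync/check speed, in KiB/s
    cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
    # raise the floor if the check is being throttled in favor of normal I/O
    echo 50000 > /proc/sys/dev/raid/speed_limit_min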
> > > >
> > > > > I would suggest checking messages with grep md12[67] /var/log/messages
> > > > > (and older messages files if the reboot was not this week) to see what
> > > > > is going on.
> > > >
> > > > Good idea! Here is the result from
> > > >
> > > > $ grep md126 /var/log/messages
> > > >
> > > > Aug 14 15:02:30 localhost mdadm[1035]: Rebuild60 event detected on md device /dev/md126
> > > > Aug 16 14:21:20 localhost kernel: md/raid1:md126: active with 2 out of 2 mirrors
> > > > Aug 16 14:21:20 localhost kernel: md126: detected capacity change from 0 to 3711741952
> > > > Aug 16 14:21:20 localhost kernel: md126: p1
> > > > Aug 16 14:21:23 localhost systemd[1]: Condition check resulted in dev-md126p1.device - /dev/md126p1 being skipped.
> > > > Aug 16 14:21:28 localhost systemd-fsck[942]: /dev/md126p1: clean, 7345384/115998720 files, 409971205/463967488 blocks
> > > > Aug 16 14:21:31 localhost kernel: EXT4-fs (md126p1): mounted filesystem 932eb81c-2ab4-4e6e-b093-46e43dbd6c28 r/w with ordered data mode. Quota mode: none.
> > > > Aug 16 14:21:31 localhost mdadm[1033]: NewArray event detected on md device /dev/md126
> > > > Aug 16 14:21:31 localhost mdadm[1033]: RebuildStarted event detected on md device /dev/md126
> > > > Aug 16 14:21:31 localhost kernel: md: data-check of RAID array md126
> > > > Aug 16 19:33:18 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735900352
> > > > Aug 16 19:33:22 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735900864
> > > > Aug 16 19:33:28 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900496 on sda)
> > > > Aug 16 19:33:36 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900568 on sda)
> > > > Aug 16 19:33:41 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900576 on sda)
> > > > Aug 16 19:33:50 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900624 on sda)
> > > > Aug 16 19:34:00 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900640 on sda)
> > > > Aug 16 19:34:10 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900688 on sda)
> > > > Aug 16 19:34:18 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900712 on sda)
> > > > Aug 16 19:34:28 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900792 on sda)
> > > > Aug 16 19:34:32 localhost kernel: md/raid1:md126: redirecting sector 2735900352 to other mirror: sdc
> > > > Aug 16 19:34:37 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900872 on sda)
> > > > Aug 16 19:34:45 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900920 on sda)
> > > > Aug 16 19:34:54 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900992 on sda)
> > > > Aug 16 19:34:54 localhost kernel: md/raid1:md126: redirecting sector 2735900864 to other mirror: sdc
> > > > Aug 16 19:35:07 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735905704
> > > > Aug 16 19:35:11 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735905960
> > > > Aug 16 19:35:18 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735905768 on sda)
> > > > Aug 16 19:35:19 localhost kernel: md/raid1:md126: redirecting sector 2735905704 to other mirror: sdc
> > > > Aug 16 19:35:24 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735906120 on sda)
> > > > Aug 16 19:35:33 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735906192 on sda)
> > > > Aug 16 19:35:39 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735906448 on sda)
> > > > Aug 16 19:35:40 localhost kernel: md/raid1:md126: redirecting sector 2735905960 to other mirror: sdc
> > > > Aug 16 19:35:45 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735906472
> > > > Aug 16 19:35:49 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735906504 on sda)
> > > > Aug 16 19:35:52 localhost kernel: md/raid1:md126: redirecting sector 2735906472 to other mirror: sdc
> > > > Aug 16 19:36:03 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735908008
> > > > Aug 16 19:36:08 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908232 on sda)
> > > > Aug 16 19:36:16 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908344 on sda)
> > > > Aug 16 19:36:21 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908424 on sda)
> > > > Aug 16 19:36:21 localhost kernel: md/raid1:md126: redirecting sector 2735908008 to other mirror: sda
> > > > Aug 16 19:36:30 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735908008
> > > > Aug 16 19:36:37 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908296 on sda)
> > > > Aug 16 19:36:38 localhost kernel: md/raid1:md126: redirecting sector 2735908008 to other mirror: sdc
> > > > Aug 16 19:36:42 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735908776
> > > > Aug 16 19:36:42 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735909032
> > > > Aug 16 19:36:46 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908784 on sda)
> > > > Aug 16 19:36:50 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908944 on sda)
> > > > Aug 16 19:36:50 localhost kernel: md/raid1:md126: redirecting sector 2735908776 to other mirror: sdc
> > > > Aug 16 19:36:55 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735909312 on sda)
> > > > Aug 16 19:37:00 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735909360 on sda)
> > > > Aug 16 19:37:04 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735909400 on sda)
> > > > Aug 16 19:37:11 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735909520 on sda)
> > > > Aug 16 19:37:11 localhost kernel: md/raid1:md126: redirecting sector 2735909032 to other mirror: sdc
> > > > Aug 16 19:37:21 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735910056
> > > > Aug 16 19:37:21 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735910568
> > > > Aug 16 19:37:25 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735910064 on sda)
> > > > Aug 16 19:37:31 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735910080 on sda)
> > > > Aug 16 19:38:00 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735910128 on sda)
> > > > Aug 16 19:38:08 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735910240 on sda)
> > > > Aug 16 19:38:12 localhost kernel: md/raid1:md126: redirecting sector 2735910056 to other mirror: sdc
> > > > Aug 16 19:38:15 localhost kernel: md/raid1:md126: redirecting sector 2735910568 to other mirror: sdc
> > > > Aug 16 19:38:23 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735911080
> > > > Aug 16 19:38:23 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735911592
> > > > Aug 16 19:38:27 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735911520 on sda)
> > > > Aug 16 19:38:27 localhost kernel: md/raid1:md126: redirecting sector 2735911080 to other mirror: sdc
> > > > Aug 16 19:38:28 localhost kernel: md/raid1:md126: redirecting sector 2735911592 to other mirror: sdc
> > > > Aug 16 19:38:33 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735912104
> > > > Aug 16 19:38:37 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735912184 on sda)
> > > > Aug 16 19:38:45 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735912240 on sda)
> > > > Aug 16 19:38:49 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735912248 on sda)
> > > > Aug 16 19:38:59 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735912288 on sda)
> > > > Aug 16 19:39:05 localhost kernel: md/raid1:md126: redirecting sector 2735912104 to other mirror: sdc
> > > > Aug 16 19:39:10 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735912872
> > > > Aug 16 19:39:14 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735913128
> > > > Aug 16 19:39:25 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735912976 on sda)
> > > > Aug 16 19:39:33 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735913048 on sda)
> > > > Aug 16 19:39:37 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735913072 on sda)
> > > > Aug 16 19:39:41 localhost kernel: md/raid1:md126: redirecting sector 2735912872 to other mirror: sdc
> > > > Aug 16 19:39:45 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735913128 on sda)
> > > > Aug 16 19:39:55 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735913176 on sda)
> > > > Aug 16 19:40:05 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735913232 on sda)
> > > >
> > > > And here is what I get from:
> > > >
> > > > $ grep md127 /var/log/messages
> > > >
> > > > Aug 16 14:16:38 localhost systemd[1]: mdmon@md127.service: Deactivated successfully.
> > > > Aug 16 14:16:38 localhost systemd[1]: mdmon@md127.service: Unit process 884 (mdmon) remains running after unit stopped.
> > > > Aug 16 14:16:38 localhost systemd[1]: Stopped mdmon@md127.service - MD Metadata Monitor on /dev/md127.
> > > > Aug 16 14:16:38 localhost audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=mdmon@md127 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> > > > Aug 16 14:16:38 localhost systemd[1]: mdmon@md127.service: Consumed 41.719s CPU time.
> > > > Aug 16 14:21:20 localhost systemd[1]: Starting mdmon@md127.service - MD Metadata Monitor on /dev/md127...
> > > > Aug 16 14:21:20 localhost systemd[1]: Started mdmon@md127.service - MD Metadata Monitor on /dev/md127.
> > > > Aug 16 14:21:20 localhost audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=mdmon@md127 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> > > >
> > > > > Also, if you have a prior good reboot in the messages file, maybe
> > > > > include that and see what happened differently between the two.
> > > >
> > > > Yeah, I do not know where to find this. I looked into /var/log/messages,
> > > > but it looks like it starts on August 13, which was a surprise to me,
> > > > and the last non-responsive instance for me was last week (August 10, I
> > > > think, when I booted into the 6.4 kernel). I did reboot in frustration
> > > > on August 16.
> > > >
> > > > Thanks,
> > > > Ranjan
> > > >
> > > > > On Fri, Aug 18, 2023 at 7:46 AM Ranjan Maitra <mlmai...@gmx.com> wrote:
> > > > > >
> > > > > > On Thu Aug17'23 10:37:29PM, Samuel Sieb wrote:
> > > > > > > From: Samuel Sieb <sam...@sieb.net>
> > > > > > > Date: Thu, 17 Aug 2023 22:37:29 -0700
> > > > > > > To: users@lists.fedoraproject.org
> > > > > > > Reply-To: Community support for Fedora users <users@lists.fedoraproject.org>
> > > > > > > Subject: Re: slowness with kernel 6.4.10 and software raid
> > > > > > >
> > > > > > > On 8/17/23 21:38, Ranjan Maitra wrote:
> > > > > > > > $ cat /proc/mdstat
> > > > > > > > Personalities : [raid1]
> > > > > > > > md126 : active raid1 sda[1] sdc[0]
> > > > > > > >       1855870976 blocks super external:/md127/0 [2/2] [UU]
> > > > > > > >       [=>...................]  check =  8.8% (165001216/1855870976) finish=45465.2min speed=619K/sec
> > > > > > > >
> > > > > > > > md127 : inactive sda[1](S) sdc[0](S)
> > > > > > > >       10402 blocks super external:imsm
> > > > > > > >
> > > > > > > > unused devices: <none>
> > > > > > > >
> > > > > > > > I am not sure what it is doing, and I am a bit concerned that it
> > > > > > > > will go on at this rate for about 20 days. There is no knowing
> > > > > > > > what will happen after that, or whether this problem will recur
> > > > > > > > with another reboot.
> > > > > > >
> > > > > > > After a certain amount of time, mdraid will do a verification of the data
> > > > > > > where it scans the entire array. If you reboot, it will continue from where
> > > > > > > it left off. But that is *really* slow, so you should find out what's going
> > > > > > > on there.
> > > > > >
> > > > > > Yes, I know, just not sure what to do. Thanks very much!
> > > > > >
> > > > > > Any suggestion is appreciated!
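Two asides that may help here. Older boots can often be recovered from the systemd journal even after /var/log/messages has rotated, assuming persistent journaling (the Fedora default); and a running data-check can be stopped by hand, though on an external-metadata (IMSM) array mdmon manages the state, so this is a sketch rather than a guaranteed recipe:

    # list the boots the journal knows about, then pull kernel messages
    # from the previous boot
    journalctl --list-boots
    journalctl -k -b -1 | grep -E 'md12[67]'
    # ask md to abandon the running check (mdmon may restart it on IMSM arrays)
    echo idle > /sys/block/md126/md/sync_action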
> > > > > >
> > > > > > Best wishes,
> > > > > > Ranjan
_______________________________________________
users mailing list -- users@lists.fedoraproject.org
To unsubscribe send an email to users-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue