Short story:
        $ sudo debugfs -R "ncheck 23666852" /dev/md127
fails with
        /dev/md127: Block bitmap checksum does not match bitmap while reading allocation bitmaps
        ncheck: Filesystem not open
even after a (clean) fsck. 23666852 is a known good inode.
        $ ls -li /data1/tmp/zzz
        23666852 -rw-rw-r-- 1 eyal eyal 544 Aug 12  2020 /data1/tmp/zzz
Same issue with other inodes.

Full story:

Recently one of my raid-6 disks started logging errors in its SMART report:

197 Current_Pending_Sector  -O--C-   100   100   000    -    8
198 Offline_Uncorrectable   ----C-   100   100   000    -    8

Then a few hours later:

197 Current_Pending_Sector  -O--C-   100   100   000    -    16
198 Offline_Uncorrectable   ----C-   100   100   000    -    16

By now the report also included:

Pending Defects log (GP Log 0x0c)
Index                LBA    Hours
    0        23240269256    53439
    1        23240269257    53439
    2        23240269258    53439
    3        23240269259    53439
    4        23240269260    53439
    5        23240269261    53439
    6        23240269262    53439
    7        23240269263    53439
    8        23387031568    53376
    9        23387031569    53376
   10        23387031570    53376
   11        23387031571    53376
   12        23387031572    53376
   13        23387031573    53376
   14        23387031574    53376
   15        23387031575    53376
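
The LBA list above can be collapsed into runs of consecutive sectors, which makes it easier to see how many distinct areas are affected. This is only a sketch: `group_runs` is a hypothetical helper name, not part of any tool, and it simply reads one LBA per line.

```shell
# group_runs: read one LBA per line on stdin and print "first_lba count"
# for every run of consecutive LBAs. Hypothetical helper for illustration.
group_runs() {
    start=
    prev=
    count=0
    while read -r lba; do
        if [ -n "$prev" ] && [ "$lba" -eq $((prev + 1)) ]; then
            # Still inside the current run of consecutive sectors.
            count=$((count + 1))
        else
            # A new run begins; report the previous one, if any.
            if [ -n "$start" ]; then
                echo "$start $count"
            fi
            start=$lba
            count=1
        fi
        prev=$lba
    done
    if [ -n "$start" ]; then
        echo "$start $count"
    fi
}

# The 16 LBAs from the Pending Defects log, regenerated with seq.
{ seq 23240269256 23240269263; seq 23387031568 23387031575; } | group_runs
```

For this log it reports two runs of 8 sectors each. Both start LBAs are multiples of 8, which would be consistent with (but does not prove) two bad 4 KiB physical sectors.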

This disk has been in the array for over 6 years, so this is not a big surprise.

While trying to identify the files (if any) that use the above LBAs, I used 
debugfs, which gave the error above.

$ df /data1
Filesystem       1K-blocks        Used  Available Use% Mounted on
/dev/md127     58574076816 48925332280 9648728152  84% /data1

A search suggested running fsck on the filesystem, which I did. No issues were 
logged, but the debugfs problem remained.

I then thought that maybe the raid would have something to say, so I ran
        $ sudo raid6check /dev/md127 $((22695000)) 1024
followed by
        $ sudo raid6check /dev/md127 $((22838000)) 1024
which I figured covered the reported LBAs.
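
One way to arrive at start stripes like these is sketched below. It rests on assumptions that are not stated in this post: a 512 KiB chunk (1024 sectors of 512 bytes) and a member data offset of 0. The real values are reported by mdadm (--detail for the array, --examine for the member), and `lba_to_stripe` is a hypothetical helper name.

```shell
PART_START=2048       # first sector of /dev/sde1, from the fdisk output
DATA_OFFSET=0         # ASSUMPTION: md data offset in sectors (see mdadm --examine)
CHUNK_SECTORS=1024    # ASSUMPTION: 512 KiB chunk = 1024 sectors of 512 bytes

# lba_to_stripe: map a whole-disk LBA to the stripe (chunk row) that holds it
# on this member device. Hypothetical helper for illustration.
lba_to_stripe() {
    echo $(( ($1 - PART_START - DATA_OFFSET) / CHUNK_SECTORS ))
}

lba_to_stripe 23240269256   # first run of pending sectors
lba_to_stripe 23387031568   # second run of pending sectors
```

Under those assumptions the two runs land in stripes 22695573 and 22838896, both inside the 1024-stripe windows checked above. A non-zero data offset lowers the stripe numbers accordingly.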

Surprisingly, it found no errors, but the SMART pending errors disappeared. 
raid6check was run in check (no write) mode.

I then tried debugfs again and the error still occurs.

I now repeated the block and inode checks.

$ sudo fdisk -l /dev/sde
Disk /dev/sde: 10.91 TiB, 12000138625024 bytes, 23437770752 sectors
Device     Start         End     Sectors  Size Type
/dev/sde1   2048 23437768703 23437766656 10.9T Linux filesystem

The array is a 7-disk raid-6 so 5 data disks.

$ sudo sh -c '(lo=$((23240269256-2048)) ; lo="$((lo*5))" ; lo="$((lo/8))" ; echo "testb $lo 1" ; debugfs -R "testb $lo 1" /dev/md127)'
testb 14525167005 1
debugfs 1.47.2 (1-Jan-2025)
/dev/md127: Block bitmap checksum does not match bitmap while reading allocation bitmaps
testb: Filesystem not open
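
The arithmetic in that one-liner, spelled out with named values, is sketched below. It assumes 512-byte sectors and 4 KiB filesystem blocks (hence the division by 8), and it ignores the md data offset, chunk layout and parity rotation, so the result is only a rough estimate of the filesystem block. `lba_to_fs_block` is a hypothetical helper name.

```shell
PART_START=2048        # first sector of /dev/sde1, from the fdisk output
DATA_DISKS=5           # 7-disk raid-6 leaves 5 data disks per stripe
SECTORS_PER_BLOCK=8    # ASSUMPTION: 4 KiB filesystem blocks, 512-byte sectors

# lba_to_fs_block: rough estimate of the filesystem block on the array that
# corresponds to a whole-disk LBA on one member. Hypothetical helper.
lba_to_fs_block() {
    # Offset within the partition, scaled by the number of data disks,
    # then converted from sectors to filesystem blocks.
    echo $(( ($1 - PART_START) * DATA_DISKS / SECTORS_PER_BLOCK ))
}

lba_to_fs_block 23240269256   # prints 14525167005, the testb argument used above
```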

Does anyone have an idea what the problem is?

This weekend is backup day so I will probably run a full raid 'check' or maybe 
a full array raid6check after that.

TIA

--
Eyal at Home ([email protected])
