Can anybody comment on my questions below? Thanks so much in advance....
On 26 June 2024 at 08:08:39 CEST, Dietmar Rieder <dietmar.rie...@i-med.ac.at> wrote:
>...sending also to the list and Xiubo (they were accidentally removed from
>the recipients)...
>
>On 6/25/24 21:28, Dietmar Rieder wrote:
>> Hi Patrick, Xiubo and List,
>>
>> we finally managed to get the filesystem repaired and running again! YEAH,
>> I'm so happy!!
>>
>> Big thanks for your support, Patrick and Xiubo! (I'd love to invite you for
>> a beer!)
>>
>>
>> Please see some comments and (important?) questions below:
>>
>> On 6/25/24 03:14, Patrick Donnelly wrote:
>>> On Mon, Jun 24, 2024 at 5:22 PM Dietmar Rieder
>>> <dietmar.rie...@i-med.ac.at> wrote:
>>>>
>>>> (resending this; the original message seems not to have made it through
>>>> amid all the SPAM recently sent to the list, my apologies if it shows up
>>>> twice at some point)
>>>>
>>>> Hi List,
>>>>
>>>> we are still struggling to get our cephfs back online again. This is an
>>>> update to inform you about what we did so far, and we kindly ask for any
>>>> input on how to proceed:
>>>>
>>>> After resetting the journals, Xiubo suggested (in a PM) to continue with
>>>> the disaster recovery procedure:
>>>>
>>>> cephfs-data-scan init skipped creating the inodes 0x0x1 and 0x0x100:
>>>>
>>>> [root@ceph01-b ~]# cephfs-data-scan init
>>>> Inode 0x0x1 already exists, skipping create.  Use --force-init to
>>>> overwrite the existing object.
>>>> Inode 0x0x100 already exists, skipping create.  Use --force-init to
>>>> overwrite the existing object.
>>>>
>>>> We did not use --force-init and proceeded with scan_extents using a
>>>> single worker, which was indeed very slow.
>>>>
>>>> After ~24h we interrupted scan_extents and restarted it with 32 workers,
>>>> which went through in about 2h15min without any issue.
>>>>
>>>> Then I started scan_inodes with 32 workers; this also finished after
>>>> ~50min, with no output on stderr or stdout.
>>>>
>>>> I went on with scan_links, which after ~45 minutes threw the following
>>>> error:
>>>>
>>>> # cephfs-data-scan scan_links
>>>> Error ((2) No such file or directory)
>>>
>>> Not sure what this indicates necessarily. You can try to get more
>>> debug information using:
>>>
>>> [client]
>>>   debug mds = 20
>>>   debug ms = 1
>>>   debug client = 20
>>>
>>> in the local ceph.conf for the node running cephfs-data-scan.
>>
>> I did that and restarted "cephfs-data-scan scan_links".
>>
>> It didn't produce any additional debug output, however this time it just
>> went through without error (~50 min).
>>
>> We then reran "cephfs-data-scan cleanup" and it also finished without
>> error after about 10h.
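
For anyone following along: running scan_extents/scan_inodes with multiple workers is done via the worker_n/worker_m options of cephfs-data-scan, one process per worker started in parallel shells. Roughly like this (a sketch with 32 workers as in our run; if I remember the flags correctly, --worker_n is the zero-based worker index and --worker_m the total number of workers, and any data pool arguments used in the single-worker run have to be repeated for each worker):

  # all workers must have finished scan_extents before scan_inodes is started
  cephfs-data-scan scan_extents --worker_n 0 --worker_m 32
  cephfs-data-scan scan_extents --worker_n 1 --worker_m 32
  ...
  cephfs-data-scan scan_extents --worker_n 31 --worker_m 32

  # then the same pattern again for scan_inodes
  cephfs-data-scan scan_inodes --worker_n 0 --worker_m 32
  ...
  cephfs-data-scan scan_inodes --worker_n 31 --worker_m 32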
>>
>> We then set the fs as repaired and all seems to work fine again:
>>
>> [root@ceph01-b ~]# ceph mds repaired 0
>> repaired: restoring rank 1:0
>>
>> [root@ceph01-b ~]# ceph -s
>>   cluster:
>>     id:     aae23c5c-a98b-11ee-b44d-00620b05cac4
>>     health: HEALTH_OK
>>
>>   services:
>>     mon: 3 daemons, quorum cephmon-01,cephmon-03,cephmon-02 (age 6d)
>>     mgr: cephmon-01.dsxcho(active, since 6d), standbys: cephmon-02.nssigg, cephmon-03.rgefle
>>     mds: 1/1 daemons up, 5 standby
>>     osd: 336 osds: 336 up (since 2M), 336 in (since 4M)
>>
>>   data:
>>     volumes: 1/1 healthy
>>     pools:   4 pools, 6401 pgs
>>     objects: 284.68M objects, 623 TiB
>>     usage:   890 TiB used, 3.1 PiB / 3.9 PiB avail
>>     pgs:     6206 active+clean
>>              140  active+clean+scrubbing
>>              55   active+clean+scrubbing+deep
>>
>>   io:
>>     client:   3.9 MiB/s rd, 84 B/s wr, 482 op/s rd, 1.11k op/s wr
>>
>>
>> [root@ceph01-b ~]# ceph fs status
>> cephfs - 0 clients
>> ======
>> RANK  STATE            MDS                ACTIVITY     DNS    INOS   DIRS   CAPS
>>  0    active  default.cephmon-03.xcujhz   Reqs: 0 /s   124k   60.3k  1993    0
>>          POOL            TYPE     USED  AVAIL
>> ssd-rep-metadata-pool  metadata   298G  63.4T
>>   sdd-rep-data-pool      data    10.2T  84.5T
>>   hdd-ec-data-pool       data     808T  1929T
>>        STANDBY MDS
>> default.cephmon-01.cepqjp
>> default.cephmon-01.pvnqad
>> default.cephmon-02.duujba
>> default.cephmon-02.nyfook
>> default.cephmon-03.chjusj
>> MDS version: ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
>>
>>
>> The mds log however shows some "bad backtrace on directory inode" messages:
>>
>> 2024-06-25T18:45:36.575+0000 7f8594659700  1 mds.default.cephmon-03.xcujhz Updating MDS map to version 8082 from mon.1
>> 2024-06-25T18:45:36.575+0000 7f8594659700  1 mds.0.8082 handle_mds_map i am now mds.0.8082
>> 2024-06-25T18:45:36.575+0000 7f8594659700  1 mds.0.8082 handle_mds_map state change up:standby --> up:replay
>> 2024-06-25T18:45:36.575+0000 7f8594659700  1 mds.0.8082 replay_start
>> 2024-06-25T18:45:36.575+0000 7f8594659700  1 mds.0.8082 waiting for osdmap 34331 (which blocklists prior instance)
>> 2024-06-25T18:45:36.581+0000 7f858de4c700  0 mds.0.cache creating system inode with ino:0x100
>> 2024-06-25T18:45:36.581+0000 7f858de4c700  0 mds.0.cache creating system inode with ino:0x1
>> 2024-06-25T18:45:36.589+0000 7f858ce4a700  1 mds.0.journal EResetJournal
>> 2024-06-25T18:45:36.589+0000 7f858ce4a700  1 mds.0.sessionmap wipe start
>> 2024-06-25T18:45:36.589+0000 7f858ce4a700  1 mds.0.sessionmap wipe result
>> 2024-06-25T18:45:36.589+0000 7f858ce4a700  1 mds.0.sessionmap wipe done
>> 2024-06-25T18:45:36.589+0000 7f858ce4a700  1 mds.0.8082 Finished replaying journal
>> 2024-06-25T18:45:36.589+0000 7f858ce4a700  1 mds.0.8082 making mds journal writeable
>> 2024-06-25T18:45:37.578+0000 7f8594659700  1 mds.default.cephmon-03.xcujhz Updating MDS map to version 8083 from mon.1
>> 2024-06-25T18:45:37.578+0000 7f8594659700  1 mds.0.8082 handle_mds_map i am now mds.0.8082
>> 2024-06-25T18:45:37.578+0000 7f8594659700  1 mds.0.8082 handle_mds_map state change up:replay --> up:reconnect
>> 2024-06-25T18:45:37.578+0000 7f8594659700  1 mds.0.8082 reconnect_start
>> 2024-06-25T18:45:37.578+0000 7f8594659700  1 mds.0.8082 reopen_log
>> 2024-06-25T18:45:37.578+0000 7f8594659700  1 mds.0.8082 reconnect_done
>> 2024-06-25T18:45:38.579+0000 7f8594659700  1 mds.default.cephmon-03.xcujhz Updating MDS map to version 8084 from mon.1
>> 2024-06-25T18:45:38.579+0000 7f8594659700  1 mds.0.8082 handle_mds_map i am now mds.0.8082
>> 2024-06-25T18:45:38.579+0000 7f8594659700  1 mds.0.8082 handle_mds_map state change up:reconnect --> up:rejoin
>> 2024-06-25T18:45:38.579+0000 7f8594659700  1 mds.0.8082 rejoin_start
>> 2024-06-25T18:45:38.583+0000 7f8594659700  1 mds.0.8082 rejoin_joint_start
>> 2024-06-25T18:45:38.592+0000 7f858e64d700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e42340
>> 2024-06-25T18:45:38.680+0000 7f858e64d700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e45d8b
>> 2024-06-25T18:45:38.754+0000 7f858e64d700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e45d90
>> 2024-06-25T18:45:38.754+0000 7f858e64d700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e45d9f
>> 2024-06-25T18:45:38.785+0000 7f858fe50700  1 mds.0.8082 rejoin_done
>> 2024-06-25T18:45:39.582+0000 7f8594659700  1 mds.default.cephmon-03.xcujhz Updating MDS map to version 8085 from mon.1
>> 2024-06-25T18:45:39.582+0000 7f8594659700  1 mds.0.8082 handle_mds_map i am now mds.0.8082
>> 2024-06-25T18:45:39.582+0000 7f8594659700  1 mds.0.8082 handle_mds_map state change up:rejoin --> up:active
>> 2024-06-25T18:45:39.582+0000 7f8594659700  1 mds.0.8082 recovery_done -- successful recovery!
>> 2024-06-25T18:45:39.584+0000 7f8594659700  1 mds.0.8082 active_start
>> 2024-06-25T18:45:39.585+0000 7f8594659700  1 mds.0.8082 cluster recovered.
>> 2024-06-25T18:45:42.409+0000 7f8591e54700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
>> 2024-06-25T18:57:28.213+0000 7f858e64d700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x4
>>
>>
>> Is there anything that we can do about this, to get rid of the "bad
>> backtrace on directory inode" errors?
>>
>>
>> Some more questions:
>>
>> 1.
>> As Xiubo suggested, we now tried to mount the filesystem with the "nowsync"
>> option <https://tracker.ceph.com/issues/61009#note-26>:
>>
>> [root@ceph01-b ~]# mount -t ceph cephfs_user@.cephfs=/ /mnt/cephfs -o secretfile=/etc/ceph/ceph.client.cephfs_user.secret,nowsync
>>
>> however, the option does not seem to show up in /proc/mounts:
>>
>> [root@ceph01-b ~]# grep ceph /proc/mounts
>> cephfs_user@aae23c5c-a98b-11ee-b44d-00620b05cac4.cephfs=/ /mnt/cephfs ceph rw,relatime,name=cephfs_user,secret=<hidden>,ms_mode=prefer-crc,acl,mon_addr=10.1.3.21:3300/10.1.3.22:3300/10.1.3.23:3300 0 0
>>
>> The kernel version is 5.14.0 (from Rocky 9.3):
>>
>> [root@ceph01-b ~]# uname -a
>> Linux ceph01-b 5.14.0-362.24.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Mar 13 17:33:16 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
>>
>> Is this expected? How can we make sure that the filesystem uses 'nowsync',
>> so that we do not hit the bug <https://tracker.ceph.com/issues/61009> again?
>>
>
>Oh, I think I misunderstood the suggested workaround. I guess we need to
>disable "nowsync", which is set by default, right?
>
>So: -o wsync
>
>should be the workaround, right?
>
>> 2.
>> There are two empty files in lost+found now. Is it safe to remove them?
>>
>> [root@ceph01-b lost+found]# ls -la
>> total 0
>> drwxr-xr-x 2 root root 1 Jan  1  1970 .
>> drwxr-xr-x 4 root root 2 Mar 13 21:22 ..
>> -r-x------ 1 root root 0 Jun 20 23:50 100037a50e2
>> -r-x------ 1 root root 0 Jun 20 19:05 200049612e5
>>
>> 3.
>> Are there any specific steps that we should perform now (scrub or similar
>> things) before we put the filesystem into production again?
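
Regarding question 3 and the "bad backtrace on directory inode" errors: my current plan would be to run a recursive forward scrub with repair and then check the damage table before letting users back on. Roughly like this (a sketch, assuming our filesystem name "cephfs" and rank 0); please correct me if a different procedure is recommended:

  ceph tell mds.cephfs:0 scrub start / recursive,repair
  ceph tell mds.cephfs:0 scrub status   # poll until the scrub has finished
  ceph tell mds.cephfs:0 damage ls      # check whether any damage entries remain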
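
And regarding question 1: if disabling async dirops is indeed the right workaround for https://tracker.ceph.com/issues/61009, I assume we would simply remount with "wsync" instead of "nowsync" and then check /proc/mounts again, something like the sketch below (same secretfile as above). Does the kernel client only list options that differ from its built-in defaults, which would explain why "nowsync" did not show up?

  umount /mnt/cephfs
  mount -t ceph cephfs_user@.cephfs=/ /mnt/cephfs -o secretfile=/etc/ceph/ceph.client.cephfs_user.secret,wsync
  grep ceph /proc/mounts   # should "wsync" appear here once it is in effect?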
>>
>
>
>Dietmar