If I'm not mistaken, this is a fairly rare situation. The fact that it is the result of a power outage makes me think of a bad SSD (like "S... Pro").

Does a grep for the dentry id in the MDS logs return anything? There may be some interesting information around those matches.

Off the top of my head, I have no other idea than deleting the dentry:

ceph tell mds.cfs_irods_test:0 damage rm 241447932

In production, however, this would mean abandoning the content of the directories (/testdir1 and /testdir2).
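For the log check, something along these lines on the MDS host (mon-03 here) could be a starting point. This is only a sketch: the log path below assumes the daemons log to a file under /var/log/ceph/<fsid>/; with a cephadm deployment they may only log to journald, in which case go through journalctl or "cephadm logs". The ids and the daemon name are taken from the damage ls / fs status output quoted below.

  # search for the damage ids and the affected dentry names, with some context
  grep -C 5 -e 241447932 -e 2273238993 -e testdir1 -e testdir2 \
      /var/log/ceph/b87276e0-1d92-11ef-a9d6-507c6f66ae2e/ceph-mds.cfs_irods_test.mon-03.vlmeuz.log

  # journald variant
  cephadm logs --name mds.cfs_irods_test.mon-03.vlmeuz | grep -C 5 -e 241447932 -e testdir

And since the suspicion is a volatile write cache that was lost with the power, it may also be worth checking the cache setting on the OSD drives (sdX is a placeholder):

  cat /sys/block/sdX/queue/write_cache   # "write back" vs "write through"
  smartctl -g wcache /dev/sdX            # for SATA/SAS drives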
On Thu, 17 Apr 2025 at 12:44, Christophe DIARRA <christophe.dia...@idris.fr> wrote:

> Hello David,
>
> Thank you for the tip about the scrubbing. I have tried the commands found
> in the documentation, but they seem to have no effect:
>
> [root@mon-01 ~]# ceph tell mds.cfs_irods_test:0 scrub start / recursive,repair,force
> 2025-04-17T12:07:20.958+0200 7fd4157fa640 0 client.86301 ms_handle_reset on v2:130.84.80.10:6800/3218663047
> 2025-04-17T12:07:20.979+0200 7fd4157fa640 0 client.86307 ms_handle_reset on v2:130.84.80.10:6800/3218663047
> {
>     "return_code": 0,
>     "scrub_tag": "733b1c6d-a418-4c83-bc8e-b28b556e970c",
>     "mode": "asynchronous"
> }
>
> [root@mon-01 ~]# ceph tell mds.cfs_irods_test:0 scrub status
> 2025-04-17T12:07:30.734+0200 7f26cdffb640 0 client.86319 ms_handle_reset on v2:130.84.80.10:6800/3218663047
> 2025-04-17T12:07:30.753+0200 7f26cdffb640 0 client.86325 ms_handle_reset on v2:130.84.80.10:6800/3218663047
> {
>     "status": "no active scrubs running",
>     "scrubs": {}
> }
>
> [root@mon-01 ~]# ceph -s
>   cluster:
>     id:     b87276e0-1d92-11ef-a9d6-507c6f66ae2e
>     health: HEALTH_ERR
>             1 MDSs report damaged metadata
>
>   services:
>     mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 19h)
>     mgr: mon-02.mqaubn(active, since 19h), standbys: mon-03.gvywio, mon-01.xhxqdi
>     mds: 1/1 daemons up, 2 standby
>     osd: 368 osds: 368 up (since 18h), 368 in (since 3w)
>
>   data:
>     volumes: 1/1 healthy
>     pools:   10 pools, 4353 pgs
>     objects: 1.25M objects, 3.9 TiB
>     usage:   417 TiB used, 6.4 PiB / 6.8 PiB avail
>     pgs:     4353 active+clean
>
> Did I miss something?
>
> The server didn't crash. I don't understand what you mean by "there may be
> a design flaw in the infrastructure (insecure cache, for example)". How can
> we tell whether we have a design problem? What should we check?
>
> Best regards,
>
> Christophe
>
> On 17/04/2025 11:07, David C. wrote:
>
> Hello Christophe,
>
> Check the file system scrubbing procedure =>
> https://docs.ceph.com/en/latest/cephfs/scrub/ But this doesn't guarantee
> data recovery.
>
> Did the cluster crash?
> Ceph should be able to handle it; there may be a design flaw in the
> infrastructure (insecure cache, for example).
>
> David
>
> On Thu, 17 Apr 2025 at 10:44, Christophe DIARRA <christophe.dia...@idris.fr> wrote:
>
>> Hello,
>>
>> After an electrical maintenance I restarted our ceph cluster, but it
>> remains in an unhealthy state: HEALTH_ERR 1 MDSs report damaged metadata.
>>
>> How can I repair this damaged metadata?
>>
>> To bring down the cephfs cluster I unmounted the fs from the client
>> first and then did: ceph fs set cfs_irods_test down true
>>
>> To bring up the cephfs cluster I did: ceph fs set cfs_irods_test down false
>>
>> Fortunately the cfs_irods_test fs is almost empty and is a fs for
>> tests. The ceph cluster is not in production yet.
>>
>> Following is the current status:
>>
>> [root@mon-01 ~]# ceph health detail
>> HEALTH_ERR 1 MDSs report damaged metadata
>> [ERR] MDS_DAMAGE: 1 MDSs report damaged metadata
>>     mds.cfs_irods_test.mon-03.vlmeuz(mds.0): Metadata damage detected
>>
>> [root@mon-01 ~]# ceph -s
>>   cluster:
>>     id:     b87276e0-1d92-11ef-a9d6-507c6f66ae2e
>>     health: HEALTH_ERR
>>             1 MDSs report damaged metadata
>>
>>   services:
>>     mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 17h)
>>     mgr: mon-02.mqaubn(active, since 17h), standbys: mon-03.gvywio, mon-01.xhxqdi
>>     mds: 1/1 daemons up, 2 standby
>>     osd: 368 osds: 368 up (since 17h), 368 in (since 3w)
>>
>>   data:
>>     volumes: 1/1 healthy
>>     pools:   10 pools, 4353 pgs
>>     objects: 1.25M objects, 3.9 TiB
>>     usage:   417 TiB used, 6.4 PiB / 6.8 PiB avail
>>     pgs:     4353 active+clean
>>
>> [root@mon-01 ~]# ceph fs ls
>> name: cfs_irods_test, metadata pool: cfs_irods_md_test, data pools: [cfs_irods_def_test cfs_irods_data_test ]
>>
>> [root@mon-01 ~]# ceph mds stat
>> cfs_irods_test:1 {0=cfs_irods_test.mon-03.vlmeuz=up:active} 2 up:standby
>>
>> [root@mon-01 ~]# ceph fs status
>> cfs_irods_test - 0 clients
>> ==============
>> RANK  STATE             MDS                       ACTIVITY     DNS   INOS   DIRS   CAPS
>>  0    active   cfs_irods_test.mon-03.vlmeuz   Reqs:    0 /s    12     15     14      0
>>         POOL           TYPE      USED   AVAIL
>>  cfs_irods_md_test    metadata   11.4M  34.4T
>>  cfs_irods_def_test     data        0   34.4T
>>  cfs_irods_data_test    data        0   4542T
>>          STANDBY MDS
>> cfs_irods_test.mon-01.hitdem
>> cfs_irods_test.mon-02.awuygq
>> MDS version: ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
>> [root@mon-01 ~]#
>>
>> [root@mon-01 ~]# ceph tell mds.cfs_irods_test:0 damage ls
>> 2025-04-17T10:23:31.849+0200 7f4b87fff640 0 client.86181 ms_handle_reset on v2:130.84.80.10:6800/3218663047
>> 2025-04-17T10:23:31.866+0200 7f4b87fff640 0 client.86187 ms_handle_reset on v2:130.84.80.10:6800/3218663047
>> [
>>     {
>>         "damage_type": "dentry",
>>         "id": 241447932,
>>         "ino": 1,
>>         "frag": "*",
>>         "dname": "testdir2",
>>         "snap_id": "head",
>>         "path": "/testdir2"
>>     },
>>     {
>>         "damage_type": "dentry",
>>         "id": 2273238993,
>>         "ino": 1,
>>         "frag": "*",
>>         "dname": "testdir1",
>>         "snap_id": "head",
>>         "path": "/testdir1"
>>     }
>> ]
>> [root@mon-01 ~]#
>>
>> Any help will be appreciated,
>>
>> Thanks,
>>
>> Christophe

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io