If I'm not mistaken, this is a fairly rare situation.

The fact that it's the result of a power outage makes me think of a bad SSD
(like "S... Pro").

Does grepping the dentry ids in the MDS logs return anything?
There may be some useful context around those log lines.
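
For example (a rough sketch; adjust the path and daemon name to wherever your
active MDS actually logs, file-based vs journald):

# file-based logging:
grep -n -e 241447932 -e 2273238993 -e testdir \
    /var/log/ceph/*/ceph-mds.cfs_irods_test.mon-03.vlmeuz.log

# journald (the cephadm default; add --fsid if the host runs several clusters):
cephadm logs --name mds.cfs_irods_test.mon-03.vlmeuz | grep -n -e 241447932 -e testdir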

Off the top of my head, I see no option other than deleting the damaged dentry:

ceph tell mds.cfs_irods_test:0 damage rm 241447932

Be aware, though, that in production this would mean abandoning the contents
of the affected directories (/testdir1 and /testdir2 here).
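
If you do go that route, the second entry would be removed the same way, and
you can re-check the damage table afterwards (same commands as above, ids
taken from your "damage ls" output):

ceph tell mds.cfs_irods_test:0 damage rm 2273238993
# confirm nothing is still flagged
ceph tell mds.cfs_irods_test:0 damage ls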


On Thu, Apr 17, 2025 at 12:44, Christophe DIARRA <christophe.dia...@idris.fr>
wrote:

> Hello David,
>
> Thank you for the tip about scrubbing. I have tried the commands from the
> documentation, but they seem to have no effect:
>
> [root@mon-01 ~]# ceph tell mds.cfs_irods_test:0 scrub start / recursive,repair,force
> 2025-04-17T12:07:20.958+0200 7fd4157fa640  0 client.86301 ms_handle_reset on
> v2:130.84.80.10:6800/3218663047
> 2025-04-17T12:07:20.979+0200 7fd4157fa640  0 client.86307 ms_handle_reset on
> v2:130.84.80.10:6800/3218663047
> {
>     "return_code": 0,
>     "scrub_tag": "733b1c6d-a418-4c83-bc8e-b28b556e970c",
>     "mode": "asynchronous"
> }
>
> [root@mon-01 ~]# ceph tell mds.cfs_irods_test:0 scrub status
> 2025-04-17T12:07:30.734+0200 7f26cdffb640  0 client.86319 ms_handle_reset on
> v2:130.84.80.10:6800/3218663047
> 2025-04-17T12:07:30.753+0200 7f26cdffb640  0 client.86325 ms_handle_reset on
> v2:130.84.80.10:6800/3218663047
> {
>     "status": "no active scrubs running",
>     "scrubs": {}
> }
> [root@mon-01 ~]# ceph -s
>   cluster:
>     id:     b87276e0-1d92-11ef-a9d6-507c6f66ae2e
>     health: HEALTH_ERR
>             1 MDSs report damaged metadata
>
>   services:
>     mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 19h)
>     mgr: mon-02.mqaubn(active, since 19h), standbys: mon-03.gvywio, 
> mon-01.xhxqdi
>     mds: 1/1 daemons up, 2 standby
>     osd: 368 osds: 368 up (since 18h), 368 in (since 3w)
>
>   data:
>     volumes: 1/1 healthy
>     pools:   10 pools, 4353 pgs
>     objects: 1.25M objects, 3.9 TiB
>     usage:   417 TiB used, 6.4 PiB / 6.8 PiB avail
>     pgs:     4353 active+clean
>
> Did I miss something?
>
> The server didn't crash. I don't understand what you mean by "there may be
> a design flaw in the infrastructure (insecure cache, for example)". How can
> we tell whether we have a design problem? What should we check?
>
> Best regards,
>
> Christophe
> On 17/04/2025 11:07, David C. wrote:
>
> Hello Christophe,
>
> Check the file system scrubbing procedure:
> https://docs.ceph.com/en/latest/cephfs/scrub/
> But this doesn't guarantee data recovery.
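>
> For example (just a sketch of the commands from that page, with your fs
> name filled in):
>
> ceph tell mds.cfs_irods_test:0 scrub start / recursive,repair
> ceph tell mds.cfs_irods_test:0 scrub status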
>
> Did the cluster crash?
> Ceph should be able to handle that; if it didn't, there may be a design
> flaw in the infrastructure (an unsafe write cache, for example).
>
> David
>
> On Thu, Apr 17, 2025 at 10:44, Christophe DIARRA <christophe.dia...@idris.fr>
> wrote:
>
>> Hello,
>>
>> After electrical maintenance I restarted our Ceph cluster, but it remains
>> in an unhealthy state: HEALTH_ERR 1 MDSs report damaged metadata.
>>
>> How can I repair this damaged metadata?
>>
>> To bring down the CephFS file system I first unmounted it from the client
>> and then ran: ceph fs set cfs_irods_test down true
>>
>> To bring it back up I ran: ceph fs set cfs_irods_test down false
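>>
>> In condensed form, the sequence was (mount point is just an example):
>>
>> umount /mnt/cephfs                       # on the client, before shutdown
>> ceph fs set cfs_irods_test down true     # take the fs down
>> ceph fs set cfs_irods_test down false    # bring it back up after maintenance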
>>
>> Fortunately the cfs_irods_test fs is almost empty and is only used for
>> tests. The Ceph cluster is not in production yet.
>>
>> Following is the current status:
>>
>> [root@mon-01 ~]# ceph health detail
>> HEALTH_ERR 1 MDSs report damaged metadata
>> [ERR] MDS_DAMAGE: 1 MDSs report damaged metadata
>>      mds.cfs_irods_test.mon-03.vlmeuz(mds.0): Metadata damage detected
>>
>> [root@mon-01 ~]# ceph -s
>>    cluster:
>>      id:     b87276e0-1d92-11ef-a9d6-507c6f66ae2e
>>      health: HEALTH_ERR
>>              1 MDSs report damaged metadata
>>
>>    services:
>>      mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 17h)
>>      mgr: mon-02.mqaubn(active, since 17h), standbys: mon-03.gvywio,
>> mon-01.xhxqdi
>>      mds: 1/1 daemons up, 2 standby
>>      osd: 368 osds: 368 up (since 17h), 368 in (since 3w)
>>
>>    data:
>>      volumes: 1/1 healthy
>>      pools:   10 pools, 4353 pgs
>>      objects: 1.25M objects, 3.9 TiB
>>      usage:   417 TiB used, 6.4 PiB / 6.8 PiB avail
>>      pgs:     4353 active+clean
>>
>>
>> [root@mon-01 ~]# ceph fs ls
>> name: cfs_irods_test, metadata pool: cfs_irods_md_test, data pools:
>> [cfs_irods_def_test cfs_irods_data_test ]
>>
>> [root@mon-01 ~]# ceph mds stat
>> cfs_irods_test:1 {0=cfs_irods_test.mon-03.vlmeuz=up:active} 2 up:standby
>>
>> [root@mon-01 ~]# ceph fs status
>> cfs_irods_test - 0 clients
>> ==============
>> RANK  STATE             MDS                          ACTIVITY      DNS   INOS   DIRS   CAPS
>>  0    active   cfs_irods_test.mon-03.vlmeuz   Reqs:    0 /s      12     15     14      0
>>          POOL           TYPE     USED  AVAIL
>>   cfs_irods_md_test   metadata  11.4M  34.4T
>>   cfs_irods_def_test    data       0   34.4T
>> cfs_irods_data_test    data       0   4542T
>>             STANDBY MDS
>> cfs_irods_test.mon-01.hitdem
>> cfs_irods_test.mon-02.awuygq
>> MDS version: ceph version 18.2.2
>> (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
>> [root@mon-01 ~]#
>>
>> [root@mon-01 ~]# ceph tell mds.cfs_irods_test:0 damage ls
>> 2025-04-17T10:23:31.849+0200 7f4b87fff640  0 client.86181
>> ms_handle_reset on v2:130.84.80.10:6800/3218663047
>> 2025-04-17T10:23:31.866+0200 7f4b87fff640  0 client.86187
>> ms_handle_reset on v2:130.84.80.10:6800/3218663047
>> [
>>      {
>> *"damage_type": "dentry",*
>>          "id": 241447932,
>>          "ino": 1,
>>          "frag": "*",
>>          "dname": "testdir2",
>>          "snap_id": "head",
>>          "path": "/testdir2"
>>      },
>>      {
>> *"damage_type": "dentry"*,
>>          "id": 2273238993,
>>          "ino": 1,
>>          "frag": "*",
>>          "dname": "testdir1",
>>          "snap_id": "head",
>>          "path": "/testdir1"
>>      }
>> ]
>> [root@mon-01 ~]#
>>
>> Any help will be appreciated,
>>
>> Thanks,
>>
>> Christophe
>>
>
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
