Hello David,

Thank you for the tip about scrubbing. I tried the commands from the documentation, but they seem to have no effect:

[root@mon-01 ~]# ceph tell mds.cfs_irods_test:0 scrub start / recursive,repair,force
2025-04-17T12:07:20.958+0200 7fd4157fa640  0 client.86301 ms_handle_reset on v2:130.84.80.10:6800/3218663047
2025-04-17T12:07:20.979+0200 7fd4157fa640  0 client.86307 ms_handle_reset on v2:130.84.80.10:6800/3218663047
{
    "return_code": 0,
    "scrub_tag": "733b1c6d-a418-4c83-bc8e-b28b556e970c",
    "mode": "asynchronous"
}

[root@mon-01 ~]# ceph tell mds.cfs_irods_test:0 scrub status
2025-04-17T12:07:30.734+0200 7f26cdffb640  0 client.86319 ms_handle_reset on v2:130.84.80.10:6800/3218663047
2025-04-17T12:07:30.753+0200 7f26cdffb640  0 client.86325 ms_handle_reset on v2:130.84.80.10:6800/3218663047
{
    "status": "no active scrubs running",
    "scrubs": {}
}
[root@mon-01 ~]# ceph -s
  cluster:
    id:     b87276e0-1d92-11ef-a9d6-507c6f66ae2e
    health: HEALTH_ERR
            1 MDSs report damaged metadata
  services:
    mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 19h)
    mgr: mon-02.mqaubn(active, since 19h), standbys: mon-03.gvywio, mon-01.xhxqdi
    mds: 1/1 daemons up, 2 standby
    osd: 368 osds: 368 up (since 18h), 368 in (since 3w)
  data:
    volumes: 1/1 healthy
    pools:   10 pools, 4353 pgs
    objects: 1.25M objects, 3.9 TiB
    usage:   417 TiB used, 6.4 PiB / 6.8 PiB avail
    pgs:     4353 active+clean

Did I miss something?
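
In case it is useful, here is how I was planning to verify whether the asynchronous scrub actually did anything (just a sketch on my side; I am assuming the entries disappear from the damage table once a repair succeeds):

# re-run the repair scrub and poll its status immediately, in case
# the asynchronous scrub finishes before a later status call sees it
ceph tell mds.cfs_irods_test:0 scrub start / recursive,repair,force
ceph tell mds.cfs_irods_test:0 scrub status

# re-check the damage table; the two dentry entries should be gone
# if the repair really worked (my assumption)
ceph tell mds.cfs_irods_test:0 damage ls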

The server didn't crash. I don't understand what you mean by "there may be a design flaw in the infrastructure (insecure cache, for example)".
How can we tell whether we have a design problem? What should we check?
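
For example, is the volatile write cache on the OSD drives the kind of thing you mean? If so, I suppose we could check it with something like this (a sketch; /dev/sdX is a placeholder for each OSD data device):

# report whether the drive's volatile write cache is enabled
smartctl -g wcache /dev/sdX
# or, for SATA drives
hdparm -W /dev/sdX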

Best regards,

Christophe

On 17/04/2025 11:07, David C. wrote:
Hello Christophe,

Check the file system scrubbing procedure => https://docs.ceph.com/en/latest/cephfs/scrub/. Note, however, that this doesn't guarantee data recovery.

Did the cluster crash?
Ceph should be able to handle it; there may be a design flaw in the infrastructure (insecure cache, for example).

David

On Thu, 17 Apr 2025 at 10:44, Christophe DIARRA <christophe.dia...@idris.fr> wrote:

    Hello,

    After electrical maintenance I restarted our Ceph cluster, but it
    remains in an unhealthy state: HEALTH_ERR 1 MDSs report damaged
    metadata.

    How can I repair this damaged metadata?

    To bring down the CephFS file system, I first unmounted it from the
    client and then ran: ceph fs set cfs_irods_test down true

    To bring it back up, I ran: ceph fs set cfs_irods_test down false

    Fortunately, the cfs_irods_test fs is almost empty and is only for
    tests. The Ceph cluster is not in production yet.

    Following is the current status:

    [root@mon-01 ~]# ceph health detail
    HEALTH_ERR 1 MDSs report damaged metadata
    [ERR] MDS_DAMAGE: 1 MDSs report damaged metadata
        mds.cfs_irods_test.mon-03.vlmeuz(mds.0): Metadata damage detected

    [root@mon-01 ~]# ceph -s
       cluster:
         id:     b87276e0-1d92-11ef-a9d6-507c6f66ae2e
         health: HEALTH_ERR
                 1 MDSs report damaged metadata

       services:
         mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 17h)
         mgr: mon-02.mqaubn(active, since 17h), standbys: mon-03.gvywio, mon-01.xhxqdi
         mds: 1/1 daemons up, 2 standby
         osd: 368 osds: 368 up (since 17h), 368 in (since 3w)

       data:
         volumes: 1/1 healthy
         pools:   10 pools, 4353 pgs
         objects: 1.25M objects, 3.9 TiB
         usage:   417 TiB used, 6.4 PiB / 6.8 PiB avail
         pgs:     4353 active+clean


    [root@mon-01 ~]# ceph fs ls
    name: cfs_irods_test, metadata pool: cfs_irods_md_test, data pools: [cfs_irods_def_test cfs_irods_data_test ]

    [root@mon-01 ~]# ceph mds stat
    cfs_irods_test:1 {0=cfs_irods_test.mon-03.vlmeuz=up:active} 2 up:standby

    [root@mon-01 ~]# ceph fs status
    cfs_irods_test - 0 clients
    ==============
    RANK  STATE             MDS                  ACTIVITY      DNS   INOS   DIRS   CAPS
     0    active   cfs_irods_test.mon-03.vlmeuz  Reqs: 0 /s    12    15     14     0
            POOL            TYPE     USED  AVAIL
      cfs_irods_md_test   metadata  11.4M  34.4T
     cfs_irods_def_test     data       0   34.4T
    cfs_irods_data_test     data       0   4542T
               STANDBY MDS
    cfs_irods_test.mon-01.hitdem
    cfs_irods_test.mon-02.awuygq
    MDS version: ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
    [root@mon-01 ~]#

    [root@mon-01 ~]# ceph tell mds.cfs_irods_test:0 damage ls
    2025-04-17T10:23:31.849+0200 7f4b87fff640  0 client.86181 ms_handle_reset on v2:130.84.80.10:6800/3218663047
    2025-04-17T10:23:31.866+0200 7f4b87fff640  0 client.86187 ms_handle_reset on v2:130.84.80.10:6800/3218663047
    [
        {
            "damage_type": "dentry",
            "id": 241447932,
            "ino": 1,
            "frag": "*",
            "dname": "testdir2",
            "snap_id": "head",
            "path": "/testdir2"
        },
        {
            "damage_type": "dentry",
            "id": 2273238993,
            "ino": 1,
            "frag": "*",
            "dname": "testdir1",
            "snap_id": "head",
            "path": "/testdir1"
        }
    ]
    [root@mon-01 ~]#
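
    Would something like the following be the right way to repair it?
    (Just a guess from the documentation on my side, not tried yet; the
    damage IDs are the ones from the damage ls output above.)

    # attempt an online repair of the damaged dentries
    ceph tell mds.cfs_irods_test:0 scrub start / recursive,repair,force

    # if the scrub confirms the entries are repaired, clear them from
    # the damage table using their IDs
    ceph tell mds.cfs_irods_test:0 damage rm 241447932
    ceph tell mds.cfs_irods_test:0 damage rm 2273238993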

    Any help will be appreciated,

    Thanks,

    Christophe
    _______________________________________________
    ceph-users mailing list -- ceph-users@ceph.io
    To unsubscribe send an email to ceph-users-le...@ceph.io
