[ceph-users] Re: HEALTH_ERR: 1 MDSs report damaged metadata - damage_type=dentry

Christophe DIARRA Wed, 23 Apr 2025 02:45:40 -0700

Hello Frédéric,

I removed the fs but haven't recreated it yet, because I have doubts about the health of the cluster even though it seems healthy:


[mon-01 ~]# ceph -s
 cluster:
   id:     b87276e0-1d92-11ef-a9d6-507c6f66ae2e
   health: HEALTH_OK

 services:
   mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 6d)
   mgr: mon-02.mqaubn(active, since 6d), standbys: mon-03.gvywio, mon-01.xhxqdi
   osd: 368 osds: 368 up (since 16h), 368 in (since 3w)

 data:
   pools:   10 pools, 4353 pgs
   objects: 1.25M objects, 3.9 TiB
   usage:   417 TiB used, 6.4 PiB / 6.8 PiB avail
   pgs:     4353 active+clean

I observed that listing the objects in any HDD pool hangs: for an empty HDD pool it hangs right at the beginning, otherwise it hangs after displaying the list of objects. I need to press Ctrl-C to interrupt the hung 'rados ls' command. I don't have this problem with the pools on SSD.


[mon-01 ~]# rados lspools
.mgr
pool_rbd_rep3_hdd            <------ hdd pool
pool_rbd_rep3_ssd
rbd_ec_k6m2_hdd              <------ hdd pool
rbd_ec_k6m2_ssd
metadata_4hddrbd_rep3_ssd
metadata_4ssdrbd_rep3_ssd
cfs_irods_md_test
cfs_irods_def_test
cfs_irods_data_test          <------ hdd pool
[mon-01 ~]#

1) Testing 'rados ls' on hdd pools:

[mon-01 ~]# rados -p cfs_irods_data_test ls
(hangs forever) ==> Ctrl-C

[mon-01 ~]# rados -p pool_rbd_rep3_hdd ls|head -2
rbd_data.565ed6699dd8.0000000000097ff6
rbd_data.565ed6699dd8.00000000001041fb

(then hangs forever here) ==> Ctrl-C

[mon-01 ~]# rados -p pool_rbd_rep3_hdd ls
rbd_data.565ed6699dd8.0000000000097ff6
rbd_data.565ed6699dd8.00000000001041fb
rbd_data.565ed6699dd8.000000000004f1a3
...
(list truncated by me)
...
rbd_data.565ed6699dd8.000000000016809e
rbd_data.565ed6699dd8.000000000007bc05
(then hangs forever here) ==> Ctrl-C

2) With the pools on SSD everything works well (the 'rados ls' commands don't hang):

[mon-01 ~]# for i in $(rados lspools|egrep 'ssd|md|def'); do echo -n "Pool $i :"; rados -p $i ls |wc -l; done

Pool pool_rbd_rep3_ssd :197298
Pool rbd_ec_k6m2_ssd :101552
Pool metadata_4hddrbd_rep3_ssd :5
Pool metadata_4ssdrbd_rep3_ssd :5
Pool cfs_irods_md_test :0
Pool cfs_irods_def_test :0
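To tell a slow listing apart from a hung one on each pool, without waiting at the terminal to press Ctrl-C, the listing can be run under timeout(1). This is only a sketch: `check_pool_listing` and the `RADOS_BIN` override are hypothetical names introduced here (the override exists only so the function can be tried without a cluster), and the 60-second default is an arbitrary choice. GNU timeout exits with status 124 when the time limit expires, which is what the HANG case keys on.

```shell
#!/usr/bin/env bash
# check_pool_listing POOL [SECONDS]
# Runs 'rados -p POOL ls' under timeout(1) and classifies the result.
# RADOS_BIN is a hypothetical override (default: rados) so the function can be
# tried without a cluster.
check_pool_listing() {
  local pool="$1" secs="${2:-60}" rc=0
  local rados_bin="${RADOS_BIN:-rados}"
  # Discard the object names; only the exit status matters here.
  timeout "$secs" "$rados_bin" -p "$pool" ls > /dev/null 2>&1 || rc=$?
  case "$rc" in
    0)   echo "OK   $pool" ;;
    124) echo "HANG $pool (no completion after ${secs}s)" ;;  # time limit hit
    *)   echo "ERR  $pool (exit $rc)" ;;
  esac
}
```

For example, `for p in $(rados lspools); do check_pool_listing "$p" 60; done` would print one OK/HANG/ERR line per pool; HANG on every HDD pool and OK on every SSD pool would match the behaviour described above.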

Below is the configuration of the cluster:

- 3 MONs (HPE DL360) + 8 OSD servers (HPE Apollo 4510 gen10)

- each OSD server has 44x20TB HDD + 10x7.6TB SSD

- On each OSD server, 8 SSDs are partitioned and used for the WAL/DB of the HDD OSDs

- On each OSD server, 2 SSDs are used for the CephFS metadata and default data pools.
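As a rough sanity check of this layout, the SSD capacity available to each HDD OSD's WAL/DB can be worked out. The sketch assumes the capacity of the 8 SSDs is split evenly across all 44 HDD OSDs and uses nominal drive sizes; the actual partition sizes are not given in the thread. Since 44 is not a multiple of 8, an even split per SSD would in practice mean some SSDs carry 6 DB partitions and others 5. `db_share` is a hypothetical helper; the figures it prints can be compared with the block.db sizing guidance in the BlueStore configuration reference.

```shell
#!/usr/bin/env bash
# db_share SSD_COUNT SSD_TB HDD_COUNT HDD_TB
# Average WAL/DB space per HDD OSD, assuming the SSD capacity is split evenly
# across all HDD OSDs (the real partition sizes are an unknown here).
db_share() {
  awk -v ns="$1" -v ssd="$2" -v nh="$3" -v hdd="$4" 'BEGIN {
    per_osd = ns * ssd / nh                      # TB of SSD per HDD OSD
    printf "HDD OSDs per SSD: %.2f\n", nh / ns
    printf "WAL/DB per HDD OSD: %.2f TB (%.1f%% of %g TB)\n", per_osd, 100 * per_osd / hdd, hdd
  }'
}
```

With the numbers above, `db_share 8 7.6 44 20` prints `HDD OSDs per SSD: 5.50` and `WAL/DB per HDD OSD: 1.38 TB (6.9% of 20 TB)`. Independently of size, each of these SSDs is a shared point of failure: losing one takes down every HDD OSD whose DB lives on it.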

Do you see any configuration problem here that could lead to our metadata problem?

Do you know what could cause the 'rados ls' command to hang on the HDD pools? I would like to understand this problem before creating a new CephFS filesystem.

The cluster is still in a testing state, so we can run any tests you recommend.


Thanks,

Christophe

On 22/04/2025 16:46, Christophe DIARRA wrote:

Hello Frédéric,

15 of the 16 parallel scanning workers terminated almost immediately, but one worker has been running for more than an hour:


[mon-01 log]# ps -ef|grep scan

root     1977927 1925004  0 15:18 pts/0    00:00:00 cephfs-data-scan scan_extents --filesystem cfs_irods_test --worker_n 11 --worker_m 16


[mon-01 log]# date;lsof -p 1977927|grep osd
Tue Apr 22 04:37:05 PM CEST 2025

cephfs-da 1977927 root 15u IPv4 7105122 0t0 TCP mon-01:34736->osd-06:6912 (ESTABLISHED)
cephfs-da 1977927 root 18u IPv4 7110774 0t0 TCP mon-01:45122->osd-03:ethoscan (ESTABLISHED)
cephfs-da 1977927 root 19u IPv4 7105123 0t0 TCP mon-01:58556->osd-07:spg (ESTABLISHED)
cephfs-da 1977927 root 20u IPv4 7049672 0t0 TCP mon-01:55064->osd-01:7112 (ESTABLISHED)
cephfs-da 1977927 root 21u IPv4 7082598 0t0 TCP mon-01:42120->osd-03-data:6896 (SYN_SENT)

[mon-01 log]#
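In this lsof listing, four connections are ESTABLISHED but the one to osd-03-data:6896 is stuck in SYN_SENT, meaning the TCP handshake to that address never completed. One hypothesis worth checking (not confirmed in this thread) is that this address is not reachable from mon-01, which could also explain a worker, or a `rados ls`, stalling on PGs served by that OSD. The sketch below pulls the SYN_SENT peers out of lsof output and tries a plain TCP connect to a host and port; `stuck_peers` and `probe_tcp` are hypothetical names, and `probe_tcp` relies on bash's `/dev/tcp` redirection.

```shell
#!/usr/bin/env bash
# stuck_peers: read lsof output (e.g. 'lsof -nP -p PID -a -i TCP') on stdin
# and print the remote endpoint of every connection still in SYN_SENT.
stuck_peers() {
  awk '/\(SYN_SENT\)/ {
    for (i = 1; i <= NF; i++)
      if ($i ~ /->/) { split($i, ep, "->"); print ep[2] }
  }'
}

# probe_tcp HOST PORT [SECONDS]: succeed if a TCP connection to HOST:PORT
# opens within SECONDS. Uses bash's /dev/tcp redirection, so it needs bash.
probe_tcp() {
  local host="$1" port="$2" secs="${3:-3}"
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, `lsof -nP -p 1977927 -a -i TCP | stuck_peers` lists the stuck peers; `-n` and `-P` keep addresses and ports numeric (without `-P`, lsof shows service names such as "ethoscan" and "spg" above instead of port numbers). Each peer can then be checked from mon-01 with `probe_tcp <host> <port>`; a timeout there would point at routing or firewalling towards that address rather than at CephFS itself.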

The filesystem is empty, so I will follow your advice and remove it. After that I will recreate it.

I will redo a proper shutdown and restart of the cluster to check whether the problem reappears with the newly created fs.
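For the planned shutdown and restart, one commonly described sequence is to mark the file system down, set cluster-wide OSD flags (noout, norecover, norebalance, nobackfill, nodown, pause) before powering off, and unset them once all daemons are back. The sketch below only prints such a sequence; the helper names are hypothetical, and the flag list and ordering are assumptions to be checked against the shutdown procedure documented for your Ceph release before use.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that only PRINT the commands; nothing is executed.
# Flags commonly set before a full-cluster power-off (verify for your release).
SHUTDOWN_FLAGS=(noout norecover norebalance nobackfill nodown pause)

print_shutdown_steps() {
  local fs="$1" flag
  echo "ceph fs set $fs down true"   # MDS ranks flush their journals and stop
  for flag in "${SHUTDOWN_FLAGS[@]}"; do
    echo "ceph osd set $flag"
  done
}

print_startup_steps() {
  local fs="$1" i
  # Unset in the reverse order of setting, then bring the file system back.
  for ((i = ${#SHUTDOWN_FLAGS[@]} - 1; i >= 0; i--)); do
    echo "ceph osd unset ${SHUTDOWN_FLAGS[i]}"
  done
  echo "ceph fs set $fs down false"
}
```

Printing rather than executing keeps a human in the loop: `print_shutdown_steps cfs_irods_test` can be reviewed and then run step by step, checking `ceph -s` between steps.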


I will let you know.

Thank you for your help,

Christophe

On 22/04/2025 15:56, Frédéric Nass wrote:

That is weird, for two reasons.

The first reason is that cephfs-data-scan should not run for a couple of hours on empty data pools. I just tried running it on an empty pool and it doesn't run for more than maybe 10 seconds.

The second reason is that the data pool cfs_irods_def_test should not be empty, even if the filesystem tree is. It should at least have a few RADOS objects named after {100,200,400,60x}.00000000 and the root inode 1.00000000 / 1.00000000.inode, unless you removed the filesystem by running the 'ceph fs rm <filesystem_name> --yes-i-really-mean-it' command, which does remove RADOS objects in the associated pools.

If it's clear to you that this filesystem should be empty, I'd advise you to remove it (using the 'ceph fs rm' command), delete any RADOS objects in the metadata and data pools, and then recreate the filesystem.
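The remove/empty/recreate advice can be written down as an explicit command list to review before running anything. This is a sketch under assumptions: `print_fs_recreate_plan` is a hypothetical helper that only prints; `ceph fs fail` is used to take the ranks down before `ceph fs rm`; `rados purge <pool> --yes-i-really-really-mean-it` is assumed to be the way to delete every object while keeping the pool; and the extra HDD data pool is re-attached with `ceph fs add_data_pool`. Given the `rados ls` hangs on the HDD pools reported at the top of this thread, purging cfs_irods_data_test may hang as well.

```shell
#!/usr/bin/env bash
# print_fs_recreate_plan FS MDPOOL DEFAULT_DATA_POOL [EXTRA_DATA_POOL...]
# Hypothetical helper: PRINTS one possible remove/purge/recreate sequence so it
# can be reviewed first. Nothing is executed here.
print_fs_recreate_plan() {
  local fs="$1" md="$2" def="$3" pool
  shift 3
  echo "ceph fs fail $fs"                          # stop all MDS ranks
  echo "ceph fs rm $fs --yes-i-really-mean-it"     # drop the fs from the FSMap
  for pool in "$md" "$def" "$@"; do
    echo "rados purge $pool --yes-i-really-really-mean-it"   # empty, keep pool
  done
  echo "ceph fs new $fs $md $def"
  for pool in "$@"; do
    echo "ceph fs add_data_pool $fs $pool"
  done
}
```

For this cluster the call would be `print_fs_recreate_plan cfs_irods_test cfs_irods_md_test cfs_irods_def_test cfs_irods_data_test`, which prints seven commands. Any directory layouts that pointed at cfs_irods_data_test would need to be set again on the new fs.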


Regards,
Frédéric.

----- On 22 Apr 25, at 15:13, Christophe DIARRA <christophe.dia...@idris.fr> wrote:


    Hello Frédéric,

    I have:

    [mon-01 ~]# rados df | grep -E 'OBJECTS|cfs_irods_def_test|cfs_irods_data_test'
    POOL_NAME            USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS   RD  WR_OPS       WR  USED COMPR  UNDER COMPR
    cfs_irods_data_test   0 B        0       0       0                   0        0         0       0  0 B       0      0 B         0 B          0 B
    cfs_irods_def_test    0 B        0       0       0                   0        0         0       1  0 B   80200  157 GiB         0 B          0 B
    [mon-01 ~]#
    [mon-01 ~]#

    I will interrupt the current scanning process and rerun it with
    more workers.

    Thanks,

    Christophe


    On 22/04/2025 15:05, Frédéric Nass wrote:

        Hmm... Obviously this 'empty' filesystem has way more RADOS
        objects in the 2 data pools than expected. You can see how
        many objects there are with:

        rados df | grep -E 'OBJECTS|cfs_irods_def_test|cfs_irods_data_test'

        If waiting is not an option, you can break the scan_extents
        command, re-run it with multiple workers, and then proceed
        with the next scan (scan_links). Just make sure you run the
        next scan with multiple workers as well.
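The disaster-recovery documentation describes splitting scan_extents and scan_inodes across workers with `--worker_n`/`--worker_m`, and stresses that all workers must finish one phase before any worker starts the next. A bare `for ... & done` loop does not wait for its workers. The sketch below starts N workers for one phase, waits on each PID and reports how many failed; `run_parallel_scan` and the `CEPHFS_DATA_SCAN` override are hypothetical names, and it should only be used for phases that accept the worker flags.

```shell
#!/usr/bin/env bash
# run_parallel_scan PHASE FS N [EXTRA_ARGS...]
# Starts N cephfs-data-scan workers for one PHASE, waits for every one of them,
# and fails if any worker failed, so the next phase is never started early.
# CEPHFS_DATA_SCAN is a hypothetical override used only for dry runs/tests.
run_parallel_scan() {
  local phase="$1" fs="$2" n="$3"
  shift 3
  local bin="${CEPHFS_DATA_SCAN:-cephfs-data-scan}"
  local pids=() i pid failed=0
  for ((i = 0; i < n; i++)); do
    "$bin" "$phase" --filesystem "$fs" --worker_n "$i" --worker_m "$n" "$@" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || failed=$((failed + 1))   # exit status of that worker
  done
  if [ "$failed" -ne 0 ]; then
    echo "$phase: $failed of $n workers failed" >&2
    return 1
  fi
  return 0
}
```

A typical chain would be `run_parallel_scan scan_extents cfs_irods_test 16 && run_parallel_scan scan_inodes cfs_irods_test 16 --force-corrupt`, so scan_inodes only starts once all 16 scan_extents workers have exited successfully.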

        Regards,
        Frédéric.

        ----- On 22 Apr 25, at 14:54, Christophe DIARRA <christophe.dia...@idris.fr> wrote:

            Hello Frédéric,

            I ran the commands (see below), but the command
            'cephfs-data-scan scan_extents --filesystem cfs_irods_test'
            has not finished yet. It has been running for 2+ hours.
            I didn't run it in parallel because the fs contains only
            empty directories. According to [1]: "scan_extents and
            scan_inodes commands may take a very long time if the data
            pool contains many files or very large files." Now I think
            I should have run the command in parallel. I don't know if
            it is safe to interrupt it and then rerun it with 16
            workers.

            On 22/04/2025 12:13, Frédéric Nass wrote:

                Hi Christophe,

                You could but it won't be of any help since the
                journal is empty. What you can do to fix the fs
                metadata is to run the below commands from the
                disaster-recovery-experts documentation [1] in this
                particular order:

                #Prevent access to the fs and set it down.
                ceph fs set cfs_irods_test refuse_client_session true
                ceph fs set cfs_irods_test joinable false
                ceph fs set cfs_irods_test down true

            [mon-01 ~]# ceph fs set cfs_irods_test refuse_client_session true
            client(s) blocked from establishing new session(s)

            [mon-01 ~]# ceph fs set cfs_irods_test joinable false
            cfs_irods_test marked not joinable; MDS cannot join as
            newly active.

            [mon-01 ~]# ceph fs set cfs_irods_test down true
            cfs_irods_test marked down.


                # Reset maps and journal
                cephfs-table-tool cfs_irods_test:0 reset session
                cephfs-table-tool cfs_irods_test:0 reset snap
                cephfs-table-tool cfs_irods_test:0 reset inode

            [mon-01 ~]# cephfs-table-tool cfs_irods_test:0 reset session
            {
               "0": {
                   "data": {},
                   "result": 0
               }
            }

            [mon-01 ~]# cephfs-table-tool cfs_irods_test:0 reset snap
            Error ((2) No such file or directory)
            2025-04-22T12:29:09.550+0200 7f1d4c03e100 -1 main: Bad rank selection: cfs_irods_test:0'

            [mon-01 ~]# cephfs-table-tool cfs_irods_test:0 reset inode
            Error ((2) No such file or directory)
            2025-04-22T12:29:43.880+0200 7f0878a3a100 -1 main: Bad rank selection: cfs_irods_test:0'

                cephfs-journal-tool --rank cfs_irods_test:0 journal reset --force
                cephfs-data-scan init --force-init --filesystem cfs_irods_test

            [mon-01 ~]# cephfs-journal-tool --rank cfs_irods_test:0 journal reset --force
            Error ((2) No such file or directory)
            2025-04-22T12:34:42.474+0200 7fe8b3a36100 -1 main: Couldn't determine MDS rank.

            [mon-01 ~]# cephfs-data-scan init --force-init --filesystem cfs_irods_test
            [mon-01 ~]#


                # Rescan data and fix metadata (the commands below are left
                # commented out, for information on how to parallelize these
                # scan tasks; each phase must finish before the next starts)
                #for i in {0..15} ; do cephfs-data-scan scan_frags --filesystem cfs_irods_test --force-corrupt --worker_n $i --worker_m 16 & done; wait
                #for i in {0..15} ; do cephfs-data-scan scan_extents --filesystem cfs_irods_test --worker_n $i --worker_m 16 & done; wait
                #for i in {0..15} ; do cephfs-data-scan scan_inodes --filesystem cfs_irods_test --force-corrupt --worker_n $i --worker_m 16 & done; wait
                #for i in {0..15} ; do cephfs-data-scan scan_links --filesystem cfs_irods_test --worker_n $i --worker_m 16 & done; wait

                cephfs-data-scan scan_frags --filesystem cfs_irods_test --force-corrupt
                cephfs-data-scan scan_extents --filesystem cfs_irods_test


            [mon-01 ~]# cephfs-data-scan scan_frags --filesystem cfs_irods_test --force-corrupt
            [mon-01 ~]# cephfs-data-scan scan_extents --filesystem cfs_irods_test   <------ still running

            I don't know how long it will take. Once it has
            completed, I will run the remaining commands.

            Thanks,

            Christophe

                cephfs-data-scan scan_inodes --filesystem cfs_irods_test --force-corrupt
                cephfs-data-scan scan_links --filesystem cfs_irods_test
                cephfs-data-scan cleanup --filesystem cfs_irods_test

                #ceph mds repaired 0    <---- should not be necessary

                # Set the fs back online and accessible
                ceph fs set cfs_irods_test down false
                ceph fs set cfs_irods_test joinable true
                ceph fs set cfs_irods_test refuse_client_session false

                An MDS should now start; if not, use 'ceph orch
                daemon restart mds.xxxxx' to start an MDS. After
                remounting the fs you should be able to access
                /testdir1 and /testdir2 in the fs root.

                # Scrub the fs again to check that everything is OK.
                ceph tell mds.cfs_irods_test:0 scrub start / recursive,repair,force

                Regards,
                Frédéric.

                [1]
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/
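In the transcript above, `reset snap`, `reset inode` and `journal reset` printed errors ("Bad rank selection", "Couldn't determine MDS rank"), yet the following commands were still run. A small runner that stops at the first failing step makes that kind of failure impossible to miss. This is a sketch: `print_recovery_plan` and `run_plan` are hypothetical helpers, the plan is the serial sequence from the message above, and stopping only works if the tools exit non-zero on such errors, which is assumed here rather than verified.

```shell
#!/usr/bin/env bash
# print_recovery_plan FS: the serial recovery sequence from the message above,
# one command per line (scans run single-threaded here).
print_recovery_plan() {
  local fs="$1"
  cat <<EOF
ceph fs set $fs refuse_client_session true
ceph fs set $fs joinable false
ceph fs set $fs down true
cephfs-table-tool $fs:0 reset session
cephfs-table-tool $fs:0 reset snap
cephfs-table-tool $fs:0 reset inode
cephfs-journal-tool --rank $fs:0 journal reset --force
cephfs-data-scan init --force-init --filesystem $fs
cephfs-data-scan scan_frags --filesystem $fs --force-corrupt
cephfs-data-scan scan_extents --filesystem $fs
cephfs-data-scan scan_inodes --filesystem $fs --force-corrupt
cephfs-data-scan scan_links --filesystem $fs
cephfs-data-scan cleanup --filesystem $fs
ceph fs set $fs down false
ceph fs set $fs joinable true
ceph fs set $fs refuse_client_session false
EOF
}

# run_plan: read commands on stdin and run them in order, stopping at the
# first one that exits non-zero instead of carrying on.
run_plan() {
  local cmd
  while IFS= read -r cmd; do
    [ -z "$cmd" ] && continue
    echo ">>> $cmd" >&2
    # </dev/null keeps the command from consuming the remaining plan lines.
    if ! bash -c "$cmd" < /dev/null; then
      echo "FAILED, stopping before the next step: $cmd" >&2
      return 1
    fi
  done
  return 0
}
```

Usage would be `print_recovery_plan cfs_irods_test > plan.txt`, review and edit plan.txt, then `run_plan < plan.txt`. Keeping the plan in a file also leaves a record of exactly what was run.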

                ----- Le 22 Avr 25, à 10:21, Christophe DIARRA
<christophe.dia...@idris.fr>
<mailto:christophe.dia...@idris.fr> a écrit :

                    Hello Frédéric,

                    Thank you for your help.

                    Following is the output you asked for:

                    [mon-01 ~]# date
                    Tue Apr 22 10:09:10 AM CEST 2025
                    [root@fidrcmon-01 ~]# ceph tell mds.cfs_irods_test:0 scrub start / recursive,repair,force
                    2025-04-22T10:09:12.796+0200 7f43f6ffd640  0 client.86553 ms_handle_reset on v2:130.84.80.10:6800/3218663047
                    2025-04-22T10:09:12.818+0200 7f43f6ffd640  0 client.86559 ms_handle_reset on v2:130.84.80.10:6800/3218663047
                    {
                       "return_code": 0,
                       "scrub_tag": "12e537bb-bb39-4f3b-ae09-e0a1ae6ce906",
                       "mode": "asynchronous"
                    }
                    [root@fidrcmon-01 ~]# ceph tell mds.cfs_irods_test:0 scrub status
                    2025-04-22T10:09:31.760+0200 7f3f0f7fe640  0 client.86571 ms_handle_reset on v2:130.84.80.10:6800/3218663047
                    2025-04-22T10:09:31.781+0200 7f3f0f7fe640  0 client.86577 ms_handle_reset on v2:130.84.80.10:6800/3218663047
                    {
                       "status": "no active scrubs running",
                       "scrubs": {}
                    }
                    [root@fidrcmon-01 ~]# cephfs-journal-tool --rank cfs_irods_test:0 event recover_dentries list
                    2025-04-16T18:24:56.802960+0200 0x7c334a SUBTREEMAP:  ()
                    [root@fidrcmon-01 ~]#

                    Based on this output, can I run the other three
                    commands provided in your message:

                    ceph tell mds.0 flush journal
                    ceph mds fail 0
                    ceph tell mds.cfs_irods_test:0 scrub start / recursive


                    Thanks,

                    Christophe

                    On 19/04/2025 12:55, Frédéric Nass wrote:

                        Hi Christophe, Hi David,

                        Could you share the output of the command below after
                        running the scrub with recursive,repair,force?

                        cephfs-journal-tool --rank cfs_irods_test:0 event recover_dentries list

                        It could be that the MDS already recovered these 2 dentries
                        in its journal but the status of the filesystem was not
                        updated yet. I've seen this happen before. If that's the
                        case, you could try a flush, fail and re-scrub:

                        ceph tell mds.0 flush journal
                        ceph mds fail 0
                        ceph tell mds.cfs_irods_test:0 scrub start / recursive

                        This might clear the HEALTH_ERR. If not, then it will be
                        easy to fix by rebuilding / fixing the metadata from the
                        data pools, since this fs is empty.
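Because `scrub start` returns immediately ("mode": "asynchronous" in the outputs in this thread), checking `damage ls` is only meaningful once `scrub status` reports no active scrubs. The sketch below polls for that state with a timeout; `wait_for_scrub_idle` and the `CEPH_BIN` override are hypothetical names, and matching on the text "no active scrubs running" is based only on the outputs shown in this thread.

```shell
#!/usr/bin/env bash
# wait_for_scrub_idle TARGET [TIMEOUT_S] [INTERVAL_S]
# Polls 'ceph tell TARGET scrub status' until it reports no active scrubs.
# CEPH_BIN is a hypothetical override for dry runs/tests.
wait_for_scrub_idle() {
  local target="$1" timeout_s="${2:-600}" interval="${3:-10}"
  local ceph_bin="${CEPH_BIN:-ceph}" waited=0
  while [ "$waited" -lt "$timeout_s" ]; do
    # grep only looks for the status text, so extra log lines do not matter.
    if "$ceph_bin" tell "$target" scrub status 2>/dev/null \
        | grep -q 'no active scrubs running'; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "scrub on $target still active after ${timeout_s}s" >&2
  return 1
}
```

After `ceph tell mds.cfs_irods_test:0 scrub start / recursive`, running `wait_for_scrub_idle mds.cfs_irods_test:0 && ceph tell mds.cfs_irods_test:0 damage ls` shows whether the damage entries survived the re-scrub.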


                        Let us know,

                        Regards,
                        Frédéric.

                        ----- On 18 Apr 25, at 9:51, david <david.cas...@aevoo.fr> wrote:

                            I also tend to think that the disk has nothing to do
                            with the problem.

                            My reading is that the inode associated with the
                            dentry is missing.

                            Can anyone correct me?

                            Christophe informed me that the directories were
                            emptied before the incident.

                            I don't understand why scrubbing doesn't repair the
                            metadata. Perhaps because the directory is empty?

                            On Thu, 17 Apr 2025 at 19:06, Anthony D'Atri
                            <anthony.da...@gmail.com> wrote:

                                HPE rebadges drives from manufacturers. A quick
                                search supports the idea that this SKU is
                                fulfilled at least partly by Kioxia, so not
                                likely a PLP issue.

                                On Apr 17, 2025, at 11:39 AM, Christophe DIARRA <christophe.dia...@idris.fr> wrote:

                                    Hello David,

                                    The SSD model is VO007680JWZJL.

                                    I will hold off on 'ceph tell mds.cfs_irods_test:0 damage rm 241447932'
                                    for the moment. Unless another solution is found, I will be obliged to
                                    use this command.

                                    I found 'dentry' in the logs when the CephFS cluster started:

                                    Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.cfs_irods_test.mon-02.awuygq Updating MDS map to version 15613 from mon.2
                                    Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.15612 handle_mds_map i am now mds.0.15612
                                    Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.15612 handle_mds_map state change up:starting --> up:active
                                    Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.15612 active_start
                                    Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.cache.den(0x1 testdir2) loaded already corrupt dentry: [dentry #0x1/testdir2 [2,head] rep@0.0 NULL (dversion lock) pv=0 v=4442 ino=(nil) state=0 0x5617e18c8280]
                                    Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.cache.den(0x1 testdir1) loaded already corrupt dentry: [dentry #0x1/testdir1 [2,head] rep@0.0 NULL (dversion lock) pv=0 v=4442 ino=(nil) state=0 0x5617e18c8500]
                                    Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)
                                    Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check failed: 1 filesystem is online with fewer MDS than max_mds (MDS_UP_LESS_THAN_MAX)
                                    Apr 16 17:29:53 mon-02 ceph-mon[2288]: from='client.? xx.xx.xx.8:0/3820885518' entity='client.admin' cmd='[{"prefix": "fs set", "fs_name": "cfs_irods_test", "var": "down", "val": "false"}]': finished
                                    Apr 16 17:29:53 mon-02 ceph-mon[2288]: daemon mds.cfs_irods_test.mon-02.awuygq assigned to filesystem cfs_irods_test as rank 0 (now has 1 ranks)
                                    Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check cleared: MDS_ALL_DOWN (was: 1 filesystem is offline)
                                    Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check cleared: MDS_UP_LESS_THAN_MAX (was: 1 filesystem is online with fewer MDS than max_mds)
                                    Apr 16 17:29:53 mon-02 ceph-mon[2288]: daemon mds.cfs_irods_test.mon-02.awuygq is now active in filesystem cfs_irods_test as rank 0
                                    Apr 16 17:29:54 mon-02 ceph-mgr[2444]: log_channel(cluster) log [DBG] : pgmap v1721: 4353 pgs: 4346 active+clean, 7 active+clean+scrubbing+deep; 3.9 TiB data, 417 TiB used, 6.4 PiB / 6.8 PiB avail; 1.4 KiB/s rd, 1 op/s
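The corrupt-dentry messages in the log extract above can be pulled out of a longer MDS log with a small filter, which helps when grepping around the incident as suggested in this thread. This is a sketch: `corrupt_dentries` is a hypothetical name, and the pattern is based only on the two log lines shown here, printing `<parent ino>/<dentry name>` for each match.

```shell
#!/usr/bin/env bash
# corrupt_dentries: read MDS log lines on stdin and print "<parent ino>/<name>"
# for every "corrupt dentry" message (pattern taken from the lines above).
corrupt_dentries() {
  sed -n 's|.*corrupt dentry: *\[dentry #\(0x[0-9a-f]*/[^ ]*\).*|\1|p'
}
```

On a cephadm-managed host, something like `cephadm logs --name mds.cfs_irods_test.mon-02.awuygq | corrupt_dentries | sort -u` would list each affected dentry once; for the extract above that gives `0x1/testdir1` and `0x1/testdir2`, matching the two entries reported by `damage ls`.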

                                    If you need more extracts from the log file, please let me know.


                                    Thanks for your help,

                                    Christophe



                                    On 17/04/2025 13:39, David C. wrote:

                                        If I'm not mistaken, this is a fairly rare situation.

                                        The fact that it's the result of a power outage makes
                                        me think of a bad SSD (like "S... Pro").

                                        Does a grep of the dentry id in the MDS logs return
                                        anything? Maybe there is some interesting information
                                        around it.

                                        In the heat of the moment, I have no other idea than
                                        to delete the dentry.

                                        ceph tell mds.cfs_irods_test:0 damage rm 241447932

                                        However, in production, this results in the content
                                        (of dir /testdir[12]) being abandoned.

                                        On Thu, 17 Apr 2025 at 12:44, Christophe DIARRA <christophe.dia...@idris.fr> wrote:

                                            Hello David,

                                            Thank you for the tip about the scrubbing. I have tried the commands found
                                            in the documentation, but they seem to have no effect:

                                            [root@mon-01 ~]# ceph tell mds.cfs_irods_test:0 scrub start / recursive,repair,force
                                            2025-04-17T12:07:20.958+0200 7fd4157fa640  0 client.86301 ms_handle_reset on v2:130.84.80.10:6800/3218663047
                                            2025-04-17T12:07:20.979+0200 7fd4157fa640  0 client.86307 ms_handle_reset on v2:130.84.80.10:6800/3218663047
                                            {
                                                 "return_code": 0,
                                                 "scrub_tag": "733b1c6d-a418-4c83-bc8e-b28b556e970c",
                                                 "mode": "asynchronous"
                                            }

                                            [root@mon-01 ~]# ceph tell mds.cfs_irods_test:0 scrub status
                                            2025-04-17T12:07:30.734+0200 7f26cdffb640  0 client.86319 ms_handle_reset on v2:130.84.80.10:6800/3218663047
                                            2025-04-17T12:07:30.753+0200 7f26cdffb640  0 client.86325 ms_handle_reset on v2:130.84.80.10:6800/3218663047
                                            {
                                                 "status": "no active scrubs running",
                                                 "scrubs": {}
                                            }
                                            [root@mon-01 ~]# ceph -s
                                              cluster:
                                                id:     b87276e0-1d92-11ef-a9d6-507c6f66ae2e
                                                health: HEALTH_ERR
                                                        1 MDSs report damaged metadata

                                              services:
                                                mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 19h)
                                                mgr: mon-02.mqaubn(active, since 19h), standbys: mon-03.gvywio, mon-01.xhxqdi
                                                mds: 1/1 daemons up, 2 standby
                                                osd: 368 osds: 368 up (since 18h), 368 in (since 3w)

                                              data:
                                                volumes: 1/1 healthy
                                                pools:   10 pools, 4353 pgs
                                                objects: 1.25M objects, 3.9 TiB
                                                usage:   417 TiB used, 6.4 PiB / 6.8 PiB avail
                                                pgs:     4353 active+clean

                                            Did I miss something?

                                            The server didn't crash. I don't understand what you mean by "there may be
                                            a design flaw in the infrastructure (insecure cache, for example)".

                                            How can we know if we have a design problem? What should we check?


                                            Best regards,

                                            Christophe

                                            On 17/04/2025 11:07, David C. wrote:

                                                Hello Christophe,

                                                Check the file system scrubbing procedure =>
                                                https://docs.ceph.com/en/latest/cephfs/scrub/
                                                But this doesn't guarantee data recovery.

                                                Did the cluster crash?

                                                Ceph should be able to handle it; there may be a design flaw in the
                                                infrastructure (insecure cache, for example).


                                                David

                                                On Thu, 17 Apr 2025 at 10:44, Christophe DIARRA
                                                <christophe.dia...@idris.fr> wrote:


                                                    Hello,

                                                    After an electrical maintenance I restarted our Ceph cluster, but it
                                                    remains in an unhealthy state: HEALTH_ERR 1 MDSs report damaged metadata.

                                                    How can I repair this damaged metadata?

                                                    To bring down the CephFS cluster, I first unmounted the fs from the
                                                    client and then did: ceph fs set cfs_irods_test down true

                                                    To bring up the CephFS cluster I did: ceph fs set cfs_irods_test down false

                                                    Fortunately the cfs_irods_test fs is almost empty and is a fs for
                                                    tests. The Ceph cluster is not in production yet.

                                                    Following is the current status:

                                                    [root@mon-01 ~]# ceph health detail
                                                    HEALTH_ERR 1 MDSs report damaged metadata
                                                    [ERR] MDS_DAMAGE: 1 MDSs report damaged metadata
                                                        mds.cfs_irods_test.mon-03.vlmeuz(mds.0): Metadata damage detected

                                                    [root@mon-01 ~]# ceph -s
                                                      cluster:
                                                        id:     b87276e0-1d92-11ef-a9d6-507c6f66ae2e
                                                        health: HEALTH_ERR
                                                                1 MDSs report damaged metadata

                                                      services:
                                                        mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 17h)
                                                        mgr: mon-02.mqaubn(active, since 17h), standbys: mon-03.gvywio, mon-01.xhxqdi
                                                        mds: 1/1 daemons up, 2 standby
                                                        osd: 368 osds: 368 up (since 17h), 368 in (since 3w)

                                                      data:
                                                        volumes: 1/1 healthy
                                                        pools:   10 pools, 4353 pgs
                                                        objects: 1.25M objects, 3.9 TiB
                                                        usage:   417 TiB used, 6.4 PiB / 6.8 PiB avail
                                                        pgs:     4353 active+clean

                                                    [root@mon-01 ~]# ceph fs ls
                                                    name: cfs_irods_test, metadata pool: cfs_irods_md_test, data pools: [cfs_irods_def_test cfs_irods_data_test ]

                                                    [root@mon-01 ~]# ceph mds stat
                                                    cfs_irods_test:1 {0=cfs_irods_test.mon-03.vlmeuz=up:active} 2 up:standby

                                                    [root@mon-01 ~]# ceph fs status
                                                    cfs_irods_test - 0 clients
                                                    ==============
                                                    RANK  STATE   MDS                           ACTIVITY     DNS  INOS  DIRS  CAPS
                                                     0    active  cfs_irods_test.mon-03.vlmeuz  Reqs:    0 /s   12    15    14     0
                                                            POOL           TYPE     USED  AVAIL
                                                    cfs_irods_md_test    metadata  11.4M  34.4T
                                                    cfs_irods_def_test     data       0   34.4T
                                                    cfs_irods_data_test    data       0   4542T
                                                           STANDBY MDS
                                                    cfs_irods_test.mon-01.hitdem
                                                    cfs_irods_test.mon-02.awuygq
                                                    MDS version: ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)

                                                    [root@mon-01 ~]# ceph tell mds.cfs_irods_test:0 damage ls
                                                    2025-04-17T10:23:31.849+0200 7f4b87fff640  0 client.86181 ms_handle_reset on v2:130.84.80.10:6800/3218663047
                                                    2025-04-17T10:23:31.866+0200 7f4b87fff640  0 client.86187 ms_handle_reset on v2:130.84.80.10:6800/3218663047
                                                    [
                                                        {
                                                            "damage_type": "dentry",
                                                            "id": 241447932,
                                                            "ino": 1,
                                                            "frag": "*",
                                                            "dname": "testdir2",
                                                            "snap_id": "head",
                                                            "path": "/testdir2"
                                                        },
                                                        {
                                                            "damage_type": "dentry",
                                                            "id": 2273238993,
                                                            "ino": 1,
                                                            "frag": "*",
                                                            "dname": "testdir1",
                                                            "snap_id": "head",
                                                            "path": "/testdir1"
                                                        }
                                                    ]
                                                    [root@mon-01 ~]#
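If the `damage rm` route suggested later in this thread is taken, the damage ids can be extracted from `damage ls` output rather than copied by hand. This sketch only prints the `damage rm` commands for review; `damage_ids` and `print_damage_rm` are hypothetical helpers, and the plain-text matching is tuned to the JSON shape shown above rather than using a JSON parser. As far as I understand the CephFS metadata-repair documentation, `damage rm` only removes the entry from the MDS damage table and does not repair anything; as David notes, the affected directory content is effectively abandoned.

```shell
#!/usr/bin/env bash
# damage_ids: read 'ceph tell mds.<fs>:0 damage ls' JSON on stdin and print one
# damage id per line ("ino" and "snap_id" keys are not matched).
damage_ids() {
  grep -o '"id": *[0-9][0-9]*' | grep -o '[0-9][0-9]*$'
}

# print_damage_rm FS: turn ids on stdin into 'damage rm' commands (not run).
print_damage_rm() {
  local fs="$1" id
  while read -r id; do
    echo "ceph tell mds.$fs:0 damage rm $id"
  done
}
```

With the output above, `ceph tell mds.cfs_irods_test:0 damage ls 2>/dev/null | damage_ids | print_damage_rm cfs_irods_test` would print one command each for ids 241447932 and 2273238993.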

                                                    Any help will be appreciated,


                                                    Thanks,

                                                    Christophe
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

