Hello Frédéric,

I ran the commands (see below), but 'cephfs-data-scan scan_extents --filesystem cfs_irods_test' has not finished yet. It has been running for 2+ hours. I didn't run it in parallel because the fs contains empty directories only. According to [1]: "scan_extents and scan_inodes commands may take a very long time if the data pool contains many files or very large files." Now I think I should have run the command in parallel. I don't know if it is safe to interrupt it and then rerun it with 16 workers.
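
If restarting the scan turns out to be safe, I suppose the parallel run would simply be the loop from your commented example below, followed by a 'wait' so that all 16 workers finish before moving on to the next step:

for i in {0..15} ; do cephfs-data-scan scan_extents --filesystem cfs_irods_test --worker_n $i --worker_m 16 & done
wait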

On 22/04/2025 12:13, Frédéric Nass wrote:
Hi Christophe,

You could, but it won't be of any help since the journal is empty. What you can do to fix the fs metadata is to run the commands below, from the disaster-recovery-experts documentation [1], in this particular order:

# Prevent access to the fs and set it down.
ceph fs set cfs_irods_test refuse_client_session true
ceph fs set cfs_irods_test joinable false
ceph fs set cfs_irods_test down true
[mon-01 ~]# ceph fs set cfs_irods_test refuse_client_session true
client(s) blocked from establishing new session(s)

[mon-01 ~]# ceph fs set cfs_irods_test joinable false
cfs_irods_test marked not joinable; MDS cannot join as newly active.

[mon-01 ~]# ceph fs set cfs_irods_test down true
cfs_irods_test marked down.


# Reset maps and journal
cephfs-table-tool cfs_irods_test:0 reset session
cephfs-table-tool cfs_irods_test:0 reset snap
cephfs-table-tool cfs_irods_test:0 reset inode

[mon-01 ~]# cephfs-table-tool cfs_irods_test:0 reset session
{
   "0": {
       "data": {},
       "result": 0
   }
}

[mon-01 ~]# cephfs-table-tool cfs_irods_test:0 reset snap
Error ((2) No such file or directory)
2025-04-22T12:29:09.550+0200 7f1d4c03e100 -1 main: Bad rank selection: cfs_irods_test:0'

[mon-01 ~]# cephfs-table-tool cfs_irods_test:0 reset inode
Error ((2) No such file or directory)
2025-04-22T12:29:43.880+0200 7f0878a3a100 -1 main: Bad rank selection: cfs_irods_test:0'

cephfs-journal-tool --rank cfs_irods_test:0 journal reset --force
cephfs-data-scan init --force-init --filesystem cfs_irods_test

[mon-01 ~]# cephfs-journal-tool --rank cfs_irods_test:0 journal reset --force
Error ((2) No such file or directory)
2025-04-22T12:34:42.474+0200 7fe8b3a36100 -1 main: Couldn't determine MDS rank.

[mon-01 ~]# cephfs-data-scan init --force-init --filesystem cfs_irods_test
[mon-01 ~]#


# Rescan data and fix metadata (leaving the below commands commented for information on how to parallelize these scan tasks)
#for i in {0..15} ; do cephfs-data-scan scan_frags --filesystem cfs_irods_test --force-corrupt --worker_n $i --worker_m 16 & done
#for i in {0..15} ; do cephfs-data-scan scan_extents --filesystem cfs_irods_test --worker_n $i --worker_m 16 & done
#for i in {0..15} ; do cephfs-data-scan scan_inodes --filesystem cfs_irods_test --force-corrupt --worker_n $i --worker_m 16 & done
#for i in {0..15} ; do cephfs-data-scan scan_links --filesystem cfs_irods_test --worker_n $i --worker_m 16 & done
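# Note: each loop above backgrounds its 16 workers with '&'. A minimal sketch (my suggestion, not from the docs): add a 'wait' after each loop so that one scan phase fully completes before the next one starts, e.g.
#for i in {0..15} ; do cephfs-data-scan scan_extents --filesystem cfs_irods_test --worker_n $i --worker_m 16 & done ; wait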

cephfs-data-scan scan_frags --filesystem cfs_irods_test --force-corrupt
cephfs-data-scan scan_extents --filesystem cfs_irods_test

[mon-01 ~]# cephfs-data-scan scan_frags --filesystem cfs_irods_test --force-corrupt
[mon-01 ~]# cephfs-data-scan scan_extents --filesystem cfs_irods_test    *------> still running*

I don't know how long it will take. Once it has completed, I will run the remaining commands.

Thanks,

Christophe

cephfs-data-scan scan_inodes --filesystem cfs_irods_test --force-corrupt
cephfs-data-scan scan_links --filesystem cfs_irods_test
cephfs-data-scan cleanup --filesystem cfs_irods_test

#ceph mds repaired 0    <---- should not be necessary

# Set the fs back online and accessible
ceph fs set cfs_irods_test down false
ceph fs set cfs_irods_test joinable true
ceph fs set cfs_irods_test refuse_client_session false

An MDS should now start; if not, use 'ceph orch daemon restart mds.xxxxx' to start one. After remounting the fs you should be able to access /testdir1 and /testdir2 in the fs root.

# Scrub the fs again to check that everything is OK.
ceph tell mds.cfs_irods_test:0 scrub start / recursive,repair,force
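
# To follow the scrub and confirm the damage is gone, re-use the commands from earlier in this thread:
ceph tell mds.cfs_irods_test:0 scrub status
ceph health detail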

Regards,
Frédéric.

[1] https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/

----- On 22 Apr 25, at 10:21, Christophe DIARRA <christophe.dia...@idris.fr> wrote:

    Hello Frédéric,

    Thank you for your help.

    Following is the output you asked for:

    [root@fidrcmon-01 ~]# date
    Tue Apr 22 10:09:10 AM CEST 2025
    [root@fidrcmon-01 ~]# ceph tell mds.cfs_irods_test:0 scrub start / recursive,repair,force
    2025-04-22T10:09:12.796+0200 7f43f6ffd640  0 client.86553 ms_handle_reset on v2:130.84.80.10:6800/3218663047
    2025-04-22T10:09:12.818+0200 7f43f6ffd640  0 client.86559 ms_handle_reset on v2:130.84.80.10:6800/3218663047
    {
       "return_code": 0,
       "scrub_tag": "12e537bb-bb39-4f3b-ae09-e0a1ae6ce906",
       "mode": "asynchronous"
    }
    [root@fidrcmon-01 ~]# ceph tell mds.cfs_irods_test:0 scrub status
    2025-04-22T10:09:31.760+0200 7f3f0f7fe640  0 client.86571 ms_handle_reset on v2:130.84.80.10:6800/3218663047
    2025-04-22T10:09:31.781+0200 7f3f0f7fe640  0 client.86577 ms_handle_reset on v2:130.84.80.10:6800/3218663047
    {
       "status": "no active scrubs running",
       "scrubs": {}
    }
    [root@fidrcmon-01 ~]# cephfs-journal-tool --rank cfs_irods_test:0 event recover_dentries list
    2025-04-16T18:24:56.802960+0200 0x7c334a SUBTREEMAP:  ()
    [root@fidrcmon-01 ~]#

    Based on this output, can I run the other three commands provided in your message:

    ceph tell mds.0 flush journal
    ceph mds fail 0
    ceph tell mds.cfs_irods_test:0 scrub start / recursive

    Thanks,

    Christophe

    On 19/04/2025 12:55, Frédéric Nass wrote:

        Hi Christophe, Hi David,

        Could you share the output of the below command after running the scrubbing with recursive,repair,force?

        cephfs-journal-tool --rank cfs_irods_test:0 event recover_dentries list

        It could be that the MDS recovered these 2 dentries in its journal already but the status of the filesystem was not updated yet. I've seen this happening before.
        If that's the case, you could try a flush, fail, and re-scrub:

        ceph tell mds.0 flush journal
        ceph mds fail 0
        ceph tell mds.cfs_irods_test:0 scrub start / recursive

        This might clear the HEALTH_ERR. If not, then it will be easy to fix by rebuilding / fixing the metadata from the data pools since this fs is empty.

        Let us know,

        Regards,
        Frédéric.

        ----- On 18 Apr 25, at 9:51, david david.cas...@aevoo.fr wrote:

            I also tend to think that the disk has nothing to do with the problem.

            My reading is that the inode associated with the dentry is missing.
            Can anyone correct me?

            Christophe informed me that the directories were emptied before the
            incident.

            I don't understand why scrubbing doesn't repair the metadata.
            Perhaps because the directory is empty?

            On Thu, 17 Apr 2025 at 19:06, Anthony D'Atri <anthony.da...@gmail.com> wrote:

                HPE rebadges drives from manufacturers. A quick search supports the idea that this SKU is fulfilled at least partly by Kioxia, so not likely a PLP issue.


                    On Apr 17, 2025, at 11:39 AM, Christophe DIARRA <christophe.dia...@idris.fr> wrote:

                    Hello David,

                    The SSD model is VO007680JWZJL.

                    I will delay the 'ceph tell mds.cfs_irods_test:0 damage rm 241447932' for the moment. If no other solution is found I will be obliged to use this command.

                    I found 'dentry' in the logs when the cephfs cluster started:

                        Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.cfs_irods_test.mon-02.awuygq Updating MDS map to version 15613 from mon.2
                        Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.15612 handle_mds_map i am now mds.0.15612
                        Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.15612 handle_mds_map state change up:starting --> up:active
                        Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.15612 active_start
                        Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.cache.den(0x1 testdir2) loaded already *corrupt dentry*: [dentry #0x1/testdir2 [2,head]rep@0.0 NULL (dversion lock) pv=0 v=4442 ino=(nil) state=0 0x5617e18c8280]
                        Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.cache.den(0x1 testdir1) loaded already *corrupt dentry*: [dentry #0x1/testdir1 [2,head]rep@0.0 NULL (dversion lock) pv=0 v=4442 ino=(nil) state=0 0x5617e18c8500]
                        Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)
                        Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check failed: 1 filesystem is online with fewer MDS than max_mds (MDS_UP_LESS_THAN_MAX)
                        Apr 16 17:29:53 mon-02 ceph-mon[2288]: from='client.? xx.xx.xx.8:0/3820885518' entity='client.admin' cmd='[{"prefix": "fs set", "fs_name": "cfs_irods_test", "var": "down", "val": "false"}]': finished
                        Apr 16 17:29:53 mon-02 ceph-mon[2288]: daemon mds.cfs_irods_test.mon-02.awuygq assigned to filesystem cfs_irods_test as rank 0 (now has 1 ranks)
                        Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check cleared: MDS_ALL_DOWN (was: 1 filesystem is offline)
                        Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check cleared: MDS_UP_LESS_THAN_MAX (was: 1 filesystem is online with fewer MDS than max_mds)
                        Apr 16 17:29:53 mon-02 ceph-mon[2288]: daemon mds.cfs_irods_test.mon-02.awuygq is now active in filesystem cfs_irods_test as rank 0
                        Apr 16 17:29:54 mon-02 ceph-mgr[2444]: log_channel(cluster) log [DBG] : pgmap v1721: 4353 pgs: 4346 active+clean, 7 active+clean+scrubbing+deep; 3.9 TiB data, 417 TiB used, 6.4 PiB / 6.8 PiB avail; 1.4 KiB/s rd, 1 op/s

                    If you need more extracts from the log file, please let me know.

                    Thanks for your help,

                    Christophe



                    On 17/04/2025 13:39, David C. wrote:

                        If I'm not mistaken, this is a fairly rare situation.

                        The fact that it's the result of a power outage makes me think of a bad SSD (like "S... Pro").

                        Does a grep of the dentry id in the MDS logs return anything? Maybe there is some interesting information around this grep.

                        In the heat of the moment, I have no other idea than to delete the dentry.

                        ceph tell mds.cfs_irods_test:0 damage rm 241447932

                        However, in production, this results in the content (of dir /testdir[12]) being abandoned.

                        On Thu, 17 Apr 2025 at 12:44, Christophe DIARRA <christophe.dia...@idris.fr> wrote:

                            Hello David,

                            Thank you for the tip about the scrubbing. I have tried the commands found in the documentation but it seems to have no effect:

                            [root@mon-01 ~]# *ceph tell mds.cfs_irods_test:0 scrub start / recursive,repair,force*
                            2025-04-17T12:07:20.958+0200 7fd4157fa640  0 client.86301 ms_handle_reset on v2:130.84.80.10:6800/3218663047
                            2025-04-17T12:07:20.979+0200 7fd4157fa640  0 client.86307 ms_handle_reset on v2:130.84.80.10:6800/3218663047

                            {
                                 "return_code": 0,
                                 "scrub_tag": "733b1c6d-a418-4c83-bc8e-b28b556e970c",
                                 "mode": "asynchronous"
                            }

                            [root@mon-01 ~]# *ceph tell mds.cfs_irods_test:0 scrub status*
                            2025-04-17T12:07:30.734+0200 7f26cdffb640  0 client.86319 ms_handle_reset on v2:130.84.80.10:6800/3218663047
                            2025-04-17T12:07:30.753+0200 7f26cdffb640  0 client.86325 ms_handle_reset on v2:130.84.80.10:6800/3218663047

                            {
                                 "status": "no active scrubs running",
                                 "scrubs": {}
                            }
                            [root@mon-01 ~]# ceph -s
                               cluster:
                                 id:     b87276e0-1d92-11ef-a9d6-507c6f66ae2e
                                 *health: HEALTH_ERR             1 MDSs report damaged metadata*

                               services:
                                 mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 19h)
                                 mgr: mon-02.mqaubn(active, since 19h), standbys: mon-03.gvywio, mon-01.xhxqdi
                                 mds: 1/1 daemons up, 2 standby
                                 osd: 368 osds: 368 up (since 18h), 368 in (since 3w)

                               data:
                                 volumes: 1/1 healthy
                                 pools:   10 pools, 4353 pgs
                                 objects: 1.25M objects, 3.9 TiB
                                 usage:   417 TiB used, 6.4 PiB / 6.8 PiB avail
                                 pgs:     4353 active+clean

                            Did I miss something?

                            The server didn't crash. I don't understand what you mean by "there may be a design flaw in the infrastructure (insecure cache, for example)".
                            How do we know if we have a design problem? What should we check?

                            Best regards,

                            Christophe

                            On 17/04/2025 11:07, David C. wrote:

                                Hello Christophe,

                                Check the file system scrubbing procedure => https://docs.ceph.com/en/latest/cephfs/scrub/
                                But this doesn't guarantee data recovery.

                                Did the cluster crash?
                                Ceph should be able to handle it; there may be a design flaw in the infrastructure (insecure cache, for example).

                                David

                                On Thu, 17 Apr 2025 at 10:44, Christophe DIARRA <christophe.dia...@idris.fr> wrote:

                                    Hello,

                                    After an electrical maintenance I restarted our ceph cluster but it remains in an unhealthy state: HEALTH_ERR 1 MDSs report damaged metadata.

                                    How can I repair this damaged metadata?

                                    To bring down the cephfs cluster I unmounted the fs from the client first and then ran: ceph fs set cfs_irods_test down true

                                    To bring up the cephfs cluster I ran: ceph fs set cfs_irods_test down false

                                    Fortunately the cfs_irods_test fs is almost empty and is a fs for tests. The ceph cluster is not in production yet.

                                    Following is the current status:

                                    [root@mon-01 ~]# ceph health detail
                                    HEALTH_ERR 1 MDSs report damaged metadata
                                    *[ERR] MDS_DAMAGE: 1 MDSs report damaged metadata
                                        mds.cfs_irods_test.mon-03.vlmeuz(mds.0): Metadata damage detected*

                                    [root@mon-01 ~]# ceph -s
                                       cluster:
                                         id:     b87276e0-1d92-11ef-a9d6-507c6f66ae2e
                                         health: HEALTH_ERR
                                                 1 MDSs report damaged metadata

                                       services:
                                         mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 17h)
                                         mgr: mon-02.mqaubn(active, since 17h), standbys: mon-03.gvywio, mon-01.xhxqdi
                                         mds: 1/1 daemons up, 2 standby
                                         osd: 368 osds: 368 up (since 17h), 368 in (since 3w)

                                       data:
                                         volumes: 1/1 healthy
                                         pools:   10 pools, 4353 pgs
                                         objects: 1.25M objects, 3.9 TiB
                                         usage:   417 TiB used, 6.4 PiB / 6.8 PiB avail
                                         pgs:     4353 active+clean

                                    [root@mon-01 ~]# ceph fs ls
                                    name: cfs_irods_test, metadata pool: cfs_irods_md_test, data pools: [cfs_irods_def_test cfs_irods_data_test ]

                                    [root@mon-01 ~]# ceph mds stat
                                    cfs_irods_test:1 {0=cfs_irods_test.mon-03.vlmeuz=up:active} 2 up:standby

                                    [root@mon-01 ~]# ceph fs status
                                    cfs_irods_test - 0 clients
                                    ==============
                                    RANK  STATE             MDS                  ACTIVITY     DNS    INOS   DIRS   CAPS
                                     0    active  cfs_irods_test.mon-03.vlmeuz  Reqs:    0 /s    12     15     14      0
                                            POOL            TYPE     USED  AVAIL
                                     cfs_irods_md_test    metadata  11.4M  34.4T
                                    cfs_irods_def_test      data       0   34.4T
                                    cfs_irods_data_test     data       0   4542T
                                            STANDBY MDS
                                    cfs_irods_test.mon-01.hitdem
                                    cfs_irods_test.mon-02.awuygq
                                    MDS version: ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
                                    [root@mon-01 ~]#

                                    [root@mon-01 ~]# ceph tell mds.cfs_irods_test:0 damage ls
                                    2025-04-17T10:23:31.849+0200 7f4b87fff640  0 client.86181 ms_handle_reset on v2:130.84.80.10:6800/3218663047
                                    2025-04-17T10:23:31.866+0200 7f4b87fff640  0 client.86187 ms_handle_reset on v2:130.84.80.10:6800/3218663047
                                    [
                                         {
                                             *"damage_type": "dentry",*
                                             "id": 241447932,
                                             "ino": 1,
                                             "frag": "*",
                                             "dname": "testdir2",
                                             "snap_id": "head",
                                             "path": "/testdir2"
                                         },
                                         {
                                             *"damage_type": "dentry"*,
                                             "id": 2273238993,
                                             "ino": 1,
                                             "frag": "*",
                                             "dname": "testdir1",
                                             "snap_id": "head",
                                             "path": "/testdir1"
                                         }
                                    ]
                                    [root@mon-01 ~]#

                                    Any help will be appreciated,

                                    Thanks,

                                    Christophe
                                    


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
