Hi Christophe, 

Response inline 

----- On 23 Apr 25, at 11:42, Christophe DIARRA <christophe.dia...@idris.fr> wrote: 

> Hello Frédéric,

> I removed the fs but haven't recreated it yet because I have a doubt about the
> health of the cluster even though it seems healthy:

> [mon-01 ~]# ceph -s
> cluster:
> id: b87276e0-1d92-11ef-a9d6-507c6f66ae2e
> health: HEALTH_OK

> services:
> mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 6d)
> mgr: mon-02.mqaubn(active, since 6d), standbys: mon-03.gvywio, mon-01.xhxqdi
> osd: 368 osds: 368 up (since 16h), 368 in (since 3w)

> data:
> pools: 10 pools, 4353 pgs
> objects: 1.25M objects, 3.9 TiB
> usage: 417 TiB used, 6.4 PiB / 6.8 PiB avail
> pgs: 4353 active+clean

> I observed that listing the objects in any hdd pool will hang, at the beginning
> for an empty hdd pool or after displaying the list of objects.
> I need to do a Ctrl-C to interrupt the hung 'rados ls' command. I don't have
> this problem with the pools on ssd.

> [mon-01 ~]# rados lspools
> .mgr
> pool_rbd_rep3_hdd <------ hdd pool
> pool_rbd_rep3_ssd
> rbd_ec_k6m2_hdd <------ hdd pool
> rbd_ec_k6m2_ssd
> metadata_4hddrbd_rep3_ssd
> metadata_4ssdrbd_rep3_ssd
> cfs_irods_md_test
> cfs_irods_def_test
> cfs_irods_data_test <------ hdd pool
> [mon-01 ~]#
> 1) Testing 'rados ls' on hdd pools:

> [mon-01 ~]# rados -p cfs_irods_data_test ls
> (hangs forever) ==> Ctrl-C

> [mon-01 ~]# rados -p pool_rbd_rep3_hdd ls|head -2
> rbd_data.565ed6699dd8.0000000000097ff6
> rbd_data.565ed6699dd8.00000000001041fb

> (then hangs forever here) ==> Ctrl-C

> [mon-01 ~]# rados -p pool_rbd_rep3_hdd ls
> rbd_data.565ed6699dd8.0000000000097ff6
> rbd_data.565ed6699dd8.00000000001041fb
> rbd_data.565ed6699dd8.000000000004f1a3
> ...
> (list truncated by me)
> ...
> rbd_data.565ed6699dd8.000000000016809e
> rbd_data.565ed6699dd8.000000000007bc05
> (then hangs forever here) ==> Ctrl-C

> 2) With the pools on ssd everything works well (the 'rados ls' commands don't
> hang):

> [mon-01 ~]# for i in $(rados lspools|egrep 'ssd|md|def'); do echo -n "Pool $i
> :"; rados -p $i ls |wc -l; done
> Pool pool_rbd_rep3_ssd :197298
> Pool rbd_ec_k6m2_ssd :101552
> Pool metadata_4hddrbd_rep3_ssd :5
> Pool metadata_4ssdrbd_rep3_ssd :5
> Pool cfs_irods_md_test :0
> Pool cfs_irods_def_test :0

> Below is the configuration of the cluster:

> - 3 MONs (HPE DL360) + 8 OSD servers (HPE Apollo 4510 Gen10)

> - each OSD server has 44x20TB HDD + 10x7.6TB SSD
This is dense. :-/ 

> - On each OSD server, 8 SSDs are partitioned and used for the WAL/DB of the HDD
> OSDs

> - On each OSD server, 2 SSDs are used for the cephfs metadata and default data
> pools.

> Do you see any configuration problem here which could lead to our metadata
> problem?

> Do you know what could cause the hang of the 'rados ls' command on the HDD
> pools? I would like to understand this problem before recreating a new cephfs fs.
Possible causes: inaccessible PGs, misbehaving OSDs, or the mClock scheduler in 
use with osd_mclock_max_capacity_iops_hdd (auto)set way too low (check with 
'ceph config dump | grep osd_mclock_max_capacity_iops_hdd'). 
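A quick way to check these (a minimal sketch; the pool name is taken from your 
output above and osd.<id> is a placeholder): 

ceph health detail
ceph pg ls-by-pool cfs_irods_data_test | grep -v 'active+clean'   # any PG of an hdd pool not active+clean?
ceph osd perf                                                     # any HDD OSD with abnormally high latency?

ceph config dump | grep osd_mclock_max_capacity_iops_hdd
# If an auto-measured per-OSD value looks absurdly low (a few tens of IOPS),
# removing it makes that OSD fall back to the default:
# ceph config rm osd.<id> osd_mclock_max_capacity_iops_hdd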

Since this follows an electrical maintenance (power outage?), if 
osd_mclock_max_capacity_iops_hdd is not the issue, I would restart all HDD OSDs 
one by one or node by node so that all PGs re-peer, then try the 'rados ls' 
command again. 
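A minimal sketch of the per-OSD restart, assuming a cephadm-managed cluster (the 
id is a placeholder; repeat for each HDD OSD and let PGs settle in between): 

ceph orch daemon restart osd.<id>
ceph -s | grep pgs      # wait until all 4353 PGs are back to active+clean before the next restart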

Regards, 
Frédéric. 

> The cluster is still in testing state, so we can do any tests you could
> recommend.

> Thanks,

> Christophe
> On 22/04/2025 16:46, Christophe DIARRA wrote:

>> Hello Frédéric,

>> 15 of the 16 parallel scanning workers terminated almost immediately. But one
>> worker has been running for over an hour now:

>> [mon-01 log]# ps -ef|grep scan
>> root 1977927 1925004 0 15:18 pts/0 00:00:00 cephfs-data-scan scan_extents
>> --filesystem cfs_irods_test --worker_n 11 --worker_m 16

>> [mon-01 log]# date;lsof -p 1977927|grep osd
>> Tue Apr 22 04:37:05 PM CEST 2025
>> cephfs-da 1977927 root 15u IPv4 7105122 0t0 TCP mon-01:34736->osd-06:6912
>> (ESTABLISHED)
>> cephfs-da 1977927 root 18u IPv4 7110774 0t0 TCP mon-01:45122->osd-03:ethoscan
>> (ESTABLISHED)
>> cephfs-da 1977927 root 19u IPv4 7105123 0t0 TCP mon-01:58556->osd-07:spg
>> (ESTABLISHED)
>> cephfs-da 1977927 root 20u IPv4 7049672 0t0 TCP mon-01:55064->osd-01:7112
>> (ESTABLISHED)
>> cephfs-da 1977927 root 21u IPv4 7082598 0t0 TCP 
>> mon-01:42120->osd-03-data:6896
>> (SYN_SENT)
>> [mon-01 log]#

>> The filesystem is empty. So I will follow your advice and remove it. After 
>> that
>> I will recreate it.

>> I will redo some proper shutdown and restart of the cluster to check if the
>> problem reappears with the newly recreated fs.

>> I will let you know.

>> Thank you for your help,

>> Christophe

>> On 22/04/2025 15:56, Frédéric Nass wrote:

>>> That is weird for 2 reasons.

>>> The first reason is that the cephfs-data-scan should not run for a couple of
>>> hours on empty data pools. I just tried to run it on an empty pool and it
>>> doesn't run for more than maybe 10 seconds.

>>> The second reason is that the data pool cfs_irods_def_test should not be
>>> empty, even if the filesystem tree is. It should at least have a few rados
>>> objects named after {100,200,400,60x}.00000000 and the root inode
>>> 1.00000000 / 1.00000000.inode, unless you removed the filesystem by running
>>> the 'ceph fs rm <filesystem_name> --yes-i-really-mean-it' command, which
>>> does remove rados objects in the associated pools.

>>> If it's clear for you that this filesystem should be empty, I'd advise you 
>>> to
>>> remove it (using the 'ceph fs rm' command), delete any rados objects in the
>>> metadata and data pools, and then recreate the filesystem.
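A minimal sketch of that sequence, assuming the fs and pool names used in this 
thread ('rados purge' deletes every object in a pool, so double-check the names 
before running it): 

ceph fs fail cfs_irods_test
ceph fs rm cfs_irods_test --yes-i-really-mean-it
rados purge cfs_irods_md_test --yes-i-really-really-mean-it
rados purge cfs_irods_def_test --yes-i-really-really-mean-it
rados purge cfs_irods_data_test --yes-i-really-really-mean-it
ceph fs new cfs_irods_test cfs_irods_md_test cfs_irods_def_test
ceph fs add_data_pool cfs_irods_test cfs_irods_data_test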

>>> Regards,
>>> Frédéric.

>>> ----- On 22 Apr 25, at 15:13, Christophe DIARRA <christophe.dia...@idris.fr> a écrit :

>>> Hello Frédéric,

>>> I have:

>>> [mon-01 ~]# rados df | grep -E
>>> 'OBJECTS|cfs_irods_def_test|cfs_irods_data_test'
>>> POOL_NAME            USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS  RD   WR_OPS  WR       USED COMPR  UNDER COMPR
>>> cfs_irods_data_test  0 B   0        0       0       0                   0        0         0       0 B  0       0 B      0 B         0 B
>>> cfs_irods_def_test   0 B   0        0       0       0                   0        0         1       0 B  80200   157 GiB  0 B         0 B
>>> [mon-01 ~]#

>>> I will interrupt the current scanning process and rerun it with
>>> more workers.

>>> Thanks,

>>> Christophe

>>> On 22/04/2025 15:05, Frédéric Nass wrote:

>>> Hum... Obviously this 'empty' filesystem has way more rados
>>> objects in the 2 data pools than expected. You should see that
>>> many objects with:

>>> rados df | grep -E
>>> 'OBJECTS|cfs_irods_def_test|cfs_irods_data_test'

>>> If waiting is not an option, you can break the scan_extents
>>> command, re-run it with multiple workers, and then proceed
>>> with the next scan (scan_links). Just make sure you run the
>>> next scan with multiple workers as well.

>>> Regards,
>>> Frédéric.

>>> ----- On 22 Apr 25, at 14:54, Christophe DIARRA <christophe.dia...@idris.fr> a écrit :

>>> Hello Frédéric,

>>> I ran the commands (see below) but the command
>>> 'cephfs-data-scan scan_extents --filesystem
>>> cfs_irods_test' is not finished yet. It has been running
>>> for 2+ hours. I didn't run it in parallel because the
>>> filesystem contains empty directories only. According to [1]:
>>> "scan_extents and scan_inodes commands may take a very
>>> long time if the data pool contains many files or very
>>> large files." Now I think I should have run the command in
>>> parallel. I don't know if it is safe to interrupt it and
>>> then rerun it with 16 workers.

>>> On 22/04/2025 12:13, Frédéric Nass wrote:

>>> Hi Christophe,

>>> You could but it won't be of any help since the
>>> journal is empty. What you can do to fix the fs
>>> metadata is to run the below commands from the
>>> disaster-recovery-experts documentation [1] in this
>>> particular order:

>>> #Prevent access to the fs and set it down.
>>> ceph fs set cfs_irods_test refuse_client_session true
>>> ceph fs set cfs_irods_test joinable false
>>> ceph fs set cfs_irods_test down true

>>> [mon-01 ~]# ceph fs set cfs_irods_test
>>> refuse_client_session true
>>> client(s) blocked from establishing new session(s)

>>> [mon-01 ~]# ceph fs set cfs_irods_test joinable false
>>> cfs_irods_test marked not joinable; MDS cannot join as
>>> newly active.

>>> [mon-01 ~]# ceph fs set cfs_irods_test down true
>>> cfs_irods_test marked down.

>>> # Reset maps and journal
>>> cephfs-table-tool cfs_irods_test:0 reset session
>>> cephfs-table-tool cfs_irods_test:0 reset snap
>>> cephfs-table-tool cfs_irods_test:0 reset inode

>>> [mon-01 ~]# cephfs-table-tool cfs_irods_test:0 reset session
>>> {
>>> "0": {
>>> "data": {},
>>> "result": 0
>>> }
>>> }

>>> [mon-01 ~]# cephfs-table-tool cfs_irods_test:0 reset snap
>>> Error ((2) No such file or directory)
>>> 2025-04-22T12:29:09.550+0200 7f1d4c03e100 -1 main: Bad
>>> rank selection: cfs_irods_test:0'

>>> [mon-01 ~]# cephfs-table-tool cfs_irods_test:0 reset inode
>>> Error ((2) No such file or
>>> directory2025-04-22T12:29:43.880+0200 7f0878a3a100 -1
>>> main: Bad rank selection: cfs_irods_test:0'
>>> )

>>> cephfs-journal-tool --rank cfs_irods_test:0 journal
>>> reset --force
>>> cephfs-data-scan init --force-init --filesystem
>>> cfs_irods_test

>>> [mon-01 ~]# cephfs-journal-tool --rank cfs_irods_test:0
>>> journal reset --force
>>> Error ((2) No such file or directory)
>>> 2025-04-22T12:34:42.474+0200 7fe8b3a36100 -1 main:
>>> Couldn't determine MDS rank.

>>> [mon-01 ~]# cephfs-data-scan init --force-init
>>> --filesystem cfs_irods_test
>>> [mon-01 ~]#

>>> # Rescan data and fix metadata (leaving the below
>>> commands commented for information on how to parallelize
>>> these scan tasks)
>>> #for i in {0..15} ; do cephfs-data-scan scan_frags
>>> --filesystem cfs_irods_test --force-corrupt --worker_n
>>> $i --worker_m 16 & done
>>> #for i in {0..15} ; do cephfs-data-scan scan_extents
>>> --filesystem cfs_irods_test --worker_n $i --worker_m
>>> 16 & done
>>> #for i in {0..15} ; do cephfs-data-scan scan_inodes
>>> --filesystem cfs_irods_test --force-corrupt --worker_n
>>> $i --worker_m 16 & done
>>> #for i in {0..15} ; do cephfs-data-scan scan_links
>>> --filesystem cfs_irods_test --worker_n $i --worker_m
>>> 16 & done

>>> cephfs-data-scan scan_frags --filesystem
>>> cfs_irods_test --force-corrupt
>>> cephfs-data-scan scan_extents --filesystem cfs_irods_test

>>> [mon-01 ~]# cephfs-data-scan scan_frags --filesystem
>>> cfs_irods_test --force-corrupt
>>> [mon-01 ~]# cephfs-data-scan scan_extents --filesystem
>>> cfs_irods_test *------> still running*

>>> I don't know how long it will take. Once it completes
>>> I will run the remaining commands.

>>> Thanks,

>>> Christophe

>>> cephfs-data-scan scan_inodes --filesystem
>>> cfs_irods_test --force-corrupt
>>> cephfs-data-scan scan_links --filesystem cfs_irods_test
>>> cephfs-data-scan cleanup --filesystem cfs_irods_test

>>> #ceph mds repaired 0 <---- should not be necessary

>>> # Set the fs back online and accessible
>>> ceph fs set cfs_irods_test down false
>>> ceph fs set cfs_irods_test joinable true
>>> ceph fs set cfs_irods_test refuse_client_session false

>>> An MDS should now start; if not, use 'ceph orch
>>> daemon restart mds.xxxxx' to start an MDS. After
>>> remounting the fs you should be able to access
>>> /testdir1 and /testdir2 in the fs root.

>>> # Scrub the fs again to check that everything is OK.
>>> ceph tell mds.cfs_irods_test:0 scrub start /
>>> recursive,repair,force

>>> Regards,
>>> Frédéric.

>>> [1]
>>> https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/

>>> ----- On 22 Apr 25, at 10:21, Christophe DIARRA <christophe.dia...@idris.fr> a écrit :

>>> Hello Frédéric,

>>> Thank you for your help.

>>> Following is the output you asked for:

>>> [mon-01 ~]# date
>>> Tue Apr 22 10:09:10 AM CEST 2025
>>> [root@fidrcmon-01 ~]# ceph tell
>>> mds.cfs_irods_test:0 scrub start /
>>> recursive,repair,force
>>> 2025-04-22T10:09:12.796+0200 7f43f6ffd640 0
>>> client.86553 ms_handle_reset on
>>> v2:130.84.80.10:6800/3218663047
>>> 2025-04-22T10:09:12.818+0200 7f43f6ffd640 0
>>> client.86559 ms_handle_reset on
>>> v2:130.84.80.10:6800/3218663047
>>> {
>>> "return_code": 0,
>>> "scrub_tag":
>>> "12e537bb-bb39-4f3b-ae09-e0a1ae6ce906",
>>> "mode": "asynchronous"
>>> }
>>> [root@fidrcmon-01 ~]# ceph tell
>>> mds.cfs_irods_test:0 scrub status
>>> 2025-04-22T10:09:31.760+0200 7f3f0f7fe640 0
>>> client.86571 ms_handle_reset on
>>> v2:130.84.80.10:6800/3218663047
>>> 2025-04-22T10:09:31.781+0200 7f3f0f7fe640 0
>>> client.86577 ms_handle_reset on
>>> v2:130.84.80.10:6800/3218663047
>>> {
>>> "status": "no active scrubs running",
>>> "scrubs": {}
>>> }
>>> [root@fidrcmon-01 ~]# cephfs-journal-tool --rank
>>> cfs_irods_test:0 event recover_dentries list
>>> 2025-04-16T18:24:56.802960+0200 0x7c334a
>>> SUBTREEMAP: ()
>>> [root@fidrcmon-01 ~]#

>>> Based on this output, can I run the other three
>>> commands provided in your message:

>>> ceph tell mds.0 flush journal
>>> ceph mds fail 0
>>> ceph tell mds.cfs_irods_test:0 scrub start / recursive

>>> Thanks,

>>> Christophe

>>> On 19/04/2025 12:55, Frédéric Nass wrote:

>>> Hi Christophe, Hi David,

>>> Could you share the output of the below command after running the scrubbing
>>> with recursive,repair,force?

>>> cephfs-journal-tool --rank cfs_irods_test:0 event recover_dentries list

>>> Could be that the MDS recovered these 2 dentries in its journal already but
>>> the status of the filesystem was not updated yet. I've seen this happening
>>> before. If that's the case, you could try a flush, fail and re-scrub:

>>> ceph tell mds.0 flush journal
>>> ceph mds fail 0
>>> ceph tell mds.cfs_irods_test:0 scrub start / recursive

>>> This might clear the HEALTH_ERR. If not, then it will be easy to fix by
>>> rebuilding / fixing the metadata from the data pools since this fs is empty.

>>> Let us know,

>>> Regards,
>>> Frédéric.

>>> ----- On 18 Apr 25, at 9:51, daviddavid.cas...@aevoo.fr wrote:

>>> I also tend to think that the disk has nothing to do with the problem.

>>> My reading is that the inode associated with the dentry is missing.
>>> Can anyone correct me?

>>> Christophe informed me that the directories were emptied before the
>>> incident.

>>> I don't understand why scrubbing doesn't repair the metadata.
>>> Perhaps because the directory is empty?

>>> On Thu 17 Apr 2025 at 19:06, Anthony D'Atri <anthony.da...@gmail.com> wrote:

>>> HPE rebadges drives from manufacturers. A quick search supports the idea
>>> that this SKU is fulfilled at least partly by Kioxia, so not likely a PLP
>>> issue.

>>> On Apr 17, 2025, at 11:39 AM, Christophe DIARRA <christophe.dia...@idris.fr> wrote:

>>> Hello David,

>>> The SSD model is VO007680JWZJL.

>>> I will delay the 'ceph tell mds.cfs_irods_test:0 damage rm 241447932'

>>> for the moment. If no other solution is found, I will be obliged to use
>>> this command.

>>> I found 'dentry' in the logs when the cephfs cluster started:

>> Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.cfs_irods_test.mon-02.awuygq Updating MDS map to version 15613 from mon.2
>> Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.15612 handle_mds_map i am now mds.0.15612
>> Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.15612 handle_mds_map state change up:starting --> up:active
>> Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.15612 active_start
>> Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.cache.den(0x1 testdir2) loaded already *corrupt dentry*: [dentry #0x1/testdir2 [2,head] rep@0.0 NULL (dversion lock) pv=0 v=4442 ino=(nil) state=0 0x5617e18c8280]
>> Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.cache.den(0x1 testdir1) loaded already *corrupt dentry*: [dentry #0x1/testdir1 [2,head] rep@0.0 NULL (dversion lock) pv=0 v=4442 ino=(nil) state=0 0x5617e18c8500]
>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)
>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check failed: 1 filesystem is online with fewer MDS than max_mds (MDS_UP_LESS_THAN_MAX)
>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: from='client.? xx.xx.xx.8:0/3820885518' entity='client.admin' cmd='[{"prefix": "fs set", "fs_name": "cfs_irods_test", "var": "down", "val": "false"}]': finished
>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: daemon mds.cfs_irods_test.mon-02.awuygq assigned to filesystem cfs_irods_test as rank 0 (now has 1 ranks)
>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check cleared: MDS_ALL_DOWN (was: 1 filesystem is offline)
>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check cleared: MDS_UP_LESS_THAN_MAX (was: 1 filesystem is online with fewer MDS than max_mds)
>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: daemon mds.cfs_irods_test.mon-02.awuygq is now active in filesystem cfs_irods_test as rank 0
>> Apr 16 17:29:54 mon-02 ceph-mgr[2444]: log_channel(cluster) log [DBG] : pgmap v1721: 4353 pgs: 4346 active+clean, 7 active+clean+scrubbing+deep; 3.9 TiB data, 417 TiB used, 6.4 PiB / 6.8 PiB avail; 1.4 KiB/s rd, 1 op/s

>>> If you need more extracts from the log file, please let me know.

>>> Thanks for your help,

>>> Christophe

>>> On 17/04/2025 13:39, David C. wrote:

>>> If I'm not mistaken, this is a fairly rare situation.

>>> The fact that it's the result of a power outage makes me think of a bad

>>> SSD (like "S... Pro").

>>> Does a grep of the dentry id in the MDS logs return anything?
>>> Maybe there is some interesting information around this grep.
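For example (a sketch, assuming a cephadm deployment where the MDS logs go to 
journald; the fsid is a placeholder and the daemon name is one of the MDS names 
seen elsewhere in this thread): 

journalctl -u ceph-<fsid>@mds.cfs_irods_test.mon-02.awuygq --since '2025-04-16' | grep -Ei 'dentry|testdir'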

>>> In the heat of the moment, I have no other idea than to delete the

>>> dentry.

>>> ceph tell mds.cfs_irods_test:0 damage rm 241447932

>>> However, in production, this results in the content (of dir

>>> /testdir[12]) being abandoned.

>>> On Thu 17 Apr 2025 at 12:44, Christophe DIARRA <christophe.dia...@idris.fr> wrote:

>>> Hello David,

>>> Thank you for the tip about the scrubbing. I have tried the
>>> commands found in the documentation but they seem to have no effect:

>>> [root@mon-01 ~]# *ceph tell mds.cfs_irods_test:0 scrub start / recursive,repair,force*
>>> 2025-04-17T12:07:20.958+0200 7fd4157fa640 0 client.86301 ms_handle_reset on v2:130.84.80.10:6800/3218663047
>>> 2025-04-17T12:07:20.979+0200 7fd4157fa640 0 client.86307 ms_handle_reset on v2:130.84.80.10:6800/3218663047

>>> {
>>> "return_code": 0,
>>> "scrub_tag": "733b1c6d-a418-4c83-bc8e-b28b556e970c",
>>> "mode": "asynchronous"
>>> }

>>> [root@mon-01 ~]# *ceph tell mds.cfs_irods_test:0 scrub status*
>>> 2025-04-17T12:07:30.734+0200 7f26cdffb640 0 client.86319 ms_handle_reset on v2:130.84.80.10:6800/3218663047
>>> 2025-04-17T12:07:30.753+0200 7f26cdffb640 0 client.86325 ms_handle_reset on v2:130.84.80.10:6800/3218663047

>>> {
>>> "status": "no active scrubs running",
>>> "scrubs": {}
>>> }
>>> [root@mon-01 ~]# ceph -s
>>> cluster:
>>> id: b87276e0-1d92-11ef-a9d6-507c6f66ae2e
>>> *health: HEALTH_ERR 1 MDSs report damaged metadata*
>>> services:
>>> mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 19h)
>>> mgr: mon-02.mqaubn(active, since 19h), standbys: mon-03.gvywio,

>>> mon-01.xhxqdi

>>> mds: 1/1 daemons up, 2 standby
>>> osd: 368 osds: 368 up (since 18h), 368 in (since 3w)
>>> data:
>>> volumes: 1/1 healthy
>>> pools: 10 pools, 4353 pgs
>>> objects: 1.25M objects, 3.9 TiB
>>> usage: 417 TiB used, 6.4 PiB / 6.8 PiB avail
>>> pgs: 4353 active+clean

>>> Did I miss something?

>>> The server didn't crash. I don't understand what you mean by "there may be
>>> a design flaw in the infrastructure (insecure cache, for example)".
>>> How can we know if we have a design problem? What should we check?

>>> Best regards,

>>> Christophe

>>> On 17/04/2025 11:07, David C. wrote:

>>> Hello Christophe,

>>> Check the file system scrubbing procedure =>
>>> https://docs.ceph.com/en/latest/cephfs/scrub/
>>> But this doesn't guarantee data recovery.

>>> Did the cluster crash?
>>> Ceph should be able to handle it; there may be a design flaw in
>>> the infrastructure (insecure cache, for example).

>>> David

>>> On Thu 17 Apr 2025 at 10:44, Christophe DIARRA <christophe.dia...@idris.fr> wrote:

>>> Hello,

>>> After an electrical maintenance I restarted our ceph cluster
>>> but it
>>> remains in an unhealthy state: HEALTH_ERR 1 MDSs report
>>> damaged metadata.

>>> How to repair this damaged metadata ?

>>> To bring down the cephfs cluster I unmounted the fs from the
>>> client
>>> first and then did: ceph fs set cfs_irods_test down true

>>> To bring up the cephfs cluster I did: ceph fs set
>>> cfs_irods_test down false

>>> Fortunately the cfs_irods_test fs is almost empty and is a fs for
>>> tests. The ceph cluster is not in production yet.

>>> Following is the current status:

>>> [root@mon-01 ~]# ceph health detail
>>> HEALTH_ERR 1 MDSs report damaged metadata
>>> *[ERR] MDS_DAMAGE: 1 MDSs report damaged metadata
>>> mds.cfs_irods_test.mon-03.vlmeuz(mds.0): Metadata damage
>>> detected*

>>> [root@mon-01 ~]# ceph -s
>>> cluster:
>>> id: b87276e0-1d92-11ef-a9d6-507c6f66ae2e
>>> health: HEALTH_ERR
>>> 1 MDSs report damaged metadata

>>> services:
>>> mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 17h)
>>> mgr: mon-02.mqaubn(active, since 17h), standbys:
>>> mon-03.gvywio,
>>> mon-01.xhxqdi
>>> mds: 1/1 daemons up, 2 standby
>>> osd: 368 osds: 368 up (since 17h), 368 in (since 3w)

>>> data:
>>> volumes: 1/1 healthy
>>> pools: 10 pools, 4353 pgs
>>> objects: 1.25M objects, 3.9 TiB
>>> usage: 417 TiB used, 6.4 PiB / 6.8 PiB avail
>>> pgs: 4353 active+clean

>>> [root@mon-01 ~]# ceph fs ls
>>> name: cfs_irods_test, metadata pool: cfs_irods_md_test, data
>>> pools:
>>> [cfs_irods_def_test cfs_irods_data_test ]

>>> [root@mon-01 ~]# ceph mds stat
>>> cfs_irods_test:1 {0=cfs_irods_test.mon-03.vlmeuz=up:active} 2
>>> up:standby

>>> [root@mon-01 ~]# ceph fs status
>>> cfs_irods_test - 0 clients
>>> ==============
>>> RANK  STATE   MDS                           ACTIVITY    DNS  INOS  DIRS  CAPS
>>> 0     active  cfs_irods_test.mon-03.vlmeuz  Reqs: 0 /s  12   15    14    0
>>> POOL TYPE USED AVAIL
>>> cfs_irods_md_test metadata 11.4M 34.4T
>>> cfs_irods_def_test data 0 34.4T
>>> cfs_irods_data_test data 0 4542T
>>> STANDBY MDS
>>> cfs_irods_test.mon-01.hitdem
>>> cfs_irods_test.mon-02.awuygq
>>> MDS version: ceph version 18.2.2
>>> (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
>>> [root@mon-01 ~]#

>>> [root@mon-01 ~]# ceph tell mds.cfs_irods_test:0 damage ls
>>> 2025-04-17T10:23:31.849+0200 7f4b87fff640 0 client.86181 ms_handle_reset on v2:130.84.80.10:6800/3218663047
>>> 2025-04-17T10:23:31.866+0200 7f4b87fff640 0 client.86187 ms_handle_reset on v2:130.84.80.10:6800/3218663047
>>> [
>>> {
>>> *"damage_type": "dentry",*
>>> "id": 241447932,
>>> "ino": 1,
>>> "frag": "*",
>>> "dname": "testdir2",
>>> "snap_id": "head",
>>> "path": "/testdir2"
>>> },
>>> {
>>> *"damage_type": "dentry"*,
>>> "id": 2273238993,
>>> "ino": 1,
>>> "frag": "*",
>>> "dname": "testdir1",
>>> "snap_id": "head",
>>> "path": "/testdir1"
>>> }
>>> ]
>>> [root@mon-01 ~]#

>>> Any help will be appreciated,

>>> Thanks,

>>> Christophe
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
