Hi Christophe,

Response inline.
----- On 23 Apr 25, at 11:42, Christophe DIARRA <christophe.dia...@idris.fr> wrote:

> Hello Frédéric,
>
> I removed the fs but haven't recreated it yet, because I have a doubt about the health of the cluster even though it seems healthy:
>
> [mon-01 ~]# ceph -s
>   cluster:
>     id:     b87276e0-1d92-11ef-a9d6-507c6f66ae2e
>     health: HEALTH_OK
>   services:
>     mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 6d)
>     mgr: mon-02.mqaubn(active, since 6d), standbys: mon-03.gvywio, mon-01.xhxqdi
>     osd: 368 osds: 368 up (since 16h), 368 in (since 3w)
>   data:
>     pools:   10 pools, 4353 pgs
>     objects: 1.25M objects, 3.9 TiB
>     usage:   417 TiB used, 6.4 PiB / 6.8 PiB avail
>     pgs:     4353 active+clean
>
> I observed that listing the objects in any hdd pool will hang, either right away for an empty hdd pool or after displaying the list of objects. I need to do a Ctrl-C to interrupt the hung 'rados ls' command. I don't have this problem with the pools on ssd.
>
> [mon-01 ~]# rados lspools
> .mgr
> pool_rbd_rep3_hdd            <------ hdd pool
> pool_rbd_rep3_ssd
> rbd_ec_k6m2_hdd              <------ hdd pool
> rbd_ec_k6m2_ssd
> metadata_4hddrbd_rep3_ssd
> metadata_4ssdrbd_rep3_ssd
> cfs_irods_md_test
> cfs_irods_def_test
> cfs_irods_data_test          <------ hdd pool
> [mon-01 ~]#
>
> 1) Testing 'rados ls' on hdd pools:
>
> [mon-01 ~]# rados -p cfs_irods_data_test ls
> (hangs forever) ==> Ctrl-C
>
> [mon-01 ~]# rados -p pool_rbd_rep3_hdd ls|head -2
> rbd_data.565ed6699dd8.0000000000097ff6
> rbd_data.565ed6699dd8.00000000001041fb
> (then hangs forever here) ==> Ctrl-C
>
> [mon-01 ~]# rados -p pool_rbd_rep3_hdd ls
> rbd_data.565ed6699dd8.0000000000097ff6
> rbd_data.565ed6699dd8.00000000001041fb
> rbd_data.565ed6699dd8.000000000004f1a3
> ...
> (list truncated by me)
> ...
> rbd_data.565ed6699dd8.000000000016809e
> rbd_data.565ed6699dd8.000000000007bc05
> (then hangs forever here) ==> Ctrl-C
>
> 2) With the pools on ssd everything works well (the 'rados ls' commands don't hang):
>
> [mon-01 ~]# for i in $(rados lspools|egrep 'ssd|md|def'); do echo -n "Pool $i :"; rados -p $i ls |wc -l; done
> Pool pool_rbd_rep3_ssd :197298
> Pool rbd_ec_k6m2_ssd :101552
> Pool metadata_4hddrbd_rep3_ssd :5
> Pool metadata_4ssdrbd_rep3_ssd :5
> Pool cfs_irods_md_test :0
> Pool cfs_irods_def_test :0
>
> Below is the configuration of the cluster:
> - 3 MONs (HPE DL360) + 8 OSD servers (HPE Apollo 4510 Gen10)
> - each OSD server has 44x20TB HDD + 10x7.6TB SSD

This is dense. :-/

> - On each OSD server, 8 SSDs are partitioned and used for the wal/db of the HDD OSDs
> - On each OSD server, 2 SSDs are used for the cephfs metadata and default data pools.
>
> Do you see any configuration problem here which could lead to our metadata problem?
> Do you know what could cause the hang of the 'rados ls' command on the HDD pools? I would like to understand this problem before recreating a new cephfs fs.

Inaccessible PGs, misbehaving OSDs, or the mClock scheduler in use with osd_mclock_max_capacity_iops_hdd (auto)set way too low (check 'ceph config dump | grep osd_mclock_max_capacity_iops_hdd').

Since this followed an electrical maintenance (power outage?), if osd_mclock_max_capacity_iops_hdd is not the issue, I would restart all HDD OSDs one by one or node by node to have all PGs repeered, then try the 'rados ls' command again. A short command sketch follows after the quoted text below.

Regards,
Frédéric.

> The cluster is still in a testing state, so we can do any tests you could recommend.
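For reference, a minimal sketch of those checks (the OSD id and host name below are placeholders to adapt to your deployment):

# 1) Look for (auto)set mClock capacity values that are suspiciously low for HDD OSDs
ceph config dump | grep osd_mclock_max_capacity_iops_hdd
# If a value is obviously wrong, remove it so the OSD falls back to the default/re-measured value (osd.42 is a placeholder)
ceph config rm osd.42 osd_mclock_max_capacity_iops_hdd

# 2) Otherwise, restart the HDD OSDs node by node so all their PGs repeer
#    (osd-01 is an example CRUSH host bucket); check 'ceph -s' for HEALTH_OK before moving to the next node
for id in $(ceph osd ls-tree osd-01); do ceph orch daemon restart osd.$id; done
ceph -s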
> Thanks,
> Christophe
>
> On 22/04/2025 16:46, Christophe DIARRA wrote:
>> Hello Frédéric,
>>
>> 15 of the 16 parallel scanning workers terminated almost immediately. But one worker has now been running for 1+ hour:
>>
>> [mon-01 log]# ps -ef|grep scan
>> root 1977927 1925004 0 15:18 pts/0 00:00:00 cephfs-data-scan scan_extents --filesystem cfs_irods_test --worker_n 11 --worker_m 16
>> [mon-01 log]# date;lsof -p 1977927|grep osd
>> Tue Apr 22 04:37:05 PM CEST 2025
>> cephfs-da 1977927 root 15u IPv4 7105122 0t0 TCP mon-01:34736->osd-06:6912 (ESTABLISHED)
>> cephfs-da 1977927 root 18u IPv4 7110774 0t0 TCP mon-01:45122->osd-03:ethoscan (ESTABLISHED)
>> cephfs-da 1977927 root 19u IPv4 7105123 0t0 TCP mon-01:58556->osd-07:spg (ESTABLISHED)
>> cephfs-da 1977927 root 20u IPv4 7049672 0t0 TCP mon-01:55064->osd-01:7112 (ESTABLISHED)
>> cephfs-da 1977927 root 21u IPv4 7082598 0t0 TCP mon-01:42120->osd-03-data:6896 (SYN_SENT)
>> [mon-01 log]#
>>
>> The filesystem is empty, so I will follow your advice and remove it. After that I will recreate it.
>> I will redo a proper shutdown and restart of the cluster to check whether the problem reappears with the newly recreated fs. I will let you know.
>>
>> Thank you for your help,
>> Christophe
>>
>> On 22/04/2025 15:56, Frédéric Nass wrote:
>>> That is weird, for 2 reasons.
>>> The first reason is that cephfs-data-scan should not run for a couple of hours on empty data pools. I just tried to run it on an empty pool and it doesn't run for more than maybe 10 seconds.
>>> The second reason is that the data pool cfs_irods_def_test should not be empty, even if the filesystem tree is. It should at least have a few rados objects named after {100,200,400,60x}.00000000 and the root inode 1.00000000 / 1.00000000.inode, unless you removed the filesystem by running the 'ceph fs rm <filesystem_name> --yes-i-really-mean-it' command, which does remove rados objects in the associated pools.
>>> If it's clear to you that this filesystem should be empty, I'd advise you to remove it (using the 'ceph fs rm' command), delete any rados objects in the metadata and data pools, and then recreate the filesystem.
>>> Regards,
>>> Frédéric.
>>>
>>> ----- On 22 Apr 25, at 15:13, Christophe DIARRA <christophe.dia...@idris.fr> wrote:
>>> Hello Frédéric,
>>> I have:
>>>
>>> [mon-01 ~]# rados df | grep -E 'OBJECTS|cfs_irods_def_test|cfs_irods_data_test'
>>> POOL_NAME            USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS   RD  WR_OPS       WR  USED COMPR  UNDER COMPR
>>> cfs_irods_data_test   0 B        0       0       0                   0        0         0       0  0 B       0      0 B         0 B          0 B
>>> cfs_irods_def_test    0 B        0       0       0                   0        0         0       1  0 B   80200  157 GiB         0 B          0 B
>>> [mon-01 ~]#
>>>
>>> I will interrupt the current scanning process and rerun it with more workers.
>>> Thanks,
>>> Christophe
>>>
>>> On 22/04/2025 15:05, Frédéric Nass wrote:
>>> Hum... Obviously this 'empty' filesystem has way more rados objects in the 2 data pools than expected. You can see how many objects they hold with:
>>>
>>> rados df | grep -E 'OBJECTS|cfs_irods_def_test|cfs_irods_data_test'
>>>
>>> If waiting is not an option, you can break the scan_extents command, re-run it with multiple workers, and then proceed with the next scan (scan_links). Just make sure you run the next scan with multiple workers as well.
>>> Regards,
>>> Frédéric.
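If the removal route is taken, here is a minimal sketch of what it could look like, using the pool names from this thread ('rados purge' irreversibly deletes every object in a pool, so double-check the pool names first):

# Take the fs down and remove it (this removes the fs, not the pools)
ceph fs fail cfs_irods_test
ceph fs rm cfs_irods_test --yes-i-really-mean-it

# Delete any leftover rados objects from its pools -- irreversible
rados purge cfs_irods_md_test --yes-i-really-really-mean-it
rados purge cfs_irods_def_test --yes-i-really-really-mean-it
rados purge cfs_irods_data_test --yes-i-really-really-mean-it

# Recreate the fs on the same pools and re-attach the HDD data pool
# (depending on the release, reusing the old pools may require --force)
ceph fs new cfs_irods_test cfs_irods_md_test cfs_irods_def_test
ceph fs add_data_pool cfs_irods_test cfs_irods_data_test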
>>> ----- On 22 Apr 25, at 14:54, Christophe DIARRA <christophe.dia...@idris.fr> wrote:
>>> Hello Frédéric,
>>> I ran the commands (see below), but the command 'cephfs-data-scan scan_extents --filesystem cfs_irods_test' is not finished yet. It has been running for 2+ hours. I didn't run it in parallel because the filesystem contains empty directories only. According to [1]: "scan_extents and scan_inodes commands may take a very long time if the data pool contains many files or very large files." Now I think I should have run the command in parallel. I don't know if it is safe to interrupt it and then rerun it with 16 workers.
>>>
>>> On 22/04/2025 12:13, Frédéric Nass wrote:
>>> Hi Christophe,
>>> You could, but it won't be of any help since the journal is empty. What you can do to fix the fs metadata is to run the below commands from the disaster-recovery-experts documentation [1], in this particular order:
>>>
>>> # Prevent access to the fs and set it down.
>>> ceph fs set cfs_irods_test refuse_client_session true
>>> ceph fs set cfs_irods_test joinable false
>>> ceph fs set cfs_irods_test down true
>>>
>>> [mon-01 ~]# ceph fs set cfs_irods_test refuse_client_session true
>>> client(s) blocked from establishing new session(s)
>>> [mon-01 ~]# ceph fs set cfs_irods_test joinable false
>>> cfs_irods_test marked not joinable; MDS cannot join as newly active.
>>> [mon-01 ~]# ceph fs set cfs_irods_test down true
>>> cfs_irods_test marked down.
>>>
>>> # Reset maps and journal
>>> cephfs-table-tool cfs_irods_test:0 reset session
>>> cephfs-table-tool cfs_irods_test:0 reset snap
>>> cephfs-table-tool cfs_irods_test:0 reset inode
>>>
>>> [mon-01 ~]# cephfs-table-tool cfs_irods_test:0 reset session
>>> {
>>>     "0": {
>>>         "data": {},
>>>         "result": 0
>>>     }
>>> }
>>> [mon-01 ~]# cephfs-table-tool cfs_irods_test:0 reset snap
>>> Error ((2) No such file or directory)
>>> 2025-04-22T12:29:09.550+0200 7f1d4c03e100 -1 main: Bad rank selection: cfs_irods_test:0'
>>> [mon-01 ~]# cephfs-table-tool cfs_irods_test:0 reset inode
>>> Error ((2) No such file or directory
>>> 2025-04-22T12:29:43.880+0200 7f0878a3a100 -1 main: Bad rank selection: cfs_irods_test:0'
>>> )
>>>
>>> cephfs-journal-tool --rank cfs_irods_test:0 journal reset --force
>>> cephfs-data-scan init --force-init --filesystem cfs_irods_test
>>>
>>> [mon-01 ~]# cephfs-journal-tool --rank cfs_irods_test:0 journal reset --force
>>> Error ((2) No such file or directory)
>>> 2025-04-22T12:34:42.474+0200 7fe8b3a36100 -1 main: Couldn't determine MDS rank.
>>> [mon-01 ~]# cephfs-data-scan init --force-init --filesystem cfs_irods_test
>>> [mon-01 ~]#
>>>
>>> # Rescan data and fix metadata (leaving the below commands commented, for information on how to parallelize these scan tasks)
>>> #for i in {0..15} ; do cephfs-data-scan scan_frags --filesystem cfs_irods_test --force-corrupt --worker_n $i --worker_m 16 & done
>>> #for i in {0..15} ; do cephfs-data-scan scan_extents --filesystem cfs_irods_test --worker_n $i --worker_m 16 & done
>>> #for i in {0..15} ; do cephfs-data-scan scan_inodes --filesystem cfs_irods_test --force-corrupt --worker_n $i --worker_m 16 & done
>>> #for i in {0..15} ; do cephfs-data-scan scan_links --filesystem cfs_irods_test --worker_n $i --worker_m 16 & done
>>>
>>> cephfs-data-scan scan_frags --filesystem cfs_irods_test --force-corrupt
>>> cephfs-data-scan scan_extents --filesystem cfs_irods_test
>>>
>>> [mon-01 ~]# cephfs-data-scan scan_frags --filesystem cfs_irods_test --force-corrupt
>>> [mon-01 ~]# cephfs-data-scan scan_extents --filesystem cfs_irods_test    ------> still running
>>>
>>> I don't know how long it will take. Once it is completed I will run the remaining commands.
>>> Thanks,
>>> Christophe
>>>
>>> cephfs-data-scan scan_inodes --filesystem cfs_irods_test --force-corrupt
>>> cephfs-data-scan scan_links --filesystem cfs_irods_test
>>> cephfs-data-scan cleanup --filesystem cfs_irods_test
>>> #ceph mds repaired 0    <---- should not be necessary
>>>
>>> # Set the fs back online and accessible
>>> ceph fs set cfs_irods_test down false
>>> ceph fs set cfs_irods_test joinable true
>>> ceph fs set cfs_irods_test refuse_client_session false
>>>
>>> An MDS should now start; if not, use 'ceph orch daemon restart mds.xxxxx' to start an MDS. After remounting the fs you should be able to access /testdir1 and /testdir2 in the fs root.
>>>
>>> # Scrub the fs again to check that everything is OK.
>>> ceph tell mds.cfs_irods_test:0 scrub start / recursive,repair,force
>>>
>>> Regards,
>>> Frédéric.
>>>
>>> [1] https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/
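As a side note on parallelizing those scans: each phase needs to complete for all workers before the next phase starts, so when running the commented loops quoted above, a 'wait' between phases keeps them from overlapping (minimal sketch, same commands and worker counts as above):

for i in {0..15}; do cephfs-data-scan scan_frags --filesystem cfs_irods_test --force-corrupt --worker_n $i --worker_m 16 & done; wait
for i in {0..15}; do cephfs-data-scan scan_extents --filesystem cfs_irods_test --worker_n $i --worker_m 16 & done; wait
for i in {0..15}; do cephfs-data-scan scan_inodes --filesystem cfs_irods_test --force-corrupt --worker_n $i --worker_m 16 & done; wait
for i in {0..15}; do cephfs-data-scan scan_links --filesystem cfs_irods_test --worker_n $i --worker_m 16 & done; wait
cephfs-data-scan cleanup --filesystem cfs_irods_test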
>>> ----- On 22 Apr 25, at 10:21, Christophe DIARRA <christophe.dia...@idris.fr> wrote:
>>> Hello Frédéric,
>>> Thank you for your help. Following is the output you asked for:
>>>
>>> [mon-01 ~]# date
>>> Tue Apr 22 10:09:10 AM CEST 2025
>>> [root@fidrcmon-01 ~]# ceph tell mds.cfs_irods_test:0 scrub start / recursive,repair,force
>>> 2025-04-22T10:09:12.796+0200 7f43f6ffd640  0 client.86553 ms_handle_reset on v2:130.84.80.10:6800/3218663047
>>> 2025-04-22T10:09:12.818+0200 7f43f6ffd640  0 client.86559 ms_handle_reset on v2:130.84.80.10:6800/3218663047
>>> {
>>>     "return_code": 0,
>>>     "scrub_tag": "12e537bb-bb39-4f3b-ae09-e0a1ae6ce906",
>>>     "mode": "asynchronous"
>>> }
>>> [root@fidrcmon-01 ~]# ceph tell mds.cfs_irods_test:0 scrub status
>>> 2025-04-22T10:09:31.760+0200 7f3f0f7fe640  0 client.86571 ms_handle_reset on v2:130.84.80.10:6800/3218663047
>>> 2025-04-22T10:09:31.781+0200 7f3f0f7fe640  0 client.86577 ms_handle_reset on v2:130.84.80.10:6800/3218663047
>>> {
>>>     "status": "no active scrubs running",
>>>     "scrubs": {}
>>> }
>>> [root@fidrcmon-01 ~]# cephfs-journal-tool --rank cfs_irods_test:0 event recover_dentries list
>>> 2025-04-16T18:24:56.802960+0200 0x7c334a SUBTREEMAP: ()
>>> [root@fidrcmon-01 ~]#
>>>
>>> Based on this output, can I run the other three commands provided in your message:
>>> ceph tell mds.0 flush journal
>>> ceph mds fail 0
>>> ceph tell mds.cfs_irods_test:0 scrub start / recursive
>>> Thanks,
>>> Christophe
>>>
>>> On 19/04/2025 12:55, Frédéric Nass wrote:
>>> Hi Christophe, Hi David,
>>> Could you share the output of the below command after running the scrubbing with recursive,repair,force?
>>>
>>> cephfs-journal-tool --rank cfs_irods_test:0 event recover_dentries list
>>>
>>> Could be that the MDS recovered these 2 dentries in its journal already but the status of the filesystem was not updated yet. I've seen this happen before. If that's the case, you could try a flush, fail and re-scrub:
>>>
>>> ceph tell mds.0 flush journal
>>> ceph mds fail 0
>>> ceph tell mds.cfs_irods_test:0 scrub start / recursive
>>>
>>> This might clear the HEALTH_ERR. If not, then it will be easy to fix by rebuilding / fixing the metadata from the data pools, since this fs is empty.
>>> Let us know,
>>> Regards,
>>> Frédéric.
>>>
>>> ----- On 18 Apr 25, at 9:51, daviddavid.cas...@aevoo.fr wrote:
>>> I also tend to think that the disk has nothing to do with the problem.
>>> My reading is that the inode associated with the dentry is missing. Can anyone correct me?
>>> Christophe informed me that the directories were emptied before the incident.
>>> I don't understand why scrubbing doesn't repair the metadata. Perhaps because the directory is empty?
>>>
>>> On Thu, Apr 17, 2025 at 7:06 PM, Anthony D'Atri <anthony.da...@gmail.com> wrote:
>>> HPE rebadges drives from manufacturers. A quick search supports the idea that this SKU is fulfilled at least partly by Kioxia, so not likely a PLP issue.
>>>
>>> On Apr 17, 2025, at 11:39 AM, Christophe DIARRA <christophe.dia...@idris.fr> wrote:
>>> Hello David,
>>> The SSD model is VO007680JWZJL.
>>> I will delay the 'ceph tell mds.cfs_irods_test:0 damage rm 241447932' for the moment. If no other solution is found, I will be obliged to use this command.
>>> I found 'dentry' in the logs when the cephfs cluster started:
>>>
>>> Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.cfs_irods_test.mon-02.awuygq Updating MDS map to version 15613 from mon.2
>>> Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.15612 handle_mds_map i am now mds.0.15612
>>> Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.15612 handle_mds_map state change up:starting --> up:active
>>> Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.15612 active_start
>>> Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.cache.den(0x1 testdir2) loaded already *corrupt dentry*: [dentry #0x1/testdir2 [2,head] rep@0.0 NULL (dversion lock) pv=0 v=4442 ino=(nil) state=0 0x5617e18c8280]
>>> Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.cache.den(0x1 testdir1) loaded already *corrupt dentry*: [dentry #0x1/testdir1 [2,head] rep@0.0 NULL (dversion lock) pv=0 v=4442 ino=(nil) state=0 0x5617e18c8500]
>>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)
>>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check failed: 1 filesystem is online with fewer MDS than max_mds (MDS_UP_LESS_THAN_MAX)
>>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: from='client.? xx.xx.xx.8:0/3820885518' entity='client.admin' cmd='[{"prefix": "fs set", "fs_name": "cfs_irods_test", "var": "down", "val": "false"}]': finished
>>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: daemon mds.cfs_irods_test.mon-02.awuygq assigned to filesystem cfs_irods_test as rank 0 (now has 1 ranks)
>>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check cleared: MDS_ALL_DOWN (was: 1 filesystem is offline)
>>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check cleared: MDS_UP_LESS_THAN_MAX (was: 1 filesystem is online with fewer MDS than max_mds)
>>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: daemon mds.cfs_irods_test.mon-02.awuygq is now active in filesystem cfs_irods_test as rank 0
>>> Apr 16 17:29:54 mon-02 ceph-mgr[2444]: log_channel(cluster) log [DBG] : pgmap v1721: 4353 pgs: 4346 active+clean, 7 active+clean+scrubbing+deep; 3.9 TiB data, 417 TiB used, 6.4 PiB / 6.8 PiB avail; 1.4 KiB/s rd, 1 op/s
>>>
>>> If you need more extracts from the log file please let me know.
>>> Thanks for your help,
>>> Christophe
>>>
>>> On 17/04/2025 13:39, David C. wrote:
>>> If I'm not mistaken, this is a fairly rare situation.
>>> The fact that it's the result of a power outage makes me think of a bad SSD (like "S... Pro").
>>> Does a grep of the dentry id in the MDS logs return anything? Maybe there is some interesting information around this grep.
>>> In the heat of the moment, I have no other idea than to delete the dentry:
>>> ceph tell mds.cfs_irods_test:0 damage rm 241447932
>>> However, in production, this results in the content (of dir /testdir[12]) being abandoned.
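For reference, on a cephadm-managed cluster like this one, one way to run such a grep against the MDS logs (the daemon name below is taken from the log excerpt above; adjust it to whichever MDS is or was active, as listed by 'ceph orch ps'):

# Run on the host where that MDS runs; greps for the damage ids and the affected directories
cephadm logs --name mds.cfs_irods_test.mon-02.awuygq | grep -E '241447932|2273238993|corrupt dentry|testdir'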
>>> On Thu, Apr 17, 2025 at 12:44 PM, Christophe DIARRA <christophe.dia...@idris.fr> wrote:
>>> Hello David,
>>> Thank you for the tip about the scrubbing. I have tried the commands found in the documentation, but it seems to have no effect:
>>>
>>> [root@mon-01 ~]# ceph tell mds.cfs_irods_test:0 scrub start / recursive,repair,force
>>> 2025-04-17T12:07:20.958+0200 7fd4157fa640  0 client.86301 ms_handle_reset on v2:130.84.80.10:6800/3218663047
>>> 2025-04-17T12:07:20.979+0200 7fd4157fa640  0 client.86307 ms_handle_reset on v2:130.84.80.10:6800/3218663047
>>> {
>>>     "return_code": 0,
>>>     "scrub_tag": "733b1c6d-a418-4c83-bc8e-b28b556e970c",
>>>     "mode": "asynchronous"
>>> }
>>> [root@mon-01 ~]# ceph tell mds.cfs_irods_test:0 scrub status
>>> 2025-04-17T12:07:30.734+0200 7f26cdffb640  0 client.86319 ms_handle_reset on v2:130.84.80.10:6800/3218663047
>>> 2025-04-17T12:07:30.753+0200 7f26cdffb640  0 client.86325 ms_handle_reset on v2:130.84.80.10:6800/3218663047
>>> {
>>>     "status": "no active scrubs running",
>>>     "scrubs": {}
>>> }
>>> [root@mon-01 ~]# ceph -s
>>>   cluster:
>>>     id:     b87276e0-1d92-11ef-a9d6-507c6f66ae2e
>>>     *health: HEALTH_ERR 1 MDSs report damaged metadata*
>>>   services:
>>>     mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 19h)
>>>     mgr: mon-02.mqaubn(active, since 19h), standbys: mon-03.gvywio, mon-01.xhxqdi
>>>     mds: 1/1 daemons up, 2 standby
>>>     osd: 368 osds: 368 up (since 18h), 368 in (since 3w)
>>>   data:
>>>     volumes: 1/1 healthy
>>>     pools:   10 pools, 4353 pgs
>>>     objects: 1.25M objects, 3.9 TiB
>>>     usage:   417 TiB used, 6.4 PiB / 6.8 PiB avail
>>>     pgs:     4353 active+clean
>>>
>>> Did I miss something?
>>> The server didn't crash. I don't understand what you mean by "there may be a design flaw in the infrastructure (insecure cache, for example)". How would we know if we have a design problem? What should we check?
>>> Best regards,
>>> Christophe
>>>
>>> On 17/04/2025 11:07, David C. wrote:
>>> Hello Christophe,
>>> Check the file system scrubbing procedure => https://docs.ceph.com/en/latest/cephfs/scrub/ But this doesn't guarantee data recovery.
>>> Did the cluster crash? Ceph should be able to handle it; there may be a design flaw in the infrastructure (insecure cache, for example).
>>> David
>>>
>>> On Thu, Apr 17, 2025 at 10:44 AM, Christophe DIARRA <christophe.dia...@idris.fr> wrote:
>>> Hello,
>>> After an electrical maintenance I restarted our ceph cluster, but it remains in an unhealthy state: HEALTH_ERR 1 MDSs report damaged metadata. How can I repair this damaged metadata?
>>> To bring down the cephfs cluster I unmounted the fs from the client first and then did: ceph fs set cfs_irods_test down true
>>> To bring up the cephfs cluster I did: ceph fs set cfs_irods_test down false
>>> Fortunately the cfs_irods_test fs is almost empty and is a fs for tests. The ceph cluster is not in production yet.
>>>
>>> Following is the current status:
>>>
>>> [root@mon-01 ~]# ceph health detail
>>> HEALTH_ERR 1 MDSs report damaged metadata
>>> *[ERR] MDS_DAMAGE: 1 MDSs report damaged metadata
>>>     mds.cfs_irods_test.mon-03.vlmeuz(mds.0): Metadata damage detected*
>>> [root@mon-01 ~]# ceph -s
>>>   cluster:
>>>     id:     b87276e0-1d92-11ef-a9d6-507c6f66ae2e
>>>     health: HEALTH_ERR
>>>             1 MDSs report damaged metadata
>>>   services:
>>>     mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 17h)
>>>     mgr: mon-02.mqaubn(active, since 17h), standbys: mon-03.gvywio, mon-01.xhxqdi
>>>     mds: 1/1 daemons up, 2 standby
>>>     osd: 368 osds: 368 up (since 17h), 368 in (since 3w)
>>>   data:
>>>     volumes: 1/1 healthy
>>>     pools:   10 pools, 4353 pgs
>>>     objects: 1.25M objects, 3.9 TiB
>>>     usage:   417 TiB used, 6.4 PiB / 6.8 PiB avail
>>>     pgs:     4353 active+clean
>>>
>>> [root@mon-01 ~]# ceph fs ls
>>> name: cfs_irods_test, metadata pool: cfs_irods_md_test, data pools: [cfs_irods_def_test cfs_irods_data_test ]
>>> [root@mon-01 ~]# ceph mds stat
>>> cfs_irods_test:1 {0=cfs_irods_test.mon-03.vlmeuz=up:active} 2 up:standby
>>> [root@mon-01 ~]# ceph fs status
>>> cfs_irods_test - 0 clients
>>> ==============
>>> RANK  STATE             MDS                        ACTIVITY     DNS    INOS   DIRS   CAPS
>>>  0    active  cfs_irods_test.mon-03.vlmeuz  Reqs:    0 /s    12     15     14      0
>>>         POOL            TYPE      USED   AVAIL
>>>  cfs_irods_md_test    metadata   11.4M   34.4T
>>>  cfs_irods_def_test     data         0   34.4T
>>>  cfs_irods_data_test    data         0   4542T
>>>         STANDBY MDS
>>> cfs_irods_test.mon-01.hitdem
>>> cfs_irods_test.mon-02.awuygq
>>> MDS version: ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
>>> [root@mon-01 ~]#
>>> [root@mon-01 ~]# ceph tell mds.cfs_irods_test:0 damage ls
>>> 2025-04-17T10:23:31.849+0200 7f4b87fff640  0 client.86181 ms_handle_reset on v2:130.84.80.10:6800/3218663047
>>> 2025-04-17T10:23:31.866+0200 7f4b87fff640  0 client.86187 ms_handle_reset on v2:130.84.80.10:6800/3218663047
>>> [
>>>     {
>>>         *"damage_type": "dentry",*
>>>         "id": 241447932,
>>>         "ino": 1,
>>>         "frag": "*",
>>>         "dname": "testdir2",
>>>         "snap_id": "head",
>>>         "path": "/testdir2"
>>>     },
>>>     {
>>>         *"damage_type": "dentry",*
>>>         "id": 2273238993,
>>>         "ino": 1,
>>>         "frag": "*",
>>>         "dname": "testdir1",
>>>         "snap_id": "head",
>>>         "path": "/testdir1"
>>>     }
>>> ]
>>> [root@mon-01 ~]#
>>>
>>> Any help will be appreciated,
>>> Thanks,
>>> Christophe
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io