Hum... Obviously this 'empty' filesystem has way more rados objects in
the 2 data pools than expected. You should be able to see how many objects there are with:
rados df | grep -E 'OBJECTS|cfs_irods_def_test|cfs_irods_data_test'
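If you want an exact per-pool count, here is a quick sketch (note that listing all objects can itself take a while on a large pool):
# count the rados objects in each data pool (pool names taken from 'ceph fs ls' below)
for pool in cfs_irods_def_test cfs_irods_data_test ; do
    echo -n "$pool: " ; rados -p "$pool" ls | wc -l
done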
If waiting is not an option, you can interrupt the scan_extents command,
re-run it with multiple workers, and then proceed with the next scans
(scan_inodes, then scan_links). Just make sure you run those with
multiple workers as well.
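For reference, a minimal sketch of the parallel form (the same loops as the commented ones further down in this thread, with a wait between phases so each scan only starts once the previous one has finished on all workers):
for i in {0..15} ; do cephfs-data-scan scan_extents --filesystem cfs_irods_test --worker_n $i --worker_m 16 & done
wait   # all 16 scan_extents workers must finish before the next phase
for i in {0..15} ; do cephfs-data-scan scan_inodes --filesystem cfs_irods_test --force-corrupt --worker_n $i --worker_m 16 & done
wait
for i in {0..15} ; do cephfs-data-scan scan_links --filesystem cfs_irods_test --worker_n $i --worker_m 16 & done
wait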
Regards,
Frédéric.
----- On 22 Apr 25, at 14:54, Christophe DIARRA
<christophe.dia...@idris.fr> wrote:
Hello Frédéric,
I ran the commands (see below) but the command 'cephfs-data-scan
scan_extents --filesystem cfs_irods_test' is not finished yet. It
has been running for 2+ hours. I didn't run it in parallel because
the filesystem contains empty directories only. According to [1]:
"scan_extents and scan_inodes commands may take a very long time
if the data pool contains many files or very large files."
Now I think I should have run the command in parallel. I don't know
if it is safe to interrupt it and then rerun it with 16 workers.
On 22/04/2025 12:13, Frédéric Nass wrote:
Hi Christophe,
You could but it won't be of any help since the journal is
empty. What you can do to fix the fs metadata is to run the
below commands from the disaster-recovery-experts
documentation [1] in this particular order:
#Prevent access to the fs and set it down.
ceph fs set cfs_irods_test refuse_client_session true
ceph fs set cfs_irods_test joinable false
ceph fs set cfs_irods_test down true
[mon-01 ~]# ceph fs set cfs_irods_test refuse_client_session true
client(s) blocked from establishing new session(s)
[mon-01 ~]# ceph fs set cfs_irods_test joinable false
cfs_irods_test marked not joinable; MDS cannot join as newly active.
[mon-01 ~]# ceph fs set cfs_irods_test down true
cfs_irods_test marked down.
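A quick optional sanity check before resetting the tables (just a suggestion): confirm that no MDS is still active for this fs:
ceph fs status cfs_irods_test    # rank 0 should no longer show an active MDS
ceph mds stat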
# Reset maps and journal
cephfs-table-tool cfs_irods_test:0 reset session
cephfs-table-tool cfs_irods_test:0 reset snap
cephfs-table-tool cfs_irods_test:0 reset inode
[mon-01 ~]# cephfs-table-tool cfs_irods_test:0 reset session
{
"0": {
"data": {},
"result": 0
}
}
[mon-01 ~]# cephfs-table-tool cfs_irods_test:0 reset snap
Error ((2) No such file or directory)
2025-04-22T12:29:09.550+0200 7f1d4c03e100 -1 main: Bad rank selection: cfs_irods_test:0'
[mon-01 ~]# cephfs-table-tool cfs_irods_test:0 reset inode
Error ((2) No such file or directory)
2025-04-22T12:29:43.880+0200 7f0878a3a100 -1 main: Bad rank selection: cfs_irods_test:0'
cephfs-journal-tool --rank cfs_irods_test:0 journal reset --force
cephfs-data-scan init --force-init --filesystem cfs_irods_test
[mon-01 ~]# cephfs-journal-tool --rank cfs_irods_test:0 journal reset --force
Error ((2) No such file or directory)
2025-04-22T12:34:42.474+0200 7fe8b3a36100 -1 main: Couldn't determine MDS rank.
[mon-01 ~]# cephfs-data-scan init --force-init --filesystem cfs_irods_test
[mon-01 ~]#
# Rescan data and fix metadata (the commands below are left commented
# as a reference for how to parallelize these scan tasks)
#for i in {0..15} ; do cephfs-data-scan scan_frags --filesystem cfs_irods_test --force-corrupt --worker_n $i --worker_m 16 & done
#for i in {0..15} ; do cephfs-data-scan scan_extents --filesystem cfs_irods_test --worker_n $i --worker_m 16 & done
#for i in {0..15} ; do cephfs-data-scan scan_inodes --filesystem cfs_irods_test --force-corrupt --worker_n $i --worker_m 16 & done
#for i in {0..15} ; do cephfs-data-scan scan_links --filesystem cfs_irods_test --worker_n $i --worker_m 16 & done
cephfs-data-scan scan_frags --filesystem cfs_irods_test --force-corrupt
cephfs-data-scan scan_extents --filesystem cfs_irods_test
[mon-01 ~]# cephfs-data-scan scan_frags --filesystem cfs_irods_test --force-corrupt
[mon-01 ~]# cephfs-data-scan scan_extents --filesystem cfs_irods_test  *------> still running*
I don't know how long it will take. Once it is completed I will
run the remaining commands.
Thanks,
Christophe
cephfs-data-scan scan_inodes --filesystem cfs_irods_test --force-corrupt
cephfs-data-scan scan_links --filesystem cfs_irods_test
cephfs-data-scan cleanup --filesystem cfs_irods_test
#ceph mds repaired 0 <---- should not be necessary
# Set the fs back online and accessible
ceph fs set cfs_irods_test down false
ceph fs set cfs_irods_test joinable true
ceph fs set cfs_irods_test refuse_client_session false
An MDS should now start; if not, use 'ceph orch daemon restart mds.xxxxx'
to start an MDS. After remounting the fs you should be able to access
/testdir1 and /testdir2 in the fs root.
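To check that a rank 0 MDS actually came back (the orch command assumes a cephadm-managed cluster):
ceph fs status cfs_irods_test       # rank 0 should be up:active again
ceph orch ps --daemon-type mds      # list the MDS daemons and their state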
# scrub the fs again to check that everything is OK.
ceph tell mds.cfs_irods_test:0 scrub start / recursive,repair,force
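Once the scrub completes, the same commands as earlier in this thread will show whether the damage entries are gone:
ceph tell mds.cfs_irods_test:0 scrub status
ceph tell mds.cfs_irods_test:0 damage ls
ceph health detail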
Regards,
Frédéric.
[1]
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/
----- On 22 Apr 25, at 10:21, Christophe DIARRA
<christophe.dia...@idris.fr> wrote:
Hello Frédéric,
Thank you for your help.
Following is the output you asked for:
[root@fidrcmon-01 ~]# date
Tue Apr 22 10:09:10 AM CEST 2025
[root@fidrcmon-01 ~]# ceph tell mds.cfs_irods_test:0 scrub start / recursive,repair,force
2025-04-22T10:09:12.796+0200 7f43f6ffd640 0 client.86553 ms_handle_reset on v2:130.84.80.10:6800/3218663047
2025-04-22T10:09:12.818+0200 7f43f6ffd640 0 client.86559 ms_handle_reset on v2:130.84.80.10:6800/3218663047
{
"return_code": 0,
"scrub_tag": "12e537bb-bb39-4f3b-ae09-e0a1ae6ce906",
"mode": "asynchronous"
}
[root@fidrcmon-01 ~]# ceph tell mds.cfs_irods_test:0 scrub status
2025-04-22T10:09:31.760+0200 7f3f0f7fe640 0 client.86571 ms_handle_reset on v2:130.84.80.10:6800/3218663047
2025-04-22T10:09:31.781+0200 7f3f0f7fe640 0 client.86577 ms_handle_reset on v2:130.84.80.10:6800/3218663047
{
"status": "no active scrubs running",
"scrubs": {}
}
[root@fidrcmon-01 ~]# cephfs-journal-tool --rank cfs_irods_test:0 event recover_dentries list
2025-04-16T18:24:56.802960+0200 0x7c334a SUBTREEMAP: ()
[root@fidrcmon-01 ~]#
Based on this output, can I run the other three commands
provided in your message:
ceph tell mds.0 flush journal
ceph mds fail 0
ceph tell mds.cfs_irods_test:0 scrub start / recursive
Thanks,
Christophe
On 19/04/2025 12:55, Frédéric Nass wrote:
Hi Christophe, Hi David,
Could you share the output of the below command after running
the scrubbing with recursive,repair,force?
cephfs-journal-tool --rank cfs_irods_test:0 event recover_dentries list
Could be that the MDS recovered these 2 dentries in its journal
already but the status of the filesystem was not updated yet. I've seen this
happen before.
If that's the case, you could try a flush, fail and re-scrub:
ceph tell mds.0 flush journal
ceph mds fail 0
ceph tell mds.cfs_irods_test:0 scrub start / recursive
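Once the standby has taken over and the scrub has run, you can re-check with (same commands as before):
ceph mds stat                              # wait until rank 0 shows up:active again
ceph tell mds.cfs_irods_test:0 damage ls   # should come back empty if the dentries were recovered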
This might clear the HEALTH_ERR. If not, then it will be easy
to fix by rebuilding / fixing the metadata from the data pools since this fs is
empty.
Let us know,
Regards,
Frédéric.
----- On 18 Apr 25, at 9:51, david <david.cas...@aevoo.fr> wrote:
I also tend to think that the disk has nothing to do with the problem.
My reading is that the inode associated with the dentry is missing.
Can anyone correct me?
Christophe informed me that the directories were emptied before the
incident.
I don't understand why scrubbing doesn't repair the metadata.
Perhaps because the directory is empty?
On Thu 17 Apr 2025 at 19:06, Anthony D'Atri <anthony.da...@gmail.com> wrote:
HPE rebadges drives from manufacturers. A quick search
supports the idea
that this SKU is fulfilled at least partly by Kioxia,
so not likely a PLP
issue.
On Apr 17, 2025, at 11:39 AM, Christophe DIARRA <christophe.dia...@idris.fr> wrote:
Hello David,
The SSD model is VO007680JWZJL.
I will delay the 'ceph tell mds.cfs_irods_test:0 damage rm 241447932'
for the moment. If no other solution is found I will be obliged to use
this command.
I found 'dentry' in the logs when the cephfs
cluster started:
Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.cfs_irods_test.mon-02.awuygq Updating MDS map to version 15613 from mon.2
Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.15612 handle_mds_map i am now mds.0.15612
Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.15612 handle_mds_map state change up:starting --> up:active
Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.15612 active_start
Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.cache.den(0x1 testdir2) loaded already *corrupt dentry*: [dentry #0x1/testdir2 [2,head]rep@0.0 NULL (dversion lock) pv=0 v=4442 ino=(nil) state=0 0x5617e18c8280]
Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.cache.den(0x1 testdir1) loaded already *corrupt dentry*: [dentry #0x1/testdir1 [2,head]rep@0.0 NULL (dversion lock) pv=0 v=4442 ino=(nil) state=0 0x5617e18c8500]
Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)
Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check failed: 1 filesystem is online with fewer MDS than max_mds (MDS_UP_LESS_THAN_MAX)
Apr 16 17:29:53 mon-02 ceph-mon[2288]: from='client.? xx.xx.xx.8:0/3820885518' entity='client.admin' cmd='[{"prefix": "fs set", "fs_name": "cfs_irods_test", "var": "down", "val": "false"}]': finished
Apr 16 17:29:53 mon-02 ceph-mon[2288]: daemon mds.cfs_irods_test.mon-02.awuygq assigned to filesystem cfs_irods_test as rank 0 (now has 1 ranks)
Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check cleared: MDS_ALL_DOWN (was: 1 filesystem is offline)
Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check cleared: MDS_UP_LESS_THAN_MAX (was: 1 filesystem is online with fewer MDS than max_mds)
Apr 16 17:29:53 mon-02 ceph-mon[2288]: daemon mds.cfs_irods_test.mon-02.awuygq is now active in filesystem cfs_irods_test as rank 0
Apr 16 17:29:54 mon-02 ceph-mgr[2444]: log_channel(cluster) log [DBG] : pgmap v1721: 4353 pgs: 4346 active+clean, 7 active+clean+scrubbing+deep; 3.9 TiB data, 417 TiB used, 6.4 PiB / 6.8 PiB avail; 1.4 KiB/s rd, 1 op/s
If you need a longer extract from the log file please let me know.
Thanks for your help,
Christophe
On 17/04/2025 13:39, David C. wrote:
If I'm not mistaken, this is a fairly rare
situation.
The fact that it's the result of a power outage
makes me think of a bad
SSD (like "S... Pro").
Does a grep of the dentry id in the MDS logs return anything?
There may be some interesting information around that grep.
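A rough sketch of such a grep (the exact log location and unit name depend on how the MDS daemons are deployed, so treat these paths as assumptions):
# systemd journal based logging
journalctl -u 'ceph*mds*' | grep -Ei 'dentry|testdir1|testdir2'
# or with file based logging
grep -Ei 'dentry|testdir1|testdir2' /var/log/ceph/ceph-mds.*.log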
In the heat of the moment, I have no other idea
than to delete the
dentry.
ceph tell mds.cfs_irods_test:0 damage rm 241447932
However, in production, this results in the
content (of dir
/testdir[12]) being abandoned.
On Thu 17 Apr 2025 at 12:44, Christophe DIARRA <christophe.dia...@idris.fr> wrote:
Hello David,
Thank you for the tip about the scrubbing. I have tried the
commands found in the documentation but they seem to have no effect:
[root@mon-01 ~]# *ceph tell mds.cfs_irods_test:0 scrub start / recursive,repair,force*
2025-04-17T12:07:20.958+0200 7fd4157fa640 0 client.86301 ms_handle_reset on v2:130.84.80.10:6800/3218663047
2025-04-17T12:07:20.979+0200 7fd4157fa640 0 client.86307 ms_handle_reset on v2:130.84.80.10:6800/3218663047
{
"return_code": 0,
"scrub_tag":
"733b1c6d-a418-4c83-bc8e-b28b556e970c",
"mode": "asynchronous"
}
[root@mon-01 ~]# *ceph tell mds.cfs_irods_test:0 scrub status*
2025-04-17T12:07:30.734+0200 7f26cdffb640 0 client.86319 ms_handle_reset on v2:130.84.80.10:6800/3218663047
2025-04-17T12:07:30.753+0200 7f26cdffb640 0 client.86325 ms_handle_reset on v2:130.84.80.10:6800/3218663047
{
"status": "no active scrubs running",
"scrubs": {}
}
[root@mon-01 ~]# ceph -s
  cluster:
    id:     b87276e0-1d92-11ef-a9d6-507c6f66ae2e
    *health: HEALTH_ERR 1 MDSs report damaged metadata*

  services:
    mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 19h)
    mgr: mon-02.mqaubn(active, since 19h), standbys: mon-03.gvywio, mon-01.xhxqdi
    mds: 1/1 daemons up, 2 standby
    osd: 368 osds: 368 up (since 18h), 368 in (since 3w)

  data:
    volumes: 1/1 healthy
    pools:   10 pools, 4353 pgs
    objects: 1.25M objects, 3.9 TiB
    usage:   417 TiB used, 6.4 PiB / 6.8 PiB avail
    pgs:     4353 active+clean
Did I miss something?
The server didn't crash. I don't understand what you mean
by "there may be a design flaw in the infrastructure (insecure
cache, for example)".
How can we know if we have a design problem? What should we check?
Best regards,
Christophe
On 17/04/2025 11:07, David C. wrote:
Hello Christophe,
Check the file system scrubbing procedure =>
https://docs.ceph.com/en/latest/cephfs/scrub/
But this doesn't guarantee data recovery.
Did the cluster crash?
Ceph should be able to handle it; there may be a design flaw in
the infrastructure (insecure cache, for example).
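One way to check for that kind of issue, as a sketch and assuming smartmontools/sdparm are installed on the OSD hosts, is to look at whether the drives' volatile write cache is enabled:
# on each OSD host, show the write cache setting of the data drives
for dev in /dev/sd? ; do
    echo "== $dev"
    smartctl -g wcache "$dev"       # SATA/SAS drives
    # sdparm --get=WCE "$dev"       # alternative for SAS/SCSI
done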
David
On Thu 17 Apr 2025 at 10:44, Christophe DIARRA
<christophe.dia...@idris.fr> wrote:
Hello,
After electrical maintenance I restarted our ceph cluster but it
remains in an unhealthy state: HEALTH_ERR 1 MDSs report damaged metadata.
How can I repair this damaged metadata?
To bring down the cephfs cluster I unmounted the fs from the client
first and then did: ceph fs set cfs_irods_test down true
To bring up the cephfs cluster I did: ceph fs set cfs_irods_test down false
Fortunately the cfs_irods_test fs is almost empty and is a fs for
tests. The ceph cluster is not in production yet.
Following is the current status:
[root@mon-01 ~]# ceph health detail
HEALTH_ERR 1 MDSs report damaged
metadata
*[ERR] MDS_DAMAGE: 1 MDSs report damaged metadata
    mds.cfs_irods_test.mon-03.vlmeuz(mds.0): Metadata damage detected*
[root@mon-01 ~]# ceph -s
  cluster:
    id:     b87276e0-1d92-11ef-a9d6-507c6f66ae2e
    health: HEALTH_ERR
            1 MDSs report damaged metadata

  services:
    mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 17h)
    mgr: mon-02.mqaubn(active, since 17h), standbys: mon-03.gvywio, mon-01.xhxqdi
    mds: 1/1 daemons up, 2 standby
    osd: 368 osds: 368 up (since 17h), 368 in (since 3w)

  data:
    volumes: 1/1 healthy
    pools:   10 pools, 4353 pgs
    objects: 1.25M objects, 3.9 TiB
    usage:   417 TiB used, 6.4 PiB / 6.8 PiB avail
    pgs:     4353 active+clean
[root@mon-01 ~]# ceph fs ls
name: cfs_irods_test, metadata pool: cfs_irods_md_test, data pools: [cfs_irods_def_test cfs_irods_data_test ]
[root@mon-01 ~]# ceph mds stat
cfs_irods_test:1 {0=cfs_irods_test.mon-03.vlmeuz=up:active} 2 up:standby
[root@mon-01 ~]# ceph fs status
cfs_irods_test - 0 clients
==============
RANK  STATE             MDS                    ACTIVITY     DNS   INOS  DIRS  CAPS
 0    active  cfs_irods_test.mon-03.vlmeuz   Reqs:    0 /s   12    15    14     0
        POOL           TYPE     USED  AVAIL
 cfs_irods_md_test    metadata  11.4M  34.4T
 cfs_irods_def_test     data       0   34.4T
 cfs_irods_data_test    data       0   4542T
       STANDBY MDS
cfs_irods_test.mon-01.hitdem
cfs_irods_test.mon-02.awuygq
MDS version: ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
[root@mon-01 ~]#
[root@mon-01 ~]# ceph tell mds.cfs_irods_test:0 damage ls
2025-04-17T10:23:31.849+0200 7f4b87fff640 0 client.86181 ms_handle_reset on v2:130.84.80.10:6800/3218663047
2025-04-17T10:23:31.866+0200 7f4b87fff640 0 client.86187 ms_handle_reset on v2:130.84.80.10:6800/3218663047
[
{
*"damage_type": "dentry",*
"id": 241447932,
"ino": 1,
"frag": "*",
"dname": "testdir2",
"snap_id": "head",
"path": "/testdir2"
},
{
*"damage_type": "dentry"*,
"id": 2273238993,
"ino": 1,
"frag": "*",
"dname": "testdir1",
"snap_id": "head",
"path": "/testdir1"
}
]
[root@mon-01 ~]#
Any help will be appreciated,
Thanks,
Christophe
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io