Hi Christophe, 

----- On 24 Apr 25, at 12:50, Christophe DIARRA <christophe.dia...@idris.fr> wrote: 

> Hello Frédéric,

> Here are the requested outputs (I kept some outputs during the repair of the fs):

> 1) 'rados df' when the 'rados ls' command was failing (but after I had already
> removed the fs cfs_irods_test and the objects from the metadata pool
> cfs_irods_md_test):
Actually, I was hoping to see the output of 'rados df' from the time when 
'rados ls' failed to complete. That output should be from before any objects 
were manually removed from the pools (did you manually remove any objects?) and 
before the filesystem was removed with the command 'ceph fs rm cfs_irods_test 
--yes-i-really-mean-it', as this command also removes objects in the pools. I 
want to make sure that the pools appearing empty in the 'rados df' output was 
not the result of stuck PGs/OSDs. 

Regards, 
Frédéric. 

> [mon-01]# ceph fs rm cfs_irods_test --yes-i-really-mean-it
> [mon-01]# rados -p cfs_irods_md_test ls
> 601.00000000
> ...
> 100.00000000.inode
> 1.00000000

> [mon-01]# for i in `rados -p cfs_irods_md_test ls`; do rados -p
> cfs_irods_md_test rm $i; done ==> this was unnecessary because later I simply
> deleted all the cephfs pools and recreated them

> [mon-01 ~]# rados df
> POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
> .mgr 1.4 GiB 123 0 369 0 0 0 477873 721 MiB 885240 19 GiB 0 B 0 B
> cfs_irods_data_test 0 B 0 0 0 0 0 0 0 0 B 0 0 B 0 B 0 B
> cfs_irods_def_test 0 B 0 0 0 0 0 0 1 0 B 80200 157 GiB 0 B 0 B
> cfs_irods_md_test 97 KiB 0 0 0 0 0 0 224 212 KiB 2463 4.5 MiB 0 B 0 B
> metadata_4hddrbd_rep3_ssd 1.9 MiB 5 0 15 0 0 0 1339579 1.7 GiB 226136 553 MiB 0 B 0 B
> metadata_4ssdrbd_rep3_ssd 239 KiB 5 0 15 0 0 0 1212124 1.5 GiB 203438 497 MiB 0 B 0 B
> pool_rbd_rep3_hdd 3.8 TiB 838156 0 2514468 0 0 0 17599175 545 GiB 34068128 2.5 TiB 0 B 0 B
> pool_rbd_rep3_ssd 2.3 TiB 197298 0 591894 0 0 0 15948416 460 GiB 32329097 1.8 TiB 0 B 0 B
> rbd_ec_k6m2_hdd 589 GiB 113057 0 904456 0 0 0 5021553 232 GiB 14520983 916 GiB 0 B 0 B
> rbd_ec_k6m2_ssd 529 GiB 101552 0 812416 0 0 0 4787661 206 GiB 14524780 908 GiB 0 B 0 B

> total_objects 1250196
> total_used 417 TiB
> total_avail 6.4 PiB
> total_space 6.8 PiB

> 2) 'rados df' when 'rados ls' command is working fine for all the pools :

> [mon-01 ~]# rados df
> POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
> .mgr 1.4 GiB 123 0 369 0 0 0 480493 723 MiB 890072 19 GiB 0 B 0 B
> cfs_irods_data_test 0 B 0 0 0 0 0 0 0 0 B 0 0 B 0 B 0 B
> cfs_irods_def_test 0 B 0 0 0 0 0 0 1 0 B 80200 157 GiB 0 B 0 B
> cfs_irods_md_test 97 KiB 0 0 0 0 0 0 224 212 KiB 2463 4.5 MiB 0 B 0 B
> metadata_4hddrbd_rep3_ssd 1.9 MiB 5 0 15 0 0 0 1339536 1.7 GiB 226136 553 MiB 0 B 0 B
> metadata_4ssdrbd_rep3_ssd 239 KiB 5 0 15 0 0 0 1212120 1.5 GiB 203438 497 MiB 0 B 0 B
> pool_rbd_rep3_hdd 3.8 TiB 838156 0 2514468 0 0 0 17594633 540 GiB 34068128 2.5 TiB 0 B 0 B
> pool_rbd_rep3_ssd 2.3 TiB 197298 0 591894 0 0 0 15946345 458 GiB 32329097 1.8 TiB 0 B 0 B
> rbd_ec_k6m2_hdd 589 GiB 113057 0 904456 0 0 0 5021360 232 GiB 14520983 916 GiB 0 B 0 B
> rbd_ec_k6m2_ssd 529 GiB 101552 0 812416 0 0 0 4785748 204 GiB 14524780 908 GiB 0 B 0 B

> total_objects 1250196
> total_used 417 TiB
> total_avail 6.4 PiB
> total_space 6.8 PiB
> [mon-01 ~]#

> From here I deleted the cephfs pools, recreated them and recreated the fs
> without any problem.

> Best regards,

> Christophe
> On 24/04/2025 08:52, Frédéric Nass wrote:

>> Hi Christophe,

>> Do you have a 'rados df' or 'ceph df' output (of all pools, not just the one
>> from the test filesystem) from the time when the 'rados ls' command was
>> failing?

>> I'm trying to determine if the pools were incorrectly reported as empty due to
>> stuck PGs or OSDs. This is important, as we based our decision to delete the
>> filesystem on this (possibly wrong) information.

>> Regards,
>> Frédéric.

>> ----- On 23 Apr 25, at 18:19, Christophe DIARRA <christophe.dia...@idris.fr> wrote:

>>> Hello Frédéric,

>>> I have a new working fs now after deleting the fs + pools and recreating them.
>>> I will mount the fs on the test client, create some files and do some tests:

>>> 1. shut down and restart the cluster to see what happens to the metadata

>>> 2. redo test 1, but cut power to the rack for several hours after the cluster
>>> is shut down

>>> I will let you know when the tests are finished.

>>> Following is the current status:

>>> [mon-01 ~]# ceph fs status
>>> cfs_irods_test - 0 clients
>>> ==============
>>> RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
>>> 0 active cfs_irods_test.mon-01.hitdem Reqs: 0 /s 10 13 12 0
>>> POOL TYPE USED AVAIL
>>> cfs_irods_md_test metadata 96.0k 34.4T
>>> cfs_irods_def_test data 0 2018T
>>> cfs_irods_data_test data 0 4542T
>>> STANDBY MDS
>>> cfs_irods_test.mon-03.vlmeuz
>>> cfs_irods_test.mon-02.awuygq
>>> MDS version: ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) 
>>> reef
>>> (stable)

>>> Many thanks to you Frédéric and also to David, Anthony and Michel for the 
>>> advice
>>> and remarks.

>>> Best regards,

>>> Christophe

>>> On 23/04/2025 14:30, Christophe DIARRA wrote:

>>>> Hello Frédéric, Michel,

>>>> Rebooting the OSD nodes one by one solved the 'rados ls' problem. Now it is
>>>> working fine for all the pools.

>>>> The next step is to recreate the cephfs fs I deleted yesterday because of 
>>>> the
>>>> damaged metadata problem.

>>>> I will let you know.

>>>> Thanks,

>>>> Christophe

>>>> On 23/04/2025 12:47, Christophe DIARRA wrote:

>>>>> Hello Frédéric,

>>>>> Thank you for the answer.

>>>>> osd_mclock_max_capacity_iops_hdd is not defined; only
>>>>> osd_mclock_max_capacity_iops_ssd is. I suppose these are default values.
>>>>> I didn't know anything about them until now.

>>>>> [mon-01 ~]# ceph config dump | grep osd_mclock_max_capacity_iops_hdd
>>>>> [mon-01 ~]#

>>>>> [mon-01 ~]# ceph config dump | grep osd_mclock_max_capacity_iops
>>>>> osd.352 basic osd_mclock_max_capacity_iops_ssd 47136.994042
>>>>> osd.353 basic osd_mclock_max_capacity_iops_ssd 45567.566829
>>>>> osd.354 basic osd_mclock_max_capacity_iops_ssd 44979.777767
>>>>> osd.355 basic osd_mclock_max_capacity_iops_ssd 44494.118337
>>>>> osd.356 basic osd_mclock_max_capacity_iops_ssd 48002.559112
>>>>> osd.357 basic osd_mclock_max_capacity_iops_ssd 54686.144097
>>>>> osd.358 basic osd_mclock_max_capacity_iops_ssd 42349.183758
>>>>> osd.359 basic osd_mclock_max_capacity_iops_ssd 58134.190143
>>>>> osd.360 basic osd_mclock_max_capacity_iops_ssd 46867.824097
>>>>> osd.361 basic osd_mclock_max_capacity_iops_ssd 54869.366372
>>>>> osd.362 basic osd_mclock_max_capacity_iops_ssd 55875.432057
>>>>> osd.363 basic osd_mclock_max_capacity_iops_ssd 58346.849381
>>>>> osd.364 basic osd_mclock_max_capacity_iops_ssd 52520.181799
>>>>> osd.365 basic osd_mclock_max_capacity_iops_ssd 46632.056458
>>>>> osd.366 basic osd_mclock_max_capacity_iops_ssd 45746.055260
>>>>> osd.367 basic osd_mclock_max_capacity_iops_ssd 47884.575954

>>>>> I will restart the OSD nodes one by one and let you know if 'rados ls' 
>>>>> works
>>>>> again.

>>>>> Thanks,

>>>>> Christophe

>>>>> On 23/04/2025 12:23, Frédéric Nass wrote:

>>>>>> Hi Christophe,

>>>>>> Response inline

>>>>>> ----- Le 23 Avr 25, à 11:42, Christophe DIARRA [
>>>>>> mailto:christophe.dia...@idris.fr | <christophe.dia...@idris.fr> ] a 
>>>>>> écrit :

>>>>>>> Hello Frédéric,
>>>>>>> I removed the fs but haven't recreated it yet because I have a doubt 
>>>>>>> about the
>>>>>>> health of the cluster even though it seems healthy:
>>>>>>> [mon-01 ~]# ceph -s
>>>>>>> cluster:
>>>>>>> id: b87276e0-1d92-11ef-a9d6-507c6f66ae2e
>>>>>>> health: HEALTH_OK
>>>>>>> services:
>>>>>>> mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 6d)
>>>>>>> mgr: mon-02.mqaubn(active, since 6d), standbys: mon-03.gvywio, 
>>>>>>> mon-01.xhxqdi
>>>>>>> osd: 368 osds: 368 up (since 16h), 368 in (since 3w)
>>>>>>> data:
>>>>>>> pools: 10 pools, 4353 pgs
>>>>>>> objects: 1.25M objects, 3.9 TiB
>>>>>>> usage: 417 TiB used, 6.4 PiB / 6.8 PiB avail
>>>>>>> pgs: 4353 active+clean
>>>>>>> I observed that listing the objects in any hdd pool hangs: at the
>>>>>>> beginning for an empty hdd pool, or after displaying the list of objects.
>>>>>>> I need to do a Ctrl-C to interrupt the hung 'rados ls' command. I don't
>>>>>>> have this problem with the pools on ssd.
>>>>>>> [mon-01 ~]# rados lspools
>>>>>>> .mgr
>>>>>>> pool_rbd_rep3_hdd <------ hdd pool
>>>>>>> pool_rbd_rep3_ssd
>>>>>>> rbd_ec_k6m2_hdd <------ hdd pool
>>>>>>> rbd_ec_k6m2_ssd
>>>>>>> metadata_4hddrbd_rep3_ssd
>>>>>>> metadata_4ssdrbd_rep3_ssd
>>>>>>> cfs_irods_md_test
>>>>>>> cfs_irods_def_test
>>>>>>> cfs_irods_data_test <------ hdd pool
>>>>>>> [mon-01 ~]#
>>>>>>> 1) Testing 'rados ls' on hdd pools:
>>>>>>> [mon-01 ~]# rados -p cfs_irods_data_test ls
>>>>>>> (hangs forever) ==> Ctrl-C
>>>>>>> [mon-01 ~]# rados -p pool_rbd_rep3_hdd ls|head -2
>>>>>>> rbd_data.565ed6699dd8.0000000000097ff6
>>>>>>> rbd_data.565ed6699dd8.00000000001041fb
>>>>>>> (then hangs forever here) ==> Ctrl-C
>>>>>>> [mon-01 ~]# rados -p pool_rbd_rep3_hdd ls
>>>>>>> rbd_data.565ed6699dd8.0000000000097ff6
>>>>>>> rbd_data.565ed6699dd8.00000000001041fb
>>>>>>> rbd_data.565ed6699dd8.000000000004f1a3
>>>>>>> ...
>>>>>>> (list truncated by me)
>>>>>>> ...
>>>>>>> rbd_data.565ed6699dd8.000000000016809e
>>>>>>> rbd_data.565ed6699dd8.000000000007bc05
>>>>>>> (then hangs forever here) ==> Ctrl-C
>>>>>>> 2) With the pools on ssd everything works well (the 'rados ls' command
>>>>>>> doesn't hang):
>>>>>>> [mon-01 ~]# for i in $(rados lspools|egrep 'ssd|md|def'); do echo -n 
>>>>>>> "Pool $i
>>>>>>> :"; rados -p $i ls |wc -l; done
>>>>>>> Pool pool_rbd_rep3_ssd :197298
>>>>>>> Pool rbd_ec_k6m2_ssd :101552
>>>>>>> Pool metadata_4hddrbd_rep3_ssd :5
>>>>>>> Pool metadata_4ssdrbd_rep3_ssd :5
>>>>>>> Pool cfs_irods_md_test :0
>>>>>>> Pool cfs_irods_def_test :0
>>>>>>> Below is the configuration of the cluster:
>>>>>>> - 3 MONs (HPE DL360) + 8 OSD servers (HPE Apollo 4510 Gen10)
>>>>>>> - each OSD server has 44x20TB HDD + 10x7.6TB SSD

>>>>>> This is dense. :-/

>>>>>>> - On each OSD server, 8 SSDs are partitioned and used for the wal/db of
>>>>>>> the HDD OSDs
>>>>>>> - On each OSD server, 2 SSDs are used for the cephfs metadata and default
>>>>>>> data pools.
>>>>>>> Do you see any configuration problem here which could lead to our
>>>>>>> metadata problem?
>>>>>>> Do you know what could cause the hang of the 'rados ls' command on the
>>>>>>> HDD pools? I would like to understand this problem before recreating a
>>>>>>> new cephfs fs.

>>>>>> Inaccessible PGs, misbehaving OSDs, mClock scheduler in use with
>>>>>> osd_mclock_max_capacity_iops_hdd (auto)set way too low (check 'ceph 
>>>>>> config dump
>>>>>> | grep osd_mclock_max_capacity_iops_hdd').
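If a too-low (auto)measured value does turn out to be the cause, clearing the per-OSD override lets the scheduler fall back to the default. A dry-run sketch; 'ceph config rm' is the real command, but the osd ids and the echo wrapper here are illustrative only:

```shell
# Print (not run) the commands that would clear a mis-measured
# osd_mclock_max_capacity_iops_hdd override. Drop the 'echo' to
# actually apply them; the id list is illustrative.
for id in 0 1 2; do
  echo "ceph config rm osd.$id osd_mclock_max_capacity_iops_hdd"
done
```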

>>>>>> Since this is consecutive to an electrical maintenance (power outage?), if
>>>>>> osd_mclock_max_capacity_iops_hdd is not the issue, I would restart all HDD
>>>>>> OSDs one by one or node by node to have all PGs re-peered. Then try the
>>>>>> 'rados ls' command again.
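The one-by-one restart suggested above can be sketched as a dry run. 'ceph orch daemon restart' is the real cephadm command; the id list is illustrative (on a live cluster it would come from 'ceph osd ls', filtered to the HDD OSDs):

```shell
# Dry-run: print the rolling-restart commands for a few OSD ids.
# Drop the 'echo' on a live cluster, and wait for all PGs to return
# to active+clean (check 'ceph pg stat') before each next restart.
for id in 0 1 2; do
  echo "ceph orch daemon restart osd.$id"
done
```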

>>>>>> Regards,
>>>>>> Frédéric.

>>>>>>> The cluster is still in a testing state, so we can run any tests you
>>>>>>> recommend.
>>>>>>> Thanks,
>>>>>>> Christophe
>>>>>>> On 22/04/2025 16:46, Christophe DIARRA wrote:

>>>>>>>> Hello Frédéric,
>>>>>>>> 15 of the 16 parallel scanning workers terminated almost immediately.
>>>>>>>> But one worker has now been running for 1+ hour:
>>>>>>>> [mon-01 log]# ps -ef|grep scan
>>>>>>>> root 1977927 1925004 0 15:18 pts/0 00:00:00 cephfs-data-scan scan_extents
>>>>>>>> --filesystem cfs_irods_test --worker_n 11 --worker_m 16
>>>>>>>> [mon-01 log]# date;lsof -p 1977927|grep osd
>>>>>>>> Tue Apr 22 04:37:05 PM CEST 2025
>>>>>>>> cephfs-da 1977927 root 15u IPv4 7105122 0t0 TCP 
>>>>>>>> mon-01:34736->osd-06:6912
>>>>>>>> (ESTABLISHED)
>>>>>>>> cephfs-da 1977927 root 18u IPv4 7110774 0t0 TCP 
>>>>>>>> mon-01:45122->osd-03:ethoscan
>>>>>>>> (ESTABLISHED)
>>>>>>>> cephfs-da 1977927 root 19u IPv4 7105123 0t0 TCP 
>>>>>>>> mon-01:58556->osd-07:spg
>>>>>>>> (ESTABLISHED)
>>>>>>>> cephfs-da 1977927 root 20u IPv4 7049672 0t0 TCP 
>>>>>>>> mon-01:55064->osd-01:7112
>>>>>>>> (ESTABLISHED)
>>>>>>>> cephfs-da 1977927 root 21u IPv4 7082598 0t0 TCP 
>>>>>>>> mon-01:42120->osd-03-data:6896
>>>>>>>> (SYN_SENT)
>>>>>>>> [mon-01 log]#
>>>>>>>> The filesystem is empty. So I will follow your advice and remove it. 
>>>>>>>> After that
>>>>>>>> I will recreate it.
>>>>>>>> I will redo some proper shutdown and restart of the cluster to check 
>>>>>>>> if the
>>>>>>>> problem reappears with the newly recreated fs.
>>>>>>>> I will let you know.
>>>>>>>> Thank you for your help,
>>>>>>>> Christophe
>>>>>>>> On 22/04/2025 15:56, Frédéric Nass wrote:

>>>>>>>>> That is weird, for two reasons.
>>>>>>>>> The first is that cephfs-data-scan should not run for a couple of hours
>>>>>>>>> on empty data pools. I just tried to run it on an empty pool and it
>>>>>>>>> doesn't run for more than maybe 10 seconds.
>>>>>>>>> The second is that the data pool cfs_irods_def_test should not be
>>>>>>>>> empty, even if the filesystem tree is. It should at least have a few
>>>>>>>>> rados objects named after {100,200,400,60x}.00000000 and the root inode
>>>>>>>>> 1.00000000 / 1.00000000.inode, unless you removed the filesystem by
>>>>>>>>> running the 'ceph fs rm <filesystem_name> --yes-i-really-mean-it'
>>>>>>>>> command, which does remove rados objects in the associated pools.
>>>>>>>>> If it's clear for you that this filesystem should be empty, I'd 
>>>>>>>>> advise you to
>>>>>>>>> remove it (using the 'ceph fs rm' command), delete any rados objects 
>>>>>>>>> in the
>>>>>>>>> metadata and data pools, and then recreate the filesystem.
>>>>>>>>> Regards,
>>>>>>>>> Frédéric.
>>>>>>>>> ----- On 22 Apr 25, at 15:13, Christophe DIARRA
>>>>>>>>> <christophe.dia...@idris.fr> wrote:
>>>>>>>>> Hello Frédéric,
>>>>>>>>> I have:
>>>>>>>>> [mon-01 ~]# rados df | grep -E 'OBJECTS|cfs_irods_def_test|cfs_irods_data_test'
>>>>>>>>> POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
>>>>>>>>> cfs_irods_data_test 0 B 0 0 0 0 0 0 0 0 B 0 0 B 0 B 0 B
>>>>>>>>> cfs_irods_def_test 0 B 0 0 0 0 0 0 1 0 B 80200 157 GiB 0 B 0 B
>>>>>>>>> [mon-01 ~]#
>>>>>>>>> I will interrupt the current scanning process and rerun it with
>>>>>>>>> more workers.
>>>>>>>>> Thanks,
>>>>>>>>> Christophe
>>>>>>>>> On 22/04/2025 15:05, Frédéric Nass wrote:
>>>>>>>>> Hum... Obviously this 'empty' filesystem has way more rados
>>>>>>>>> objects in the 2 data pools than expected. You should see that
>>>>>>>>> many objects with:
>>>>>>>>> rados df | grep -E
>>>>>>>>> 'OBJECTS|cfs_irods_def_test|cfs_irods_data_test'
>>>>>>>>> If waiting is not an option, you can break the scan_extents
>>>>>>>>> command, re-run it with multiple workers, and then proceed
>>>>>>>>> with the next scan (scan_links). Just make sure you run the
>>>>>>>>> next scan with multiple workers as well.
>>>>>>>>> Regards,
>>>>>>>>> Frédéric.
>>>>>>>>> ----- On 22 Apr 25, at 14:54, Christophe DIARRA
>>>>>>>>> <christophe.dia...@idris.fr> wrote:
>>>>>>>>> Hello Frédéric,
>>>>>>>>> I ran the commands (see below) but the command 'cephfs-data-scan
>>>>>>>>> scan_extents --filesystem cfs_irods_test' has not finished yet. It has
>>>>>>>>> been running for 2+ hours. I didn't run it in parallel because the fs
>>>>>>>>> contains empty directories only. According to [1]: "scan_extents and
>>>>>>>>> scan_inodes commands may take a very long time if the data pool
>>>>>>>>> contains many files or very large files." Now I think I should have run
>>>>>>>>> the command in parallel. I don't know if it is safe to interrupt it and
>>>>>>>>> then rerun it with 16 workers.
>>>>>>>>> On 22/04/2025 12:13, Frédéric Nass wrote:
>>>>>>>>> Hi Christophe,
>>>>>>>>> You could but it won't be of any help since the
>>>>>>>>> journal is empty. What you can do to fix the fs
>>>>>>>>> metadata is to run the below commands from the
>>>>>>>>> disaster-recovery-experts documentation [1] in this
>>>>>>>>> particular order:
>>>>>>>>> #Prevent access to the fs and set it down.
>>>>>>>>> ceph fs set cfs_irods_test refuse_client_session true
>>>>>>>>> ceph fs set cfs_irods_test joinable false
>>>>>>>>> ceph fs set cfs_irods_test down true
>>>>>>>>> [mon-01 ~]# ceph fs set cfs_irods_test
>>>>>>>>> refuse_client_session true
>>>>>>>>> client(s) blocked from establishing new session(s)
>>>>>>>>> [mon-01 ~]# ceph fs set cfs_irods_test joinable false
>>>>>>>>> cfs_irods_test marked not joinable; MDS cannot join as
>>>>>>>>> newly active.
>>>>>>>>> [mon-01 ~]# ceph fs set cfs_irods_test down true
>>>>>>>>> cfs_irods_test marked down.
>>>>>>>>> # Reset maps and journal
>>>>>>>>> cephfs-table-tool cfs_irods_test:0 reset session
>>>>>>>>> cephfs-table-tool cfs_irods_test:0 reset snap
>>>>>>>>> cephfs-table-tool cfs_irods_test:0 reset inode
>>>>>>>>> [mon-01 ~]# cephfs-table-tool cfs_irods_test:0 reset session
>>>>>>>>> {
>>>>>>>>> "0": {
>>>>>>>>> "data": {},
>>>>>>>>> "result": 0
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>> [mon-01 ~]# cephfs-table-tool cfs_irods_test:0 reset snap
>>>>>>>>> Error ((2) No such file or directory)
>>>>>>>>> 2025-04-22T12:29:09.550+0200 7f1d4c03e100 -1 main: Bad
>>>>>>>>> rank selection: cfs_irods_test:0'
>>>>>>>>> [mon-01 ~]# cephfs-table-tool cfs_irods_test:0 reset inode
>>>>>>>>> Error ((2) No such file or directory)
>>>>>>>>> 2025-04-22T12:29:43.880+0200 7f0878a3a100 -1 main: Bad
>>>>>>>>> rank selection: cfs_irods_test:0'
>>>>>>>>> cephfs-journal-tool --rank cfs_irods_test:0 journal
>>>>>>>>> reset --force
>>>>>>>>> cephfs-data-scan init --force-init --filesystem
>>>>>>>>> cfs_irods_test
>>>>>>>>> [mon-01 ~]# cephfs-journal-tool --rank cfs_irods_test:0
>>>>>>>>> journal reset --force
>>>>>>>>> Error ((2) No such file or directory)
>>>>>>>>> 2025-04-22T12:34:42.474+0200 7fe8b3a36100 -1 main:
>>>>>>>>> Couldn't determine MDS rank.
>>>>>>>>> [mon-01 ~]# cephfs-data-scan init --force-init
>>>>>>>>> --filesystem cfs_irods_test
>>>>>>>>> [mon-01 ~]#
>>>>>>>>> # Rescan data and fix metadata (leaving the below commands commented
>>>>>>>>> for information on how to parallelize these scan tasks)
>>>>>>>>> #for i in {0..15} ; do cephfs-data-scan scan_frags
>>>>>>>>> --filesystem cfs_irods_test --force-corrupt --worker_n
>>>>>>>>> $i --worker_m 16 & done
>>>>>>>>> #for i in {0..15} ; do cephfs-data-scan scan_extents
>>>>>>>>> --filesystem cfs_irods_test --worker_n $i --worker_m
>>>>>>>>> 16 & done
>>>>>>>>> #for i in {0..15} ; do cephfs-data-scan scan_inodes
>>>>>>>>> --filesystem cfs_irods_test --force-corrupt --worker_n
>>>>>>>>> $i --worker_m 16 & done
>>>>>>>>> #for i in {0..15} ; do cephfs-data-scan scan_links
>>>>>>>>> --filesystem cfs_irods_test --worker_n $i --worker_m
>>>>>>>>> 16 & done
>>>>>>>>> cephfs-data-scan scan_frags --filesystem
>>>>>>>>> cfs_irods_test --force-corrupt
>>>>>>>>> cephfs-data-scan scan_extents --filesystem cfs_irods_test
>>>>>>>>> [mon-01 ~]# cephfs-data-scan scan_frags --filesystem
>>>>>>>>> cfs_irods_test --force-corrupt
>>>>>>>>> [mon-01 ~]# cephfs-data-scan scan_extents --filesystem
>>>>>>>>> cfs_irods_test *------> still running*
>>>>>>>>> I don't know how long it will take. Once it completes I will run the
>>>>>>>>> remaining commands.
>>>>>>>>> Thanks,
>>>>>>>>> Christophe
>>>>>>>>> cephfs-data-scan scan_inodes --filesystem
>>>>>>>>> cfs_irods_test --force-corrupt
>>>>>>>>> cephfs-data-scan scan_links --filesystem cfs_irods_test
>>>>>>>>> cephfs-data-scan cleanup --filesystem cfs_irods_test
>>>>>>>>> #ceph mds repaired 0 <---- should not be necessary
>>>>>>>>> # Set the fs back online and accessible
>>>>>>>>> ceph fs set cfs_irods_test down false
>>>>>>>>> ceph fs set cfs_irods_test joinable true
>>>>>>>>> ceph fs set cfs_irods_test refuse_client_session false
>>>>>>>>> An MDS should now start; if not, use 'ceph orch
>>>>>>>>> daemon restart mds.xxxxx' to start an MDS. After
>>>>>>>>> remounting the fs you should be able to access
>>>>>>>>> /testdir1 and /testdir2 in the fs root.
>>>>>>>>> # Scrub the fs again to check that everything is OK.
>>>>>>>>> ceph tell mds.cfs_irods_test:0 scrub start /
>>>>>>>>> recursive,repair,force
>>>>>>>>> Regards,
>>>>>>>>> Frédéric.
>>>>>>>>> [1] https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/
>>>>>>>>> ----- On 22 Apr 25, at 10:21, Christophe DIARRA
>>>>>>>>> <christophe.dia...@idris.fr> wrote:
>>>>>>>>> Hello Frédéric,
>>>>>>>>> Thank you for your help.
>>>>>>>>> Following is the output you asked for:
>>>>>>>>> [mon-01 ~]# date
>>>>>>>>> Tue Apr 22 10:09:10 AM CEST 2025
>>>>>>>>> [root@fidrcmon-01 ~]# ceph tell
>>>>>>>>> mds.cfs_irods_test:0 scrub start /
>>>>>>>>> recursive,repair,force
>>>>>>>>> 2025-04-22T10:09:12.796+0200 7f43f6ffd640 0
>>>>>>>>> client.86553 ms_handle_reset on
>>>>>>>>> v2:130.84.80.10:6800/3218663047
>>>>>>>>> 2025-04-22T10:09:12.818+0200 7f43f6ffd640 0
>>>>>>>>> client.86559 ms_handle_reset on
>>>>>>>>> v2:130.84.80.10:6800/3218663047
>>>>>>>>> {
>>>>>>>>> "return_code": 0,
>>>>>>>>> "scrub_tag":
>>>>>>>>> "12e537bb-bb39-4f3b-ae09-e0a1ae6ce906",
>>>>>>>>> "mode": "asynchronous"
>>>>>>>>> }
>>>>>>>>> [root@fidrcmon-01 ~]# ceph tell
>>>>>>>>> mds.cfs_irods_test:0 scrub status
>>>>>>>>> 2025-04-22T10:09:31.760+0200 7f3f0f7fe640 0
>>>>>>>>> client.86571 ms_handle_reset on
>>>>>>>>> v2:130.84.80.10:6800/3218663047
>>>>>>>>> 2025-04-22T10:09:31.781+0200 7f3f0f7fe640 0
>>>>>>>>> client.86577 ms_handle_reset on
>>>>>>>>> v2:130.84.80.10:6800/3218663047
>>>>>>>>> {
>>>>>>>>> "status": "no active scrubs running",
>>>>>>>>> "scrubs": {}
>>>>>>>>> }
>>>>>>>>> [root@fidrcmon-01 ~]# cephfs-journal-tool --rank
>>>>>>>>> cfs_irods_test:0 event recover_dentries list
>>>>>>>>> 2025-04-16T18:24:56.802960+0200 0x7c334a
>>>>>>>>> SUBTREEMAP: ()
>>>>>>>>> [root@fidrcmon-01 ~]#
>>>>>>>>> Based on this output, can I run the other three
>>>>>>>>> commands provided in your message:
>>>>>>>>> ceph tell mds.0 flush journal
>>>>>>>>> ceph mds fail 0
>>>>>>>>> ceph tell mds.cfs_irods_test:0 scrub start / recursive
>>>>>>>>> Thanks,
>>>>>>>>> Christophe
>>>>>>>>> On 19/04/2025 12:55, Frédéric Nass wrote:
>>>>>>>>> Hi Christophe, Hi David,
>>>>>>>>> Could you share the output of the below command after running the
>>>>>>>>> scrubbing with recursive,repair,force?
>>>>>>>>> cephfs-journal-tool --rank cfs_irods_test:0 event recover_dentries 
>>>>>>>>> list
>>>>>>>>> Could be that the MDS recovered these 2 dentries in its journal already
>>>>>>>>> but the status of the filesystem was not updated yet. I've seen this
>>>>>>>>> happen before. If that's the case, you could try a flush, fail and
>>>>>>>>> re-scrub:
>>>>>>>>> ceph tell mds.0 flush journal
>>>>>>>>> ceph mds fail 0
>>>>>>>>> ceph tell mds.cfs_irods_test:0 scrub start / recursive
>>>>>>>>> This might clear the HEALTH_ERR. If not, then it will be easy to fix 
>>>>>>>>> by
>>>>>>>>> rebuilding / fixing the metadata from the data pools since this fs is 
>>>>>>>>> empty.
>>>>>>>>> Let us know,
>>>>>>>>> Regards,
>>>>>>>>> Frédéric.
>>>>>>>>> ----- On 18 Apr 25, at 9:51, <daviddavid.cas...@aevoo.fr> wrote:
>>>>>>>>> I also tend to think that the disk has nothing to do with the problem.
>>>>>>>>> My reading is that the inode associated with the dentry is missing.
>>>>>>>>> Can anyone correct me?
>>>>>>>>> Christophe informed me that the directories were emptied before the
>>>>>>>>> incident.
>>>>>>>>> I don't understand why scrubbing doesn't repair the metadata.
>>>>>>>>> Perhaps because the directory is empty?
>>>>>>>>> On Thu 17 Apr 2025 at 19:06, Anthony D'Atri
>>>>>>>>> <anthony.da...@gmail.com> wrote:
>>>>>>>>> HPE rebadges drives from manufacturers. A quick search supports the 
>>>>>>>>> idea
>>>>>>>>> that this SKU is fulfilled at least partly by Kioxia, so not likely a 
>>>>>>>>> PLP
>>>>>>>>> issue.
>>>>>>>>> On Apr 17, 2025, at 11:39 AM, Christophe DIARRA
>>>>>>>>> <christophe.dia...@idris.fr> wrote:
>>>>>>>>> Hello David,
>>>>>>>>> The SSD model is VO007680JWZJL.
>>>>>>>>> I will delay the 'ceph tell mds.cfs_irods_test:0 damage rm 241447932'
>>>>>>>>> for the moment. If no other solution is found I will be obliged to use
>>>>>>>>> this command.
>>>>>>>>> I found 'dentry' in the logs when the cephfs cluster started:
>>>>>>>>> Apr 16 17:29:53 mon-02 ceph-mds[2367]: 
>>>>>>>>> mds.cfs_irods_test.mon-02.awuygq
>>>>>>>>> Updating MDS map to version 15613 from mon.2
>>>>>>>>> Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.15612 handle_mds_map i am
>>>>>>>>> now mds.0.15612
>>>>>>>>> Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.15612 handle_mds_map 
>>>>>>>>> state
>>>>>>>>> change up:starting --> up:active
>>>>>>>>> Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.15612 active_start
>>>>>>>>> Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.cache.den(0x1 testdir2)
>>>>>>>>> loaded already *corrupt dentry*: [dentry #0x1/testdir2 [2,head] rep@0.0
>>>>>>>>> NULL (dversion lock) pv=0 v=4442 ino=(nil) state=0 0x5617e18c8280]
>>>>>>>>> Apr 16 17:29:53 mon-02 ceph-mds[2367]: mds.0.cache.den(0x1 testdir1)
>>>>>>>>> loaded already *corrupt dentry*: [dentry #0x1/testdir1 [2,head] rep@0.0
>>>>>>>>> NULL (dversion lock) pv=0 v=4442 ino=(nil) state=0 0x5617e18c8500]
>>>>>>>>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check failed: 1
>>>>>>>>> filesystem is offline (MDS_ALL_DOWN)
>>>>>>>>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check failed: 1
>>>>>>>>> filesystem is online with fewer MDS than max_mds 
>>>>>>>>> (MDS_UP_LESS_THAN_MAX)
>>>>>>>>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: from='client.?
>>>>>>>>> xx.xx.xx.8:0/3820885518' entity='client.admin' cmd='[{"prefix": "fs 
>>>>>>>>> set",
>>>>>>>>> "fs_name": "cfs_irods_test", "var": "down", "val":
>>>>>>>>> "false"}]': finished
>>>>>>>>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: daemon
>>>>>>>>> mds.cfs_irods_test.mon-02.awuygq assigned to filesystem 
>>>>>>>>> cfs_irods_test as
>>>>>>>>> rank 0 (now has 1 ranks)
>>>>>>>>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check cleared:
>>>>>>>>> MDS_ALL_DOWN (was: 1 filesystem is offline)
>>>>>>>>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: Health check cleared:
>>>>>>>>> MDS_UP_LESS_THAN_MAX (was: 1 filesystem is online with fewer MDS than
>>>>>>>>> max_mds)
>>>>>>>>> Apr 16 17:29:53 mon-02 ceph-mon[2288]: daemon
>>>>>>>>> mds.cfs_irods_test.mon-02.awuygq is now active in filesystem 
>>>>>>>>> cfs_irods_test
>>>>>>>>> as rank 0
>>>>>>>>> Apr 16 17:29:54 mon-02 ceph-mgr[2444]: log_channel(cluster) log [DBG] :
>>>>>>>>> pgmap v1721: 4353 pgs: 4346 active+clean, 7 active+clean+scrubbing+deep;
>>>>>>>>> 3.9 TiB data, 417 TiB used, 6.4 PiB / 6.8 PiB avail; 1.4 KiB/s rd, 1 op/s
>>>>>>>>> If you need more extract from the log file please let me know.
>>>>>>>>> Thanks for your help,
>>>>>>>>> Christophe
>>>>>>>>> On 17/04/2025 13:39, David C. wrote:
>>>>>>>>> If I'm not mistaken, this is a fairly rare situation.
>>>>>>>>> The fact that it's the result of a power outage makes me think of a 
>>>>>>>>> bad
>>>>>>>>> SSD (like "S... Pro").
>>>>>>>>> Does a grep of the dentry id in the MDS logs return anything?
>>>>>>>>> Maybe there is some interesting information around this grep.
>>>>>>>>> In the heat of the moment, I have no other idea than to delete the
>>>>>>>>> dentry.
>>>>>>>>> ceph tell mds.cfs_irods_test:0 damage rm 241447932
>>>>>>>>> However, in production, this results in the content (of dir
>>>>>>>>> /testdir[12]) being abandoned.
>>>>>>>>> On Thu 17 Apr 2025 at 12:44, Christophe DIARRA
>>>>>>>>> <christophe.dia...@idris.fr> wrote:
>>>>>>>>> Hello David,
>>>>>>>>> Thank you for the tip about the scrubbing. I have tried the commands
>>>>>>>>> found in the documentation but they seem to have no effect:
>>>>>>>>> [root@mon-01 ~]# *ceph tell mds.cfs_irods_test:0 scrub start /
>>>>>>>>> recursive,repair,force*
>>>>>>>>> 2025-04-17T12:07:20.958+0200 7fd4157fa640 0 client.86301
>>>>>>>>> ms_handle_reset on v2:130.84.80.10:6800/3218663047
>>>>>>>>> 2025-04-17T12:07:20.979+0200 7fd4157fa640 0 client.86307
>>>>>>>>> ms_handle_reset on v2:130.84.80.10:6800/3218663047
>>>>>>>>> {
>>>>>>>>> "return_code": 0,
>>>>>>>>> "scrub_tag": "733b1c6d-a418-4c83-bc8e-b28b556e970c",
>>>>>>>>> "mode": "asynchronous"
>>>>>>>>> }
>>>>>>>>> [root@mon-01 ~]# ceph tell mds.cfs_irods_test:0 scrub status
>>>>>>>>> 2025-04-17T12:07:30.734+0200 7f26cdffb640 0 client.86319
>>>>>>>>> ms_handle_reset on v2:130.84.80.10:6800/3218663047
>>>>>>>>> 2025-04-17T12:07:30.753+0200 7f26cdffb640 0 client.86325
>>>>>>>>> ms_handle_reset on v2:130.84.80.10:6800/3218663047
>>>>>>>>> {
>>>>>>>>> "status": "no active scrubs running",
>>>>>>>>> "scrubs": {}
>>>>>>>>> }
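[Editor's note: `scrub start` runs asynchronously, so it returns immediately and `scrub status` may already report "no active scrubs running" by the time it is queried, as above. A minimal polling sketch follows; `STATUS_CMD` is a stub standing in for the real `ceph tell mds.cfs_irods_test:0 scrub status` call, so the snippet runs without a cluster.]

```shell
#!/bin/sh
# Poll until the MDS reports the asynchronous scrub has finished.
# STATUS_CMD is a stub so this sketch runs standalone; on a real cluster
# it would be: ceph tell mds.cfs_irods_test:0 scrub status
STATUS_CMD=${STATUS_CMD:-'echo "no active scrubs running"'}

poll_scrub() {
    while :; do
        # The real command returns JSON; matching the status string is enough here.
        if eval "$STATUS_CMD" | grep -q 'no active scrubs running'; then
            echo "scrub finished"
            return 0
        fi
        sleep 5
    done
}

poll_scrub
```

On a real cluster one would also re-run `ceph tell mds.cfs_irods_test:0 damage ls` afterwards to see whether the repair cleared the damage entries.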
>>>>>>>>> [root@mon-01 ~]# ceph -s
>>>>>>>>> cluster:
>>>>>>>>> id: b87276e0-1d92-11ef-a9d6-507c6f66ae2e
>>>>>>>>> health: HEALTH_ERR 1 MDSs report damaged metadata
>>>>>>>>> services:
>>>>>>>>> mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 19h)
>>>>>>>>> mgr: mon-02.mqaubn(active, since 19h), standbys: mon-03.gvywio,
>>>>>>>>> mon-01.xhxqdi
>>>>>>>>> mds: 1/1 daemons up, 2 standby
>>>>>>>>> osd: 368 osds: 368 up (since 18h), 368 in (since 3w)
>>>>>>>>> data:
>>>>>>>>> volumes: 1/1 healthy
>>>>>>>>> pools: 10 pools, 4353 pgs
>>>>>>>>> objects: 1.25M objects, 3.9 TiB
>>>>>>>>> usage: 417 TiB used, 6.4 PiB / 6.8 PiB avail
>>>>>>>>> pgs: 4353 active+clean
>>>>>>>>> Did I miss something?
>>>>>>>>> The server didn't crash. I don't understand what you mean by
>>>>>>>>> "there may be a design flaw in the infrastructure (insecure
>>>>>>>>> cache, for example)".
>>>>>>>>> How can we tell whether we have a design problem? What should we check?
>>>>>>>>> Best regards,
>>>>>>>>> Christophe
>>>>>>>>> On 17/04/2025 11:07, David C. wrote:
>>>>>>>>> Hello Christophe,
>>>>>>>>> Check the file system scrubbing procedure =>
>>>>>>>>> https://docs.ceph.com/en/latest/cephfs/scrub/
>>>>>>>>> But this doesn't guarantee data recovery.
>>>>>>>>> Did the cluster crash?
>>>>>>>>> Ceph should be able to handle it; there may be a design flaw in
>>>>>>>>> the infrastructure (insecure cache, for example).
>>>>>>>>> David
>>>>>>>>> On Thu, 17 Apr 2025 at 10:44, Christophe DIARRA <christophe.dia...@idris.fr> wrote:
>>>>>>>>> Hello,
>>>>>>>>> After electrical maintenance I restarted our Ceph cluster, but it
>>>>>>>>> remains in an unhealthy state: HEALTH_ERR 1 MDSs report damaged
>>>>>>>>> metadata. How can I repair this damaged metadata?
>>>>>>>>> To bring down the CephFS cluster I unmounted the fs from the
>>>>>>>>> client first and then ran: ceph fs set cfs_irods_test down true
>>>>>>>>> To bring it back up I ran: ceph fs set cfs_irods_test down false
>>>>>>>>> Fortunately the cfs_irods_test fs is almost empty and is a test
>>>>>>>>> fs. The Ceph cluster is not in production yet.
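[Editor's note: the stop/start sequence described above can be sketched as below. `FS` matches the filesystem name in the thread; `CEPH` is a stub so the sketch runs standalone (set `CEPH=ceph` on a real cluster), and the actual client unmount step is site-specific and only noted in a comment.]

```shell
#!/bin/sh
# Sketch of the CephFS stop/start sequence from the thread.
# CEPH defaults to a stub that just echoes the command it would run.
FS=cfs_irods_test
CEPH=${CEPH:-echo ceph}

stop_fs() {
    # Clients should unmount the fs first; then marking it down stops the MDS ranks.
    $CEPH fs set "$FS" down true
}

start_fs() {
    $CEPH fs set "$FS" down false
}

stop_fs
start_fs
```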
>>>>>>>>> Following is the current status:
>>>>>>>>> [root@mon-01 ~]# ceph health detail
>>>>>>>>> HEALTH_ERR 1 MDSs report damaged metadata
>>>>>>>>> [ERR] MDS_DAMAGE: 1 MDSs report damaged metadata
>>>>>>>>> mds.cfs_irods_test.mon-03.vlmeuz(mds.0): Metadata damage
>>>>>>>>> detected
>>>>>>>>> [root@mon-01 ~]# ceph -s
>>>>>>>>> cluster:
>>>>>>>>> id: b87276e0-1d92-11ef-a9d6-507c6f66ae2e
>>>>>>>>> health: HEALTH_ERR
>>>>>>>>> 1 MDSs report damaged metadata
>>>>>>>>> services:
>>>>>>>>> mon: 3 daemons, quorum mon-01,mon-03,mon-02 (age 17h)
>>>>>>>>> mgr: mon-02.mqaubn(active, since 17h), standbys:
>>>>>>>>> mon-03.gvywio,
>>>>>>>>> mon-01.xhxqdi
>>>>>>>>> mds: 1/1 daemons up, 2 standby
>>>>>>>>> osd: 368 osds: 368 up (since 17h), 368 in (since 3w)
>>>>>>>>> data:
>>>>>>>>> volumes: 1/1 healthy
>>>>>>>>> pools: 10 pools, 4353 pgs
>>>>>>>>> objects: 1.25M objects, 3.9 TiB
>>>>>>>>> usage: 417 TiB used, 6.4 PiB / 6.8 PiB avail
>>>>>>>>> pgs: 4353 active+clean
>>>>>>>>> [root@mon-01 ~]# ceph fs ls
>>>>>>>>> name: cfs_irods_test, metadata pool: cfs_irods_md_test, data
>>>>>>>>> pools:
>>>>>>>>> [cfs_irods_def_test cfs_irods_data_test ]
>>>>>>>>> [root@mon-01 ~]# ceph mds stat
>>>>>>>>> cfs_irods_test:1 {0=cfs_irods_test.mon-03.vlmeuz=up:active} 2
>>>>>>>>> up:standby
>>>>>>>>> [root@mon-01 ~]# ceph fs status
>>>>>>>>> cfs_irods_test - 0 clients
>>>>>>>>> ==============
>>>>>>>>> RANK  STATE   MDS                           ACTIVITY    DNS  INOS  DIRS  CAPS
>>>>>>>>> 0     active  cfs_irods_test.mon-03.vlmeuz  Reqs: 0 /s  12   15    14    0
>>>>>>>>> POOL TYPE USED AVAIL
>>>>>>>>> cfs_irods_md_test metadata 11.4M 34.4T
>>>>>>>>> cfs_irods_def_test data 0 34.4T
>>>>>>>>> cfs_irods_data_test data 0 4542T
>>>>>>>>> STANDBY MDS
>>>>>>>>> cfs_irods_test.mon-01.hitdem
>>>>>>>>> cfs_irods_test.mon-02.awuygq
>>>>>>>>> MDS version: ceph version 18.2.2
>>>>>>>>> (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
>>>>>>>>> [root@mon-01 ~]#
>>>>>>>>> [root@mon-01 ~]# ceph tell mds.cfs_irods_test:0 damage ls
>>>>>>>>> 2025-04-17T10:23:31.849+0200 7f4b87fff640 0 client.86181
>>>>>>>>> ms_handle_reset on v2:130.84.80.10:6800/3218663047
>>>>>>>>> 2025-04-17T10:23:31.866+0200 7f4b87fff640 0 client.86187
>>>>>>>>> ms_handle_reset on v2:130.84.80.10:6800/3218663047
>>>>>>>>> [
>>>>>>>>> {
>>>>>>>>> "damage_type": "dentry",
>>>>>>>>> "id": 241447932,
>>>>>>>>> "ino": 1,
>>>>>>>>> "frag": "*",
>>>>>>>>> "dname": "testdir2",
>>>>>>>>> "snap_id": "head",
>>>>>>>>> "path": "/testdir2"
>>>>>>>>> },
>>>>>>>>> {
>>>>>>>>> "damage_type": "dentry",
>>>>>>>>> "id": 2273238993,
>>>>>>>>> "ino": 1,
>>>>>>>>> "frag": "*",
>>>>>>>>> "dname": "testdir1",
>>>>>>>>> "snap_id": "head",
>>>>>>>>> "path": "/testdir1"
>>>>>>>>> }
>>>>>>>>> ]
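[Editor's note: since `damage rm` takes one ID at a time, the IDs can be extracted from saved `damage ls` JSON before deciding what to remove. A sketch follows, reproducing the two dentry entries shown above as sample data; python3 is used only for JSON parsing, and on a real cluster each printed ID would go to `ceph tell mds.cfs_irods_test:0 damage rm <id>`.]

```shell
# Save the two dentry entries from the 'damage ls' output above as sample data.
cat > /tmp/damage.json <<'EOF'
[
  {"damage_type": "dentry", "id": 241447932,  "ino": 1, "frag": "*",
   "dname": "testdir2", "snap_id": "head", "path": "/testdir2"},
  {"damage_type": "dentry", "id": 2273238993, "ino": 1, "frag": "*",
   "dname": "testdir1", "snap_id": "head", "path": "/testdir1"}
]
EOF

# Print "<id> <path>" for each damage entry; review before feeding the
# ids to 'damage rm', since removing entries abandons the damaged dentries.
python3 - <<'EOF'
import json
with open('/tmp/damage.json') as f:
    for entry in json.load(f):
        print(entry['id'], entry['path'])
EOF
```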
>>>>>>>>> [root@mon-01 ~]#
>>>>>>>>> Any help will be appreciated,
>>>>>>>>> Thanks,
>>>>>>>>> Christophe
>>>>>>>>> _______________________________________________
>>>>>>>>> ceph-users mailing list -- ceph-users@ceph.io
>>>>>>>>> To unsubscribe send an email to ceph-users-le...@ceph.io
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
