[ceph-users] Re: [EXTERN] Re: Urgent help with degraded filesystem needed

2024-06-29 Thread Dietmar Rieder

Hi Enrico,

thanks so much for your comment. You are right, that's what I figured 
out a bit later, see below.


BTW, I was able to repair the filesystem and all is working fine again; it seems that we did not lose any data (I will post a summary, for the record).


Thanks again,
   Dietmar

On 6/28/24 13:08, Enrico Bocchi wrote:

Hi Dietmar,

I understand the option to be set is 'wsync', not 'nowsync'. See 
https://docs.ceph.com/en/latest/man/8/mount.ceph/
nowsync enables async dirops, which is what triggers the assertion in 
https://tracker.ceph.com/issues/61009


The reason you don't see it in /proc/mounts is that it is the default in recent kernels (see 
https://github.com/gregkh/linux/commit/f7a67b463fb83a4b9b11ceaa8ec4950b8fb7f902).
If you set 'wsync' among your mount options, it will show up in /proc/mounts.
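
For example, a kernel-client mount with async dirops disabled and a quick check could look like this (a minimal sketch; the monitor address, credentials and mount point are placeholders, not taken from this thread):

mount -t ceph cephmon-01:6789:/ /mnt/cephfs -o name=cephfs,secretfile=/etc/ceph/cephfs.secret,wsync
grep ceph /proc/mounts    # 'wsync' should now appear in the option list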


Cheers,
Enrico


On 6/27/24 06:37, Dietmar Rieder wrote:


[...]



Oh, I think I misunderstood the suggested workaround. I guess we need 
to disable "nowsync", which is set by default, right?


so: -o wsync

should be the workaround, right?







[ceph-users] Re: [EXTERN] Urgent help with degraded filesystem needed

2024-06-29 Thread Dietmar Rieder

Hi all,

Finally we were able to repair the filesystem, and it seems that we did not lose any data. Thanks for all the suggestions and comments.


Here is a short summary of our journey:


1. At some point all 6 of our MDS daemons went into the error state, one after another

2. We tried to restart them but they kept crashing

3. We learned that unfortunately we hit a known bug: https://tracker.ceph.com/issues/61009



4. We took the filesystem down ("ceph fs set cephfs down true") and unmounted it from all clients.


5. We started with the disaster recovery procedure:



I. Backup the journal
cephfs-journal-tool --rank=cephfs:all journal export /mnt/backup/backup.bin
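
For reference, such a backup can be re-imported per rank should the following steps make things worse. Note that an all-ranks export may produce one file per rank, so check the file names the export actually created; a sketch for rank 0:

cephfs-journal-tool --rank=cephfs:0 journal import /mnt/backup/backup.bin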

II. DENTRY recovery from journal

(We have 3 active MDS)
cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
cephfs-journal-tool --rank=cephfs:1 event recover_dentries summary
cephfs-journal-tool --rank=cephfs:2 event recover_dentries summary

cephfs-journal-tool --rank=cephfs:all  journal inspect
Overall journal integrity: OK
Overall journal integrity: DAMAGED
Corrupt regions:
  0xd9a84f243c-
Overall journal integrity: OK

The journal from rank 1 still shows damage

III. Journal truncation

cephfs-journal-tool --rank=cephfs:0 journal reset
cephfs-journal-tool --rank=cephfs:1 journal reset
cephfs-journal-tool --rank=cephfs:2 journal reset

IV. MDS table wipes

cephfs-table-tool all reset session

cephfs-journal-tool --rank=cephfs:1  journal inspect
Overall journal integrity: OK

V. MDS MAP reset

ceph fs reset cephfs --yes-i-really-mean-it

After these steps to reset and trim the journal we tried to restart the MDS daemons; however, they were still dying shortly after starting.


So as Xiubo suggested we went on with the disaster recovery procedure...

VI. Recovery from missing metadata objects

cephfs-table-tool 0 reset session
cephfs-table-tool 0 reset snap
cephfs-table-tool 0 reset inode

cephfs-journal-tool --rank=cephfs:0 journal reset

cephfs-data-scan init

The "cephfs-data-scan init" gave us warnings about already existing inodes:
Inode 0x0x1 already exists, skipping create.  Use --force-init to 
overwrite the existing object.
Inode 0x0x100 already exists, skipping create.  Use --force-init to 
overwrite the existing object.


We decided not to use --force-init and went on with

cephfs-data-scan scan_extents sdd-rep-data-pool hdd-ec-data-pool

The docs say it can take a "very long time"; unfortunately, the tool does not produce any ETA. After ~24 hrs we interrupted the process and restarted it with 32 workers, as sketched below.
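
Parallel scanning means starting N independent worker processes. A sketch of what we assume the 32-worker invocation looks like, based on the --worker_n/--worker_m flags described in the disaster-recovery docs and our two data pools (one command per worker, all run concurrently):

# worker 0 of 32
cephfs-data-scan scan_extents --worker_n 0 --worker_m 32 sdd-rep-data-pool hdd-ec-data-pool
# worker 1 of 32
cephfs-data-scan scan_extents --worker_n 1 --worker_m 32 sdd-rep-data-pool hdd-ec-data-pool
# ... and so on, up to --worker_n 31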


The parallel scan_extents took about 2 h 15 min and did not generate any output on stdout or stderr.


So we went on with a parallel (32 workers) scan_inodes, which also completed without any output after ~50 min.
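
The scan_inodes workers presumably follow the same pattern; a sketch for worker 0 of 32, assuming sdd-rep-data-pool is the default data pool here:

cephfs-data-scan scan_inodes --worker_n 0 --worker_m 32 sdd-rep-data-pool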


We then ran "cephfs-data-scan scan_links"; however, the tool stopped after ~45 min. with an error message: Error ((2) No such file or directory)


We tried to go on anyway with "cephfs-data-scan cleanup". The cleanup ran for about 9 h 20 min and did not produce any output.


So we tried to start up the MDS again; however, they still kept crashing:

2024-06-23T08:21:50.197+ 7feeb5177700  1 mds.0.8075 rejoin_start
2024-06-23T08:21:50.201+ 7feeb5177700  1 mds.0.8075 rejoin_joint_start
2024-06-23T08:21:50.204+ 7feeaf16b700  1 
mds.0.cache.den(0x100 groups) loaded already corrupt dentry: 
[dentry #0x1/data/groups [bf,head] rep@0.0 NULL (dversion lock) pv=0 
v=7910497 ino=(nil) state=0 0x55aa27a19180]

[...]
2024-06-23T08:21:50.228+ 7feeaf16b700 -1 log_channel(cluster) log 
[ERR] : bad backtrace on directory inode 0x10003e42340

[...]
2024-06-23T08:21:50.345+ 7feeaf16b700 -1 log_channel(cluster) log 
[ERR] : bad backtrace on directory inode 0x10003e45d8b

[...]
-6> 2024-06-23T08:21:50.351+ 7feeaf16b700 10 log_client  will 
send 2024-06-23T08:21:50.229835+ mds.default.cephmon-03.xcujhz 
(mds.0) 1 : cluster [ERR] bad backtrace on direc

tory inode 0x10003e42340
-5> 2024-06-23T08:21:50.351+ 7feeaf16b700 10 log_client  will 
send 2024-06-23T08:21:50.347085+ mds.default.cephmon-03.xcujhz 
(mds.0) 2 : cluster [ERR] bad backtrace on directory inode 0x10003e45d8b
-4> 2024-06-23T08:21:50.351+ 7feeaf16b700 10 monclient: 
_send_mon_message to mon.cephmon-03 at v2:10.1.3.23:3300/0
-3> 2024-06-23T08:21:50.351+ 7feeaf16b700  5 
mds.beacon.default.cephmon-03.xcujhz Sending beacon down:damaged seq 90
-2> 2024-06-23T08:21:50.351+ 7feeaf16b700 10 monclient: 
_send_mon_message to mon.cephmon-03 at v2:10.1.3.23:3300/0
-1> 2024-06-23T08:21:50.371+ 7feeb817d700  5 
mds.beacon.default.cephmon-03.xcujhz received beacon reply down:damaged 
seq 90 rtt 0.022
 0> 2024-06-23T08:21:50.371+ 7feeaf16b700  1 
mds.default.cephmon-03.xcujhz respawn!


So we decided to retry the "scan_links" and "cleanup" steps:

cephfs-data-scan scan_links
Took about 50 min., no error this time.

[ceph-users] cannot delete service by ceph orchestrator

2024-06-29 Thread Alex from North
Hi everybody!
I've never seen this before, and Google stays silent. I just found the same question from 2021, but there was no answer there (((
So, in the output of ceph orch ls I see:

root@ceph1:~/ceph-rollout# ceph orch ls
NAME                 PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager         ?:9093,9094      1/1  9m ago     6d   count:1
ceph-exporter                       10/10  9m ago     6d   *
crash                               10/10  9m ago     6d   *
grafana              ?:3000           1/1  9m ago     6d   count:1
mgr                                   3/3  9m ago     6d   ceph1;ceph6;ceph10;count:3
mon                                   3/3  9m ago     6d   ceph1;ceph6;ceph10;count:3
node-exporter        ?:9100         10/10  9m ago     6d   *
osd.osd_using_paths                   108  9m ago     -
prometheus           ?:9095           3/3  9m ago     6d   ceph1;ceph6;ceph10;count:3


And then ceph orch rm:

root@ceph1:~/ceph-rollout# ceph orch rm osd.osd_using_paths
Invalid service 'osd.osd_using_paths'. Use 'ceph orch ls' to list available 
services.


It cannot find it for some reason.

At the same time:

root@ceph1:~/ceph-rollout# ceph orch ls --service_name osd.osd_using_paths 
--export
service_type: osd
service_id: osd_using_paths
service_name: osd.osd_using_paths
unmanaged: true
spec:
  filter_logic: AND
  objectstore: bluestore


Why does this happen?
And how can I delete this unneeded service?

Thanks in advance.


[ceph-users] Re: cannot delete service by ceph orchestrator

2024-06-29 Thread Robert Sander

On 29.06.24 12:11, Alex from North wrote:


osd.osd_using_paths                   108  9m ago     -
prometheus           ?:9095           3/3  9m ago     6d   ceph1;ceph6;ceph10;count:3


And then ceph orch rm 

root@ceph1:~/ceph-rollout# ceph orch rm osd.osd_using_paths
Invalid service 'osd.osd_using_paths'. Use 'ceph orch ls' to list available 
services.


cannot find it for some reason.



unmanaged: true


As long as there are OSDs that were created by a drivegroup 
specification, you cannot really delete the service. It is set to 
unmanaged and therefore will not create any new OSDs.
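
A possible sequence, assuming you actually want those OSDs gone (the OSD id below is a placeholder; double-check before zapping any device):

# drain and remove one of the OSDs that the spec created, wiping its device
ceph orch osd rm 12 --zap
# watch the draining/removal progress
ceph orch osd rm status
# once no OSDs from the spec are left, the service entry can be removed
ceph orch rm osd.osd_using_paths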


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Mandatory information per §35a GmbHG:
HRB 220009 B / Amtsgericht Berlin-Charlottenburg,
Managing Director: Peer Heinlein -- Registered office: Berlin


[ceph-users] Re: cannot delete service by ceph orchestrator

2024-06-29 Thread Alex from North
Ah! I guess I got it!
So, once all OSDs (made by the specification I'd like to delete) are gone, the service will disappear as well, right?