[ceph-users] Re: Ceph orchestrator not refreshing device list

2024-10-29 Thread Eugen Block
Hi, I haven't done this in production yet either, but in a test cluster I threw away that config-key and it just gets regenerated. So I suppose one could try that without any big risk. Just a note, this should also work (get instead of dump): ceph config-key get mgr/cephadm/host.ceph-osd31.
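
A minimal sketch of that approach, assuming the same host key name as above (cephadm regenerates the key on its next refresh, as noted; the final refresh command is my addition and optional):

    ceph config-key get mgr/cephadm/host.ceph-osd31   # inspect the cached host/device data first
    ceph config-key rm mgr/cephadm/host.ceph-osd31    # throw the cached entry away
    ceph orch device ls --refresh                     # ask the orchestrator to rescan devices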

[ceph-users] MDS and stretched clusters

2024-10-29 Thread Sake Ceph
Hi all, We successfully deployed a stretched cluster and everything is working fine. But is it possible to assign the active MDS services to one DC and the standby-replay ones to the other? We're running 18.2.4, deployed via cephadm, using 4 MDS servers with 2 active MDS on pinned ranks and 2 in standby-re

[ceph-users] Re: Ceph Crash Module "RADOS permission denied"

2024-10-29 Thread Tim Holloway
This is a common error on my system (Pacific). It appears that there is internal confusion as to where the crash support stuff lives - whether it's new-style (administered and under /var/lib/ceph/fsid) or legacy style (/var/lib/ceph). One way to fake it out was to manually create a minimal c

[ceph-users] Re: MDS and stretched clusters

2024-10-29 Thread Gregory Farnum
No, unfortunately this needs to be done at a higher level and is not included in Ceph right now. Rook may be able to do this, but I don't think cephadm does. Adam, is there some way to finagle this with pod placement rules (ie, tagging nodes as mds and mds-standby, and then assigning special mds co

[ceph-users] Re: MDS and stretched clusters

2024-10-29 Thread Travis Nielsen
Yes, with Rook this is possible by adding zone anti-affinity for the MDS pods. Travis
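
For illustration only, a rough sketch of what such a placement rule could look like, assuming a CephFilesystem named myfs in the rook-ceph namespace and the default app=rook-ceph-mds pod label (all of these names are assumptions, not taken from the thread):

    kubectl -n rook-ceph patch cephfilesystem myfs --type merge -p '
    {"spec":{"metadataServer":{"placement":{"podAntiAffinity":{
      "preferredDuringSchedulingIgnoredDuringExecution":[{"weight":100,
        "podAffinityTerm":{"labelSelector":{"matchLabels":{"app":"rook-ceph-mds"}},
          "topologyKey":"topology.kubernetes.io/zone"}}]}}}}}'

Note that this only spreads the MDS pods across zones; it does not pin which daemon holds the active state, which is the limitation discussed further down the thread.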

[ceph-users] Re: no recovery running

2024-10-29 Thread David Turner
I was running into that as well. Setting `osd_mclock_override_recovery_settings` [1] to true allowed me to manage osd_max_backfills again and get recovery to start happening again. It's on my todo list to understand mclock profiles, but resizing PGs was a nightmare with it. Changing to override the
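
A minimal sketch of that override, assuming an mClock-based release where the backfill/recovery options are otherwise locked (the values are examples only):

    ceph config set osd osd_mclock_override_recovery_settings true   # allow manual recovery tuning again
    ceph config set osd osd_max_backfills 4                          # now takes effect
    ceph config set osd osd_recovery_max_active 4                    # other recovery knobs likewise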

[ceph-users] Re: Destroyed OSD clinging to wrong disk

2024-10-29 Thread Tim Holloway
Take care when reading the output of "ceph osd metadata". When you are running the OSD as an administered service, it's running in a container, and a container is a miniature VM. So, for example, it may report your OS as "CentOS Stream 8" even if your actual machine is running Ubuntu. The big
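
For example (a sketch; OSD id 12 is a placeholder), the interesting fields are easy to pick out, keeping in mind that the distro fields describe the container image rather than the host:

    ceph osd metadata 12 | grep -E 'hostname|devices|distro|container_image'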

[ceph-users] Ceph Crash Module "RADOS permission denied"

2024-10-29 Thread mailing-lists
Hey Cephers, I was investigating some other issue when I stumbled across this. I am not sure if this is "as intended" or faulty. This is a cephadm cluster on Reef 18.2.4, containerized with Docker. The ceph-crash module states that it can't find its key and that it can't access RADOS. Pre-
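
A hedged sketch of the first things worth checking, assuming the usual cephadm naming of client.crash.<hostname> (the exact hostname form may differ):

    ceph auth ls | grep crash                    # does a client.crash.* key exist for this host?
    ceph auth get client.crash.$(hostname -s)    # the key the local crash daemon should be using
    ceph crash ls                                # does the mgr crash module itself respond?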

[ceph-users] Re: MDS and stretched clusters

2024-10-29 Thread Frédéric Nass
Hi, I'm not aware of any service settings that would allow that. You'll have to monitor each MDS's state and restart any non-local active MDS to reverse roles. Regards, Frédéric.
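
A rough sketch of what that manual role reversal could look like (the filesystem name 'cephfs' and rank 0 are placeholders):

    ceph fs status            # shows which daemon holds each rank and which is standby-replay
    ceph mds fail cephfs:0    # fail the rank whose active daemon sits in the wrong DC; its standby-replay takes over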

[ceph-users] Re: MDS and stretched clusters

2024-10-29 Thread Sake Ceph
I hope someone from the development team can shed some light on this. I'll search the tracker to see if someone else has made a request about this.

[ceph-users] Re: Destroyed OSD clinging to wrong disk

2024-10-29 Thread Dave Hall
Tim, Thank you for your guidance. Your points are completely understood. It was more that I couldn't figure out why the Dashboard was telling me that the destroyed OSD was still using /dev/sdi when the physical disk with that serial number was at /dev/sdc, and when another OSD was also reporting
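
One way to cross-check which physical disk an OSD is actually sitting on (host name and OSD id below are placeholders):

    ceph device ls-by-host osd-host-1      # serial-number based device ids, current /dev names and daemons
    ceph osd metadata 7 | grep devices     # the kernel device name the OSD daemon itself reports
    lsblk -o NAME,SERIAL,SIZE,MODEL        # on the host, map /dev/sdX back to serial numbers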

[ceph-users] Re: MDS and stretched clusters

2024-10-29 Thread Frédéric Nass
But you don't get to choose which one is active and which one is standby, as these are states that swap over time, not configurations, or do you? I mean, there's no way to tell Rook 'I want this one to be active preferably' and have the Rook operator monitor MDSs and restart the non-local one if

[ceph-users] why performance difference between 'rados bench seq' and 'rados bench rand' quite significant

2024-10-29 Thread Louisa
Hi all, We used 'rados bench' to test 4K object read and write operations. Our cluster runs Pacific: one node, 11 BlueStore OSDs, with DB and WAL sharing the block device, which is an HDD. 1. Testing 4K writes with the command 'rados bench 120 write -t 16 -b 4K -p rep3datapool --run-name 4kreadwrite --n
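
For reference, the usual three-step sequence looks roughly like this, assuming the flag truncated above is --no-cleanup (needed so the seq/rand phases have objects to read):

    rados bench 120 write -t 16 -b 4K -p rep3datapool --run-name 4kreadwrite --no-cleanup
    rados bench 120 seq -t 16 -p rep3datapool --run-name 4kreadwrite
    rados bench 120 rand -t 16 -p rep3datapool --run-name 4kreadwrite
    rados -p rep3datapool cleanup --run-name 4kreadwrite   # remove the benchmark objects afterwards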

[ceph-users] Re: why performance difference between 'rados bench seq' and 'rados bench rand' quite significant

2024-10-29 Thread Anthony D'Atri
The good Mr. Nelson and others may have more to contribute, but a few thoughts: * Running for 60 or 120 seconds isn’t quantitative: rados bench typically exhibits a clear ramp-up; watch the per-second stats. * Suggest running for 10 minutes, three times in a row and averaging the results * How m
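
A trivial sketch of the longer, repeated runs suggested above (600 seconds, three passes; averaging the per-run summaries is left to the reader):

    for i in 1 2 3; do
        rados bench 600 rand -t 16 -p rep3datapool --run-name 4kreadwrite 2>&1 | tee rand_run_$i.log
    done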

[ceph-users] Re: why performance difference between 'rados bench seq' and 'rados bench rand' quite significant

2024-10-29 Thread Louisa
rep3datapool pg_num is 512; the average number of PG replicas per OSD is 139. Scrubs, the balancer and the PG autoscaler were disabled. RAM is 128 GB, swap is 0.
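
For completeness, a few commands that report the settings mentioned above (pool name as in the thread; these are standard ceph CLI calls, not something requested in the thread):

    ceph osd pool get rep3datapool pg_num
    ceph osd pool get rep3datapool pg_autoscale_mode
    ceph balancer status
    free -h   # on the node, to confirm RAM and swap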