[ceph-users] Re: Pacific 16.2.15 `osd noin`

2024-07-08 Thread Stefan Kooman
On 02-04-2024 15:09, Zakhar Kirpichenko wrote: Hi, I'm adding a few OSDs to an existing cluster, the cluster is running with `osd noout,noin`: cluster: id: 3f50555a-ae2a-11eb-a2fc-ffde44714d86 health: HEALTH_WARN noout,noin flag(s) set Specifically `noin` is docum
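A minimal sketch of the flag handling being discussed (plain ceph CLI; the OSD id is a placeholder):

  # set/clear the cluster-wide flag that keeps newly started OSDs from being marked "in"
  ceph osd set noin
  ceph osd unset noin

  # with noin still set, a freshly added OSD can be marked in by hand
  ceph osd in 42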

[ceph-users] Re: Sanity check

2024-07-08 Thread Eugen Block
Hi, your crush rule distributes each chunk on a different host, so your failure domain is host. The crush-failure-domain=osd from the EC profile is most likely left over from the initial creation, maybe it was supposed to be OSD during initial tests or whatever, but the crush rule is key here. We
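A quick way to verify which failure domain actually applies (a sketch; the pool, rule and profile names are placeholders):

  # the effective failure domain comes from the crush rule (the type in the chooseleaf/choose step)
  ceph osd crush rule dump <rule_name>

  # the EC profile only records what was requested when the pool was created
  ceph osd erasure-code-profile get <profile_name>

  # which rule and profile a given pool uses
  ceph osd pool get <pool> crush_rule
  ceph osd pool get <pool> erasure_code_profile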

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-07-08 Thread Ivan Clayson
-20240708.gz. We ran the backups on kernel mounts of the filesystem without the nowsync option this time to avoid the out-of-sync write problems. I've tried resetting the journal again after recovering the dentries but unfortunately the filesystem is still in a failed state despite se
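For context, the dentry recovery and journal reset steps referred to above are usually of this form (a sketch only; the rank and filesystem name are placeholders, the MDS must be stopped, and a journal backup should be taken first):

  cephfs-journal-tool --rank=<fs_name>:0 journal export backup.bin
  cephfs-journal-tool --rank=<fs_name>:0 event recover_dentries summary
  cephfs-journal-tool --rank=<fs_name>:0 journal reset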

[ceph-users] Re: RBD Mirror - Failed to unlink peer

2024-07-08 Thread Eugen Block
Hi, sorry for the delayed response, I was on vacation. I would set the "debug_rbd_mirror" config to 15 (or higher) and then watch the logs: # ceph config set client.rbd-mirror. debug_rbd_mirror 15 Maybe that reveals something. Regards, Eugen Quoting scott.cai...@tecnica-ltd.co.uk: Thank
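To verify the override and remove it again once debugging is done, something like this should work (the exact client entity name is an assumption):

  ceph config get client.rbd-mirror.<id> debug_rbd_mirror
  ceph config rm client.rbd-mirror.<id> debug_rbd_mirror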

[ceph-users] Re: pg's stuck activating on osd create

2024-07-08 Thread Eugen Block
Hi, it depends a bit on the actual OSD layout on the node and your procedure, but there's a chance you might have hit the overdose. But I would expect it to be logged in the OSD logs, two years ago in a Nautilus cluster the message looked like this: maybe_wait_for_max_pg withhold creatio
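A rough way to check whether PG-per-OSD limits are involved (a sketch; option names as in current releases):

  # the current PG count per OSD is shown in the PGS column
  ceph osd df tree

  # the thresholds behind the overdose protection
  ceph config get mon mon_max_pg_per_osd
  ceph config get osd osd_max_pg_per_osd_hard_ratio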

[ceph-users] Slow osd ops on large arm cluster

2024-07-08 Thread Adam Prycki
Hello, we are having issues with slow ops on our large ARM HPC Ceph cluster. The cluster runs on 18.2.0 and Ubuntu 20.04. MONs, MGRs and MDSs had to be moved to Intel servers because of poor single-core performance on our ARM servers. Our main CephFS data pool is on 54 servers in 9 racks with 1458 H
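For what it's worth, the usual starting points when chasing slow ops look roughly like this (osd.N is a placeholder; the daemon commands need the admin socket on the OSD's host):

  ceph health detail                        # which OSDs are reporting slow ops
  ceph daemon osd.N dump_ops_in_flight
  ceph daemon osd.N dump_historic_slow_ops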

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-07-08 Thread Dhairya Parmar
he > filesystem crashed again where the log of the failure is here: > https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s2.log-20240708.gz. > We ran the backups on kernel mounts of the filesystem without the nowsync > option this time to avoid the out-of-sync wri

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-07-08 Thread Ivan Clayson
cluster reported no issues and data was accessible again. We re-started the backups to run over the weekend and unfortunately the filesystem crashed again where the log of the failure is here: https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s2.log-20240708.gz.

[ceph-users] Re: Fixing BlueFS spillover (pacific 16.2.14)

2024-07-08 Thread Frédéric Nass
Hello, I just wanted to share that the following command also helped us move slow used bytes back to the fast device (without using bluefs-bdev-expand), when several compactions couldn't: $ cephadm shell --fsid $cid --name osd.${osd} -- ceph-bluestore-tool bluefs-bdev-migrate --path /var/lib/c
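For context, the full form of that migrate call is typically something like the following (a sketch assuming a cephadm-managed OSD with a separate DB device and the default data path; stop the OSD first):

  cephadm shell --fsid $cid --name osd.${osd} -- \
    ceph-bluestore-tool bluefs-bdev-migrate \
      --path /var/lib/ceph/osd/ceph-${osd} \
      --devs-source /var/lib/ceph/osd/ceph-${osd}/block \
      --dev-target /var/lib/ceph/osd/ceph-${osd}/block.db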

[ceph-users] AssumeRoleWithWebIdentity in RGW with Azure AD

2024-07-08 Thread Ryan Rempel
I'm trying to set up the OIDC provider for RGW so that I can have roles that can be assumed by people logging in with their regular Azure AD identities. The client I'm planning to use is Cyberduck – it seems like one of the few GUI S3 clients that manages the OIDC login process in a way that could w
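For reference, registering an OIDC provider with the RGW IAM endpoint can be sketched like this (assuming the AWS CLI pointed at RGW; the endpoint, tenant id, thumbprint and client id are placeholders):

  aws --endpoint-url https://rgw.example.com iam create-open-id-connect-provider \
    --url https://login.microsoftonline.com/<tenant-id>/v2.0 \
    --thumbprint-list <sha1-thumbprint> \
    --client-id-list <app-client-id>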

[ceph-users] Re: AssumeRoleWithWebIdentity in RGW with Azure AD

2024-07-08 Thread Pritha Srivastava
Hi Ryan, This appears to be a known issue and is tracked here: https://tracker.ceph.com/issues/54562. There is a workaround mentioned in the tracker that has worked and you can try that. Otherwise, I will be working on this 'invalid padding' problem very soon. Thanks, Pritha On Tue, Jul 9, 2024

[ceph-users] Re: Phantom hosts

2024-07-08 Thread Eugen Block
Hi Tim, is this still an issue? If it is, I recommend adding some more details so it's easier to follow your train of thought: ceph osd tree; ceph -s; ceph health detail; ceph orch host ls. And then please point out which host you're trying to get rid of. I would deal with the rgw thing later.
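If the host turns out to be a genuine leftover, removal usually comes down to something like this (hedged; the host name is a placeholder):

  ceph orch host rm <hostname> --offline --force
  # and, if it also lingers in the crush map:
  ceph osd crush rm <hostname>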