[ceph-users] Re: Replace OSD while cluster is recovering?

2025-02-28 Thread Gustavo Garcia Rondina
Hi Frédéric, Thank you for the suggestion. I started `ceph pg repair {pgid}` on the inconsistent PGs, but so far there is no visible effect. Is it possible to monitor the progress of the repairs? With `ceph progress` I can't see it, and for some reason `ceph -w` is hanging. Kind regards, Gustavo
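
A minimal sketch of commands that can be used to watch a repair from the CLI, assuming a placeholder PG id of 7.1a:

    # Per-PG view: the state string shows scrubbing/deep/repair while a repair is running
    ceph pg 7.1a query | grep -E '"state"|scrub'

    # Which objects in that PG are flagged inconsistent
    rados list-inconsistent-obj 7.1a --format=json-pretty

    # Cluster-wide: PGs still flagged inconsistent after the repair completes
    ceph health detail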

[ceph-users] Re: Replace OSD while cluster is recovering?

2025-02-28 Thread Gustavo Garcia Rondina
Hi Laimis, Thank you for the suggestion. I issued ceph pg repair to all inconsistent PGs, but so far nothing has changed. Are deep scrubs even going to start with this much recovery in progress? We are currently using balanced as the osd_mclock_profile, but I'm considering changing it to high_recovery_ops
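
A hedged sketch of switching the mclock profile to favour recovery (profile names as documented for quincy; revert once the cluster is healthy again):

    # Show the current profile
    ceph config get osd osd_mclock_profile

    # Prioritize recovery/backfill over client I/O
    ceph config set osd osd_mclock_profile high_recovery_ops

    # Switch back once recovery has finished
    ceph config set osd osd_mclock_profile balanced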

[ceph-users] Re: Subject: Assistance Required: Vault Integration with RADOS Gateway for SSE-S3 Encryption

2025-02-28 Thread Dhivya G
Hi Arnaud, Thanks for your support! I am currently integrating Ceph RADOS Gateway (RGW) with HashiCorp Vault for SSE-S3 encryption and using JS to upload objects to an encrypted bucket. I have configured the necessary parameters in my request, but I am encountering
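
For reference, a minimal sketch of the RGW-side settings for SSE-S3 with Vault, assuming the transit secrets engine and token auth; the Vault address, token file, endpoint and bucket names are placeholders, and an aws CLI upload stands in for the JS client:

    # RGW configuration (placeholder Vault address and token file)
    ceph config set client.rgw rgw_crypt_sse_s3_backend vault
    ceph config set client.rgw rgw_crypt_sse_s3_vault_secret_engine transit
    ceph config set client.rgw rgw_crypt_sse_s3_vault_addr http://vault.example.com:8200
    ceph config set client.rgw rgw_crypt_sse_s3_vault_auth token
    ceph config set client.rgw rgw_crypt_sse_s3_vault_token_file /etc/ceph/vault.token

    # Upload requesting SSE-S3 explicitly
    aws s3api put-object --endpoint-url http://rgw.example.com:8080 \
        --bucket encrypted-bucket --key test.txt --body test.txt \
        --server-side-encryption AES256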

[ceph-users] [Cephfs] Can't get snapshot under a subvolume

2025-02-28 Thread ceph
Hello all, We're getting an "Operation not permitted" error while trying to create a snapshot on the client. It seems related to a previously asked Pacific-era issue mentioned here: https://www.spinics.net/lists/ceph-users/msg67908.html We are on squid (19.2.1) and the given workaround seems to be
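
A common cause is that the client's MDS caps lack the 's' (snapshot) flag; a hedged sketch, with the client name and subvolume path as placeholders:

    # Inspect the client's current caps
    ceph auth get client.myclient

    # Grant rw access plus the 's' flag so the client may create/remove snapshots
    ceph fs authorize cephfs client.myclient /volumes/mygroup/mysubvol rws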

[ceph-users] Re: Schrödinger's Server

2025-02-28 Thread Anthony D'Atri
> Thanks for the advice. > > Previously I was all HDDs, but I'm beginning to migrate to M.2 SSDs. > But so far, only a few. Manage your CRUSH device classes and rules carefully. Also, are you selecting *enterprise* NVMe M.2 SSDs? Many of them out there are client-class and/or SATA. Are you
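
A rough sketch of keeping pools pinned to one device class while the migration is in progress (rule and pool names are placeholders):

    # Check how the new OSDs were auto-classified
    ceph osd crush class ls
    ceph osd df tree

    # Class-specific replicated rules
    ceph osd crush rule create-replicated rule-hdd default host hdd
    ceph osd crush rule create-replicated rule-ssd default host ssd

    # Pin a pool to the intended class
    ceph osd pool set mypool crush_rule rule-ssd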

[ceph-users] Replace OSD while cluster is recovering?

2025-02-28 Thread grondina
Hello list, We have a Ceph cluster (v17.2.6 quincy) with 3 admin nodes and 6 storage nodes, each connected to a JBOD enclosure. Each enclosure houses 28 HDD disks of size 18 TB, totaling 168 OSDs. The pool that houses the majority of the data is erasure-coded (4+2). We have recently had one dis
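
For reference, a hedged sketch of the usual replace-in-place flow so the new disk re-uses the same OSD id (osd id 42 and the device path are placeholders):

    # Keep the id reserved for the replacement disk
    ceph osd out 42
    ceph osd destroy 42 --yes-i-really-mean-it

    # After swapping the physical disk (non-cephadm, ceph-volume deployment)
    ceph-volume lvm create --osd-id 42 --data /dev/sdX

    # Or, on a cephadm-managed cluster
    ceph orch osd rm 42 --replace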

[ceph-users] Re: Replace OSD while cluster is recovering?

2025-02-28 Thread Laimis Juzeliūnas
Hi Gustavo, Focus on fixing the inconsistent PGs, either via a deep scrub or by specifically telling the cluster to repair them. Once that is done, you are good to go with riskier operations. However, if the OSD is already out of the cluster, all the recovery operations for the data are already underway
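
A short sketch of listing and repairing the inconsistent PGs (pool name and PG id are placeholders):

    # PGs of a pool currently flagged inconsistent
    rados list-inconsistent-pg mypool

    # Re-run the deep scrub, or ask the cluster to repair directly
    ceph pg deep-scrub 7.1a
    ceph pg repair 7.1a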

[ceph-users] Replace OSD while cluster is recovering?

2025-02-28 Thread Gustavo Garcia Rondina
Hello list, We have a Ceph cluster (17.2.6 quincy) with 2 admin nodes and 6 storage nodes, each storage node connected to a JBOD enclosure. Each enclosure houses 28 HDD disks of 18 TB size, totaling 168 OSDs. The pool that houses the majority of the data is erasure-coded (4+2). We have recently

[ceph-users] Re: Schrödinger's Server

2025-02-28 Thread Tim Holloway
Thanks for the advice. Previously I was all HDDs, but I'm beginning to migrate to M.2 SSDs. But so far, only a few. I'll have to look into WPQ. I'm running whatever came out of the box, possibly inherited from the pre-upgrade installation (Octopus, RIP!). As far as kick-starting stalled recovery
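
A quick way to see which scheduler the OSDs are actually running, assuming quincy defaults (mclock_scheduler unless overridden); switching back to wpq requires an OSD restart:

    # What a running OSD is using
    ceph config show osd.0 osd_op_queue

    # Set the default back to wpq (effective after restarting the OSDs)
    ceph config set osd osd_op_queue wpq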

[ceph-users] Re: Request for Assistance: OSDS Stability Issues Post-Upgrade to Ceph Quincy 17.2.8

2025-02-28 Thread Eric Le Lay
Hi Aref, same issue here: upgrading from 17.2.7 to 17.2.8, bug #69764 hit us, with OSDs crashing randomly. We rolled back to 17.2.7, where we had occasional 30-second OSD lock-ups (maybe #62815) but no crashing OSDs. So we are now planning the upgrade to Reef that we would have done an

[ceph-users] Re: Squid: Grafana host-details shows total number of OSDs

2025-02-28 Thread Eugen Block
Thank you! Zitat von Ankush Behl : Hi Eugen, I checked the query[1], one of the recent patches has removed the host filter(as mentioned by you) from the query. Will send the patch and fix the issue. [1]. https://github.com/ceph/ceph/blame/89f21db321ddd404654b0064a61b0aa5428d6f7e/monitoring/c