[ceph-users] Re: CRC Bad Signature when using KRBD

2024-09-06 Thread Ilya Dryomov
On Fri, Sep 6, 2024 at 3:54 AM  wrote:
>
> Hello Ceph Users,
>
> * Problem: we get the following errors when using krbd, we are using rbd for vms.
> * Workaround: by switching to librbd the errors disappear.
>
> * Software:
> ** Kernel: 6.8.8-2 (parameters: intel_iommu=on iommu=pt pcie_aspm.pol
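For reference, a quick way to confirm that the complaints come from the kernel client (krbd) rather than librbd is the hypervisor's kernel log; the commands below are only a minimal sketch and assume a default krbd setup:

  # libceph/krbd reports bad CRC/signature errors in the kernel ring buffer
  dmesg -T | grep -iE 'bad crc|libceph|rbd'
  # list the currently mapped krbd devices to see which images are affected
  rbd device list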

[ceph-users] Re: Somehow throttle recovery even further than basic options?

2024-09-06 Thread Anthony D'Atri
> This sounds interesting, because this way the pressure wouldn't be too big if we go 0.1, 0.2, OSD by OSD.

I used to do this as well, back before pg-upmap was a thing and while I still had Jewel clients. It is, however, less efficient, because some data ends up moving more than once. Upweig
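Since pg-upmap is mentioned as the alternative: a minimal sketch of letting the balancer compute upmap placements instead of doing repeated reweights, assuming all clients are Luminous or newer:

  # required for pg-upmap; this refuses if pre-Luminous clients are connected
  ceph osd set-require-min-compat-client luminous
  # let the balancer generate upmap entries so data moves only once
  ceph balancer mode upmap
  ceph balancer on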

[ceph-users] Re: Somehow throttle recovery even further than basic options?

2024-09-06 Thread Szabo, Istvan (Agoda)
This sounds interesting, because this way the pressure wouldn't be too big if we go 0.1, 0.2, OSD by OSD. From what I can see of how Ceph did it, when we added the new OSDs the complete host also got remapped PGs from other hosts, so the PG count on the old OSDs increased by about +50% (which was already overlo

[ceph-users] Re: Somehow throttle recovery even further than basic options?

2024-09-06 Thread Eugen Block
I can’t say anything about the pgremapper, but have you tried increasing the crush weight gradually? Add new OSDs with crush initial weight 0 and then increase it in small steps. I haven’t used that approach for years, but maybe that can help here. Or are all OSDs already up and in? Or you
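A minimal sketch of that approach, with osd.12 standing in for a newly added OSD and the weights chosen only as an illustration:

  # new OSDs join the CRUSH map with weight 0, so no data moves yet
  ceph config set osd osd_crush_initial_weight 0
  # after adding the OSD, raise its weight in small steps,
  # letting recovery settle in between
  ceph osd crush reweight osd.12 0.2
  ceph osd crush reweight osd.12 0.5
  ceph osd crush reweight osd.12 1.0   # ...up to the device's size in TiB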

[ceph-users] Re: squid 19.2.0 QE validation status

2024-09-06 Thread Laura Flores
We have added one more core PR to the release, which was deemed a blocker (https://github.com/ceph/ceph/pull/59492). Rados, upgrade, and smoke will need to be reapproved.

On Wed, Sep 4, 2024 at 3:59 PM Laura Flores wrote:
> Upgrade suites approved:
> https://tracker.ceph.com/projects/rados/wiki

[ceph-users] Re: Somehow throttle recovery even further than basic options?

2024-09-06 Thread Szabo, Istvan (Agoda)
Forgot to paste; somehow I want to reduce this recovery rate:

  recovery: 0 B/s, 941.90k keys/s, 188 objects/s

down to 2-300 keys/s.

From: Szabo, Istvan (Agoda)
Sent: Friday, September 6, 2024 11:18 PM
To: Ceph Users
Subject: [ceph-users] Somehow throttle recover
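A hedged sketch of the usual throttling knobs; on Quincy/Reef the mClock scheduler ignores the backfill/sleep options unless overrides are explicitly allowed, so the first line is an assumption about which scheduler is in use:

  # allow manual recovery tuning when osd_op_queue = mclock_scheduler
  ceph config set osd osd_mclock_override_recovery_settings true
  # fewer concurrent backfill/recovery operations per OSD
  ceph config set osd osd_max_backfills 1
  ceph config set osd osd_recovery_max_active 1
  # insert a pause between recovery ops to cap keys/objects per second
  ceph config set osd osd_recovery_sleep_ssd 0.1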

[ceph-users] Somehow throttle recovery even further than basic options?

2024-09-06 Thread Szabo, Istvan (Agoda)
Hi,

4 years ago we created our cluster on Octopus with 4 OSDs per disk (SSDs and NVMe disks). The 15TB SSDs are still working properly with 4 OSDs each, but the small 1.8TB NVMes that hold the index pool are not. Each new NVMe OSD added to the existing nodes generates slow ops, even with scrub off, recovery_op_prior

[ceph-users] Re: Grafana dashboards is missing data

2024-09-06 Thread Sake Ceph
That is working, but I noticed the firewall isn't opened for that port. Shouldn't cephadm manage this, like it does for all the other ports?

Kind regards,
Sake

> On 06-09-2024 16:14 CEST, Björn Lässig wrote:
>
> On Wednesday, 2024-09-04 at 20:01 +0200, Sake Ceph wrote:
> > After th
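As a stopgap until cephadm opens it, the port can be allowed by hand; this assumes firewalld on the hosts (adjust for nftables/ufw):

  # open the ceph-exporter port on every host running the daemon
  firewall-cmd --permanent --add-port=9926/tcp
  firewall-cmd --reload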

[ceph-users] Re: Grafana dashboards is missing data

2024-09-06 Thread Björn Lässig
On Wednesday, 2024-09-04 at 20:01 +0200, Sake Ceph wrote:
> After the upgrade from 17.2.7 to 18.2.4 a lot of graphs are empty. For
> example the Osd latency under OSD device details or the Osd Overview
> has a lot of No data messages.

Is the ceph-exporter listening on port 9926 (on every ho
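A quick check, with the hostname as a placeholder:

  # is anything bound to the ceph-exporter port on this host?
  ss -tlnp | grep 9926
  # does it actually serve metrics?
  curl -s http://cephhost01:9926/metrics | head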

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-06 Thread Tim Holloway
Thank you, Redouane! Some background. I migrated to Ceph amidst a Perfect Storm. The Ceph docs, as I've often complained, were/are a horrible mish-mash of deprecated instructions and more modern information. So, among other things, I ended up with a mess of resources, some legacy-based, some mana

[ceph-users] ceph-mgr perf throttle-msgr - what is caused fails?

2024-09-06 Thread Konstantin Shalygin
Hi,

It seems something in the mgr is being throttled because val is close to max. Am I right?

root@mon1# ceph daemon /var/run/ceph/ceph-mgr.mon1.asok perf dump | jq '."throttle-msgr_dispatch_throttler-mgr-0x55930f4aed20"'
{
  "val": 104856554,
  "max": 104857600,
  "get_started": 0,
  "get": 9700833,
  "get_sum": 6544522184
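If the counter really is pinned near that 100 MiB ceiling, the "max" shown appears to correspond to the messenger dispatch throttle; a hedged sketch of inspecting and raising it (the 256 MiB value is only an example, and whether raising it is the right fix depends on why the mgr is falling behind):

  # current limit (default 104857600 bytes = 100 MiB)
  ceph config get mgr ms_dispatch_throttle_bytes
  # raise it for the mgr only, then restart or fail over the mgr
  ceph config set mgr ms_dispatch_throttle_bytes 268435456
  ceph mgr fail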

[ceph-users] Setting up Ceph RGW with SSE-S3 - Any examples?

2024-09-06 Thread Michael Worsham
Has anyone been successful at standing up a Ceph RGW S3 bucket with Hashicorp Vault for S3 bucket encryption? The documentation for doing it with Ceph jumps all over the page between token and agent, so it's nearly impossible to know which variables and parameters are required for each. I was a
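Not a full answer, but a minimal token-auth sketch of the rgw side that may help orient the options; the Vault address, token file path, and the client.rgw config target are all placeholders or assumptions, and the agent-auth variant uses a different set of parameters:

  ceph config set client.rgw rgw_crypt_sse_s3_backend vault
  ceph config set client.rgw rgw_crypt_sse_s3_vault_auth token
  ceph config set client.rgw rgw_crypt_sse_s3_vault_addr http://vault.example.net:8200
  ceph config set client.rgw rgw_crypt_sse_s3_vault_token_file /etc/ceph/vault.token
  ceph config set client.rgw rgw_crypt_sse_s3_vault_secret_engine transit
  ceph config set client.rgw rgw_crypt_sse_s3_vault_prefix /v1/transit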

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-06 Thread Matthew Vernon
On 06/09/2024 10:27, Matthew Vernon wrote:
> On 06/09/2024 08:08, Redouane Kachach wrote:
>> That makes sense. The IPv6 bug can lead to the issue you described. In the current implementation, whenever a mgr failover takes place, the prometheus configuration (when using the monitoring stack deployed by

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-06 Thread Matthew Vernon
Hi,

On 06/09/2024 08:08, Redouane Kachach wrote:
> That makes sense. The IPv6 bug can lead to the issue you described. In the current implementation, whenever a mgr failover takes place, the prometheus configuration (when using the monitoring stack deployed by Ceph) is updated automatically to point

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-06 Thread Redouane Kachach
Hi Matthew,

That makes sense. The IPv6 bug can lead to the issue you described. In the current implementation, whenever a mgr failover takes place, the prometheus configuration (when using the monitoring stack deployed by Ceph) is updated automatically to point to the new active mgr. Unfortunately it's
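A quick way to check whether the discovery endpoint comes back after a failover, with the target host taken from the "ceph mgr stat" output:

  # which mgr is currently active?
  ceph mgr stat
  # on that host: is the service-discovery port listening?
  ss -tlnp | grep 8765
  # trigger a failover to reproduce the problem if needed
  ceph mgr fail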