[ceph-users] Re: Slow ops during index pool recovery causes cluster performance drop to 1%

2024-11-14 Thread Szabo, Istvan (Agoda)
That is just one user; at peak the cluster serves 1.1M read IOPS at 10 GiB/s read throughput across 27-30 gateways, plus around 20-50k write IOPS at 1.5-2 GiB/s write throughput. I'll try increasing the index pool PG count, aiming for 400 PGs per NVMe. Istvan From: Frédéric Nas

[ceph-users] Re: multisite sync issue with bucket sync

2024-11-14 Thread Christopher Durham
Hi, I have heard nothing on this, but have done some more research. Again, both sides of a multisite s3 configuration are ceph 18.2.4 on Rocky 9. For a given bucket, there are thousands of 'missing' objects. I did: radosgw-admin bucket sync init --bucket --src-zone sync starts after I restart a r

[ceph-users] Re: The effect of changing an osd's class

2024-11-14 Thread Anthony D'Atri
You might also first try ceph osd down 1701 This marks the OSD down in the map, it doesn’t restart anything, but it does serve in some cases to goose progress. The OSD will quickly mark itself back up. Where 1701 is the ID of said primary. ceph health detail

[ceph-users] Re: Ceph Octopus packages missing at download.ceph.com

2024-11-14 Thread Frank Schilder
These mirrors will sync very soon and delete the tree as well. This needs to be fixed on the ceph repo side. Best regards, Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Ben Zieglmeier Sent: Thursday, November 14, 2024 1:50 P

[ceph-users] Re: The effect of changing an osd's class

2024-11-14 Thread Roland Giesler
I redid what I did before. I changed the osd class, had a look at the storage and found a VM image! I deleted it (it was a copy) and now the storage is empty. I guess the image (assigned to a non-existent storage after reverting) left PGs that could not be moved. I'm reverting now again

[ceph-users] Re: Slow ops during index pool recovery causes cluster performance drop to 1%

2024-11-14 Thread Frédéric Nass
How many RGW gateways? With 300 update requests per second, I would start by increasing the number of shards. Frédéric. - Le 14 Nov 24, à 13:33, Istvan Szabo, Agoda a écrit : > This bucket receives 300 post/put/delete a sec. > I'll take a look at that, thank you. > 37x4/nvme, however y

[ceph-users] Re: The effect of changing an osd's class

2024-11-14 Thread Frédéric Nass
Hi Roland, Yes, you can. See mclock documentation here [1]. One thing I can think of is that these 113 PGs may have a common misbehaving OSD (primary or not) with a ridiculous osd_mclock_max_capacity_iops_ssd value set. Restarting the primary and/or adjusting osd_mclock_max_capacity_iops_ssd v
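
One way to look for such an outlier is to compare the per-OSD capacity values against their median. The following is only a minimal sketch: it assumes the values have been collected as text lines shaped like the one quoted elsewhere in this thread ("osd.0 basic osd_mclock_max_capacity_iops_ssd 14305.161403"). The function names, the factor of 5 and the values for osd.1 and osd.2 are hypothetical and chosen purely for illustration.

```python
from statistics import median

OPTION_PREFIX = "osd_mclock_max_capacity_iops"


def parse_capacity_lines(lines):
    """Map OSD name -> configured capacity (IOPS) from whitespace-separated lines.

    The option name is located by prefix and the field right after it is
    taken as the value, so extra columns before the option are tolerated.
    """
    capacities = {}
    for line in lines:
        fields = line.split()
        for i, field in enumerate(fields[:-1]):
            if field.startswith(OPTION_PREFIX):
                capacities[fields[0]] = float(fields[i + 1])
                break
    return capacities


def find_outliers(capacities, factor=5.0):
    """Return OSDs whose value is more than `factor` times above or below the median."""
    if not capacities:
        return {}
    mid = median(capacities.values())
    return {
        osd: value
        for osd, value in capacities.items()
        if value > mid * factor or value < mid / factor
    }


# osd.0 is the value quoted in the thread; osd.1 and osd.2 are made up.
sample = [
    "osd.0 basic osd_mclock_max_capacity_iops_ssd 14305.161403",
    "osd.1 basic osd_mclock_max_capacity_iops_ssd 15000.0",
    "osd.2 basic osd_mclock_max_capacity_iops_ssd 250.0",
]
suspects = find_outliers(parse_capacity_lines(sample))
print(suspects)
```

A median is used rather than a mean so that a single extreme value does not shift the baseline it is compared against. Whether a flagged OSD is actually misbehaving still has to be checked by hand.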

[ceph-users] Re: Ceph Octopus packages missing at download.ceph.com

2024-11-14 Thread Ben Zieglmeier
I was able to get what I needed from http://mirrors.gigenet.com/ceph/ (one of the mirrors listed in the Ceph doco). On Thu, Nov 14, 2024, 6:05 AM Frank Schilder wrote: > Hi all, > > +1 from me > > this is a really bad issue. We need access to these packages very soon. > Please restore this folde

[ceph-users] Re: Slow ops during index pool recovery causes cluster performance drop to 1%

2024-11-14 Thread Szabo, Istvan (Agoda)
This bucket receives 300 post/put/delete a sec. I'll take a look at that, thank you. 37x4/nvme, however yes, I think we need to increase for now. Thank you. From: Frédéric Nass Sent: Thursday, November 14, 2024 5:50 PM To: Szabo, Istvan (Agoda) Cc: Ceph Users Su

[ceph-users] Re: Ceph Octopus packages missing at download.ceph.com

2024-11-14 Thread Frank Schilder
Hi all, +1 from me this is a really bad issue. We need access to these packages very soon. Please restore this folder. In the meantime, is there a mirror somewhere? Best regards, Frank Schilder AIT Risø Campus Bygning 109, rum S14 Fro

[ceph-users] Re: The effect of changing an osd's class

2024-11-14 Thread Roland Giesler
On 2024/11/14 11:44, Joachim Kraftmayer wrote: I know of similar behaviour when mclock is active. For osd.0 I see: osd.0 basic osd_mclock_max_capacity_iops_ssd 14305.161403 I'm unfamiliar with mclock. Can one tune that to improve the situation? Roland Joachim joachim.kra

[ceph-users] Re: Slow ops during index pool recovery causes cluster performance drop to 1%

2024-11-14 Thread Frédéric Nass
I don't know how many pools you have in your cluster but ~37 PGs per OSD seems quite low, especially with NVMes. You could try increasing the number of PGs on this pool and maybe the data pool also. I don't know how many iops this bucket receives but the fact that index is spread over only 11 r
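
The PGs-per-OSD figure can be estimated with simple arithmetic: each PG of a replicated pool places one copy on `size` different OSDs, so a pool contributes roughly pg_num * size / num_osds PG copies to each OSD. The sketch below assumes a replicated pool spread evenly over all OSDs and counts that one pool only. The 156 OSDs and replica size 3 are taken from this thread; the pg_num of 2048 and the target of 100 PGs per OSD are hypothetical inputs, not values from the cluster.

```python
import math


def pg_copies_per_osd(pg_num: int, replica_size: int, num_osds: int) -> float:
    """Average number of PG copies one pool places on each OSD."""
    return pg_num * replica_size / num_osds


def pg_num_for_target(target_per_osd: int, replica_size: int, num_osds: int) -> int:
    """Smallest power-of-two pg_num that reaches the target PG copies per OSD."""
    raw = math.ceil(target_per_osd * num_osds / replica_size)
    # (raw - 1).bit_length() is the exponent of the next power of two >= raw.
    return 1 << (raw - 1).bit_length()


# 156 OSDs and size 3 come from the thread; 2048 and 100 are hypothetical.
current = pg_copies_per_osd(pg_num=2048, replica_size=3, num_osds=156)
suggested = pg_num_for_target(target_per_osd=100, replica_size=3, num_osds=156)
print(f"{current:.1f} PG copies per OSD now, pg_num {suggested} for the target")
```

Rounding up to a power of two can overshoot the target noticeably, so the lower power of two is worth checking as well before changing pg_num. The total per OSD is the sum over all pools, which is why the number of pools matters here.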

[ceph-users] Re: Slow ops during index pool recovery causes cluster performance drop to 1%

2024-11-14 Thread Szabo, Istvan (Agoda)
156x NVME osd Sharding I do like 10 objects/1 shard. Default 11 but they don't have 1.1m objects. This is the tree: https://gist.github.com/Badb0yBadb0y/835a45f8e82ddfcbbd82cf28126da728 From: Frédéric Nass Sent: Thursday, November 14, 2024 4:28 PM To: Szab

[ceph-users] Re: The effect of changing an osd's class

2024-11-14 Thread Joachim Kraftmayer
I know of similar behaviour when mclock is active. Joachim joachim.kraftma...@clyso.com www.clyso.com Hohenzollernstr. 27, 80801 Munich Utting a. A. | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE2754306 Roland Giesler schrieb am Do., 14. Nov. 2024, 05:40: > On 2024/11/13 21:05, Anthony D'

[ceph-users] Re: Slow ops during index pool recovery causes cluster performance drop to 1%

2024-11-14 Thread Frédéric Nass
Hi Istvan, > The only thing I have in mind is to increase the replica size from 3 to 5 so > it could tolerate more osd slowness with size 5 min_size 2. I wouldn't do that, it will only get worse as every write IO will have to wait for 2 more OSDs to ACK and the slow ops you've seen refer t

[ceph-users] Re: The effect of changing an osd's class

2024-11-14 Thread Eugen Block
It's not clear to me if you wanted to add some more details after "I see this:" (twice). So you do see backfilling traffic if you out the OSD? Then maybe the remapped PGs are not even on that OSD? Have you checked 'ceph pg ls remapped'? To drain an OSD, you can either set it "out" as you alre

[ceph-users] Re: The effect of changing an osd's class

2024-11-14 Thread Roland Giesler
I had attached images, but these are not shown... On 2024/11/14 10:12, Roland Giesler wrote: On 2024/11/14 09:37, Eugen Block wrote: Remapped PGs is exactly what to expect after removing (or adding) a device class. Did you revert the change entirely? It sounds like you maybe forgot to add the

[ceph-users] Re: The effect of changing an osd's class

2024-11-14 Thread Roland Giesler
On 2024/11/14 09:37, Eugen Block wrote: Remapped PGs is exactly what to expect after removing (or adding) a device class. Did you revert the change entirely? It sounds like you maybe forgot to add the original device class back to the OSD where you changed it? Maybe share 'ceph osd tree'? Do yo