Reducing the reshard threshold when versioned objects are in play is something we’ve recommended to customers recently. Once the enhanced code makes it into releases, rolling back to the default value would of course make sense. 50000 is half the default of 100000, so it seems like a reasonable place to start, depending on the cardinality of and within buckets.
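A minimal sketch of applying that change, assuming RGW daemons read their options from the central config store (the bucket name and shard count in the manual-reshard commands are illustrative placeholders, not from this thread):

```shell
# Halve the dynamic-reshard threshold from the default of 100000.
# RGW daemons may need a restart/reload to pick up the new value.
ceph config set client.rgw rgw_max_objs_per_shard 50000

# Optionally queue an immediate reshard for a known-oversized bucket
# rather than waiting for the dynamic resharder to notice it:
radosgw-admin reshard add --bucket <bucket1_name> --num-shards 256
radosgw-admin reshard process
```

This is a config/operations fragment, not something to paste blindly; verify the option name and current value with `ceph config help rgw_max_objs_per_shard` first.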
IIRC the threshold used to be ten times higher, but by the time such an omap was surfaced it tended to be too late to address without a fair amount of impact, so the threshold was lowered.

> On Jun 19, 2025, at 9:57 AM, Niklaus Hofer <niklaus.ho...@stepping-stone.ch> wrote:
>
> Dear Anthony
>
>> Later releases do improve dynamic resharding FWIW.
>
> We are hoping to upgrade to Reef at the end of the summer. But until then we would really love a solution. The huge omap object we've had before caused a lot of headaches (slow ops, hung OSDs, PGs that never completed backfilling).
>
>> Also, do you have buckets using versioned objects? If so I would suggest lowering rgw_max_objs_per_shard to, say, 50000.
>
> Yes, we do indeed have versioning (and object locking, too) active on these buckets.
>
> Decreasing rgw_max_objs_per_shard goes along with what I was thinking, too. Glad to hear others are thinking alike.
>
> I think I saw an object map with 1.3M object references, so I guess 50'000 might still be too high. But we'll probably do 50'000 anyway at first and see whether it helps at all.
>
> I'll definitely let you know how it's going!
>
> Sincerely
>
> Niklaus Hofer
>
> On 19/06/2025 15.26, Anthony D'Atri wrote:
>> Later releases do improve dynamic resharding FWIW.
>> Also, do you have buckets using versioned objects? If so I would suggest lowering rgw_max_objs_per_shard to, say, 50000. Upcoming releases will improve that dynamic.
>>> On Jun 19, 2025, at 9:07 AM, Niklaus Hofer <niklaus.ho...@stepping-stone.ch> wrote:
>>>
>>> Dear Eugen
>>>
>>> Thank you for your input. What you say is true, of course. I've read that some place too. I already ran a deep-scrub on all the involved PGs and that's when the number of warnings increased drastically. I assume that before I just had one positively huge omap object for that pool. Now I have 167 omap objects that are not quite as big, but still too large.
>>>
>>> Sincerely
>>>
>>> Niklaus Hofer
>>>
>>> On 19/06/2025 14.48, Eugen Block wrote:
>>>> Hi,
>>>>
>>>> the warnings about large omap objects are reported when deep-scrubs happen. So if you resharded the bucket (or Ceph did that for you), you'll either have to wait for the deep-scrub schedule to scrub the affected PGs, or you issue a manual deep-scrub on that PG or the entire pool.
>>>>
>>>> Regards,
>>>> Eugen
>>>>
>>>> Quoting Niklaus Hofer <niklaus.ho...@stepping-stone.ch>:
>>>>
>>>>> Dear all
>>>>>
>>>>> We are running Ceph Pacific (16.2.15) with RadosGW and have been getting "large omap objects" health warnings on the RadosGW index pool. Indeed we had one bucket in particular that was positively huge with 8'127'198 objects that had just a single shard. But we have been seeing the message on some other buckets, too.
>>>>>
>>>>> Eventually, we activated automatic resharding (rgw_dynamic_resharding = true) and indeed this bucket was resharded to now 167 shards. However, I am now getting even more large omap object warnings. On that same bucket, too. The other buckets have not been resharded at all. They are not in the queue, either:
>>>>>
>>>>> | radosgw-admin reshard list
>>>>>> []
>>>>>
>>>>> | grep 'Large omap object found' /var/log/ceph/ceph* | grep 'PG: ' | cut -d: -f 10 | cut -d. -f 4-5 | sort | uniq -c
>>>>>       2 7936686773.215
>>>>>      12 7937604172.149
>>>>>      10 7955243979.1209
>>>>>       9 7955243979.2480
>>>>>      13 7955243979.2481
>>>>>      12 7968198782.110
>>>>>      13 7968913553.67
>>>>>      11 7968913553.68
>>>>>      10 7968913553.69
>>>>>      11 7981210604.1
>>>>>      74 7981624399.1
>>>>>     217 7988881492.1
>>>>>
>>>>> | radosgw-admin metadata list --metadata-key bucket.instance | grep -i 7988881492
>>>>>> "<bucket1_name>:<pool_name>.7988881492.1",
>>>>>
>>>>> | radosgw-admin bucket stats --bucket <bucket1_name>
>>>>>> {
>>>>>>     "bucket": "<bucket1_name>",
>>>>>>     "num_shards": 167,
>>>>>>     [...]
>>>>>>     "usage": {
>>>>>>         "rgw.main": {
>>>>>>             "size": 9669928611955,
>>>>>>             "size_actual": 9692804734976,
>>>>>>             "size_utilized": 9669928611955,
>>>>>>             "size_kb": 9443289661,
>>>>>>             "size_kb_actual": 9465629624,
>>>>>>             "size_kb_utilized": 9443289661,
>>>>>>             "num_objects": 8134437
>>>>>>         }
>>>>>>     },
>>>>>
>>>>> Let's check another one, too:
>>>>>
>>>>> | radosgw-admin metadata list --metadata-key bucket.instance | grep -i 7968198782.110
>>>>>> "<bucket2_name>:<pool_name>.7968198782.110",
>>>>>
>>>>> | radosgw-admin bucket stats --bucket <bucket2_name>
>>>>>> [...]
>>>>>> "num_objects": 38690
>>>>>> [...]
>>>>>
>>>>> According to the documentation in [0], buckets are resharded at a threshold of 100'000 objects per shard. For both of these, that applies nicely, so it makes sense that they are not getting resharded any further.
>>>>>
>>>>> But why then am I getting these warnings?
>>>>>
>>>>> Reading the documentation in [1], I can see that warnings are printed at 200'000 entries per omap object. Can I assume that one object in an RGW bucket means one entry in an omap object? Or is that a misconception?
>>>>>
>>>>> Now, here is my working theory. Please let me know if that has any merit or if I'm completely off:
>>>>>
>>>>> The affected buckets have versioning activated. Plus object locking too. They get used by a backup software (Kopia) that uses these features to provide ransomware protection. So my thinking is that maybe with versioning active, each object in a bucket could result in multiple omap entries, maybe one per version or something?
>>>>>
>>>>> If that is the case, then maybe I should reduce `rgw_max_objs_per_shard` from 100'000 to something like 10'000 to have the buckets resharded more aggressively?
>>>>>
>>>>> But then again, that assumes a lot. For example, that assumes that the num_objects counter in the bucket stats does not count up on versioned objects.
>>>>> So my assumption could be completely whack.
>>>>>
>>>>> What do you think? What can I do to get rid of the large omap objects? Is more resharding going to help? What else could I check?
>>>>>
>>>>> Sincerely
>>>>>
>>>>> Niklaus Hofer
>>>>>
>>>>> Links:
>>>>> [0] https://docs.ceph.com/en/latest/radosgw/dynamicresharding/#confval-rgw_max_objs_per_shard
>>>>> [1] https://docs.ceph.com/en/latest/rados/operations/health-checks/#large-omap-objects
>>>>>
>>>>> --
>>>>> stepping stone AG
>>>>> Wasserwerkgasse 7
>>>>> CH-3011 Bern
>>>>>
>>>>> Phone: +41 31 332 53 63
>>>>> www.stepping-stone.ch
>>>>> niklaus.ho...@stepping-stone.ch
>>>>> _______________________________________________
>>>>> ceph-users mailing list -- ceph-users@ceph.io
>>>>> To unsubscribe send an email to ceph-users-le...@ceph.io
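The back-of-the-envelope check discussed in the thread can be sketched in shell. The numbers come from the thread itself (8,134,437 objects across 167 shards); the `rados listomapkeys` command in the comment is the usual way to count the actual index entries in one shard object, with the pool and object names shown as placeholders:

```shell
#!/bin/sh
# Estimate index entries per shard if each live object contributed
# exactly one omap entry (numbers taken from the thread).
objs=8134437
shards=167
echo "entries per shard, one entry per object: $((objs / shards))"

# In practice each versioned object adds several index entries (one per
# version plus instance/OLH bookkeeping), which is how a shard can exceed
# the 200'000-entry warning threshold well below 100'000 objects.
# To count the real entries in one index shard object (placeholders):
#   rados -p <index_pool> listomapkeys .dir.<pool_name>.7988881492.1.0 | wc -l
```

The printed estimate (about 48'709 entries per shard) is under the warning threshold, so if warnings persist, the per-shard entry count observed via `listomapkeys` versus this estimate gives the effective entries-per-object multiplier from versioning.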