Hi Boris,

We have seen PGs go laggy in the past with a lower-level rados command listing the omap keys from the BI (bucket index) objects in the metadata pool.
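For reference, this is the kind of low-level call I mean; the pool name is an example, and the index objects follow the .dir.<bucket_marker>.<shard> naming, so the marker and shard below are placeholders:

    # Count the omap keys (index entries) of a single bucket index
    # shard object; shard 0 shown here.
    rados -p default.rgw.buckets.index listomapkeys .dir.<bucket_marker>.0 | wc -l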

20M objects in 11 shards gives ~1.8M objects per shard, which is indeed a lot.

If you are manually resharding a versioned bucket, keep in mind that the number of objects reported by radosgw-admin bucket stats may not be accurate: it does not take into account that versioned objects produce 4 (iirc...) entries in the bucket index instead of the 1 entry for non-versioned buckets.

Also, be careful when deleting objects from a versioned or versioning-suspended bucket, as you have to specify the version ID if you really want to get rid of the object. Otherwise, the object logical head (OLH) will point to a delete marker, but the object (and its entry in the index) will stay around, so your BIs do not get cleaned up. More on this in the S3 protocol docs.
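As a sketch with the AWS CLI (bucket, key, and version ID are placeholders), the difference looks like this:

    # On a versioned bucket this only inserts a delete marker; the
    # object versions and their bucket index entries stay around.
    aws s3api delete-object --bucket <bucket> --key <key>

    # Specifying the version ID actually removes that version (and,
    # eventually, its index entry).
    aws s3api delete-object --bucket <bucket> --key <key> --version-id <version-id>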

Cheers,
Enrico


On 5/9/25 19:49, Boris wrote:
Resharding the bucket is indeed the solution. While resharding does
have to read all of the keys from the source index objects, it doesn't
read all of them at once. Writing these keys to the target bucket
index objects is the more expensive part, but those are different
objects/PGs and should be better distributed.
Ah great. Will try that.
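For the archives, the manual reshard I plan to run is something like the following; the bucket name and shard count are placeholders (prime shard counts are often recommended for an even key distribution):

    # Manually reshard the bucket index to a larger number of shards.
    radosgw-admin bucket reshard --bucket=<bucket> --num-shards=101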

Ensure the index pool is really on only SSDs. I've seen CRUSH rules not
specifying device class.

Yes, they are: on the dedicated NVMe drives that we use for the meta pools
(the listing I sent in Slack).
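For reference, this is roughly how one can verify that; the pool name is an example, and <rule_name> comes from the first command's output:

    # Show which CRUSH rule the index pool uses ...
    ceph osd pool get default.rgw.buckets.index crush_rule

    # ... then dump that rule and check that it pins a device class
    # (e.g. an item_name like "default~ssd" instead of plain "default").
    ceph osd crush rule dump <rule_name>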

Do you have autoresharding disabled? Versioned objects? Can you do a
bilog trim? Could you preshard a new bucket and move the objects?

No, autoresharding is not enabled. We wanted to test resharding in the
multisite setting before we enable it (and then reshard all the buckets
that need it in a controlled way).

Yes, the bucket has versioned objects. We are in the process of deleting
it, but the deletion has now been running for two days and the bucket
index is still very large.

I can try a bilog trim; I still need to read up on exactly what it does
and how to run it.
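If I understand the docs correctly, the basic invocation is something like the following; the bucket name is a placeholder, and there seem to be optional start/end markers to limit the range:

    # Trim the bucket index log (bilog) entries for this bucket.
    radosgw-admin bilog trim --bucket=<bucket>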
I could move the data to a new presharded bucket, but it feels like the
bucket is somehow broken, because deleting is not working as I expect.

I will try to reshard the bucket tonight and hope it will work out; the
explanation from Casey sounds promising. Since I have a lot more buckets
with a lot more objects (according to the bucket index), this needs to be
done anyway.
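In case anyone else worries about the same thing: there appears to be a cancel mechanism for reshard operations, though I have not tried it myself (bucket name is a placeholder):

    # Check whether a reshard is scheduled or in progress for the bucket ...
    radosgw-admin reshard status --bucket=<bucket>

    # ... and cancel it if necessary.
    radosgw-admin reshard cancel --bucket=<bucket>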

Cheers
  Boris

On Fri, May 9, 2025 at 7:21 PM Anthony D'Atri <anthony.da...@gmail.com> wrote:

Ensure the index pool is really on only SSDs. I've seen CRUSH rules not
specifying device class.

Do you have autoresharding disabled? Versioned objects? Can you do a
bilog trim? Could you preshard a new bucket and move the objects?

On May 9, 2025, at 12:54 PM, Boris <b...@kervyn.de> wrote:

Hi,

I have a bucket that has >20M index entries but only 11 shards.

When I try to run radosgw-admin bucket check, the PGs that hold the
index become laggy after a couple of seconds, and I have to stop it
because it brings down the whole object storage.
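For context, a sketch of what I am running; the bucket name is a placeholder:

    # Consistency check of the bucket index (this is what goes laggy).
    radosgw-admin bucket check --bucket=<bucket>

    # Overview of objects per shard and fill status for all buckets.
    radosgw-admin bucket limit check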
This is an up-to-date Reef cluster and the master of a multisite setup
which only replicates the metadata (1 realm, multiple zonegroups, one
zone per zonegroup).
Any ideas what I can do?
I am hesitant to reshard the bucket, because I am not sure whether I can
stop the resharding if the PGs become laggy.
Cheers
Boris

--
Enrico Bocchi
CERN European Laboratory for Particle Physics
IT - Storage & Data Management  - General Storage Services
Mailbox: G20500 - Office: 31-2-010
1211 Genève 23
Switzerland
