I've resharded the bucket index to 191 shards, which worked out of the box.

Running radosgw-admin bucket check --bucket BUCKET --fix --check-objects now worked and resulted in:

  "calculated_header": {
      "usage": {
          "rgw.none": {
              "size": 0,
              "size_actual": 0,
              "size_utilized": 0,
              "size_kb": 0,
              "size_kb_actual": 0,
              "size_kb_utilized": 0,
              "num_objects": 15872659
          }
      }
  }

We are in the process of deleting the bucket with

  radosgw-admin bucket rm --bucket BUCKET --purge-objects

Running radosgw-admin bucket check olh --bucket BUCKET --fix throws out a lot of

  2025-05-12T09:17:24.787+0000 73d0dcd3e980 -1 ERROR failed to update olh for: OBJECT update_olh(): (125) Operation canceled
  2025-05-12T09:17:24.788+0000 73d0dcd3e980 1 NOTICE: finished shard SHARDID (0 entries removed)

So I guess this is already pretty bad. For now I hope that the deletion goes through, but it is taking a very long time: two hours to get the calculated num_objects from 16053307 down to 15872659 (~180k entries), which extrapolates to about a week until it is finished.
On Tue, May 13, 2025 at 15:15, Enrico Bocchi <enrico.boc...@cern.ch> wrote:

> Hi Boris,
>
> We have experienced PGs going laggy in the past with a lower-level rados
> command to list omapkeys from the BI objects in the metadata pool.
>
> 20M objects in 11 shards gives ~1.8M objects per shard, which is indeed
> a lot.
> If you are manually resharding a versioned bucket, consider that the
> number of objects reported by radosgw-admin bucket stats may not be
> accurate. Namely, it does not take into account that versioned objects
> produce 4 (iirc...) entries in the bucket index instead of 1 entry for
> non-versioned buckets. Also, be careful when deleting objects from a
> versioned or versioning-suspended bucket, as you have to specify the
> version ID if you really want to get rid of the object. Otherwise, the
> object logical head (OLH) will point to a delete marker, but the object
> (and its entry in the index) will stay around, not cleaning up your BIs.
> More on this in the S3 protocol docs.
>
> Cheers,
> Enrico
>
> On 5/9/25 19:49, Boris wrote:
> >> resharding the bucket is indeed the solution. while resharding does
> >> have to read all of the keys from the source index objects, it doesn't
> >> read all of them at once. writing these keys to the target bucket
> >> index objects is the more expensive part, but those are different
> >> objects/pgs and should be better distributed
> >
> > Ah great. Will try that.
> >
> >> Ensure the index pool is really on only SSDs. I've seen crush rules not
> >> specifying device class.
> >
> > Yes, they are. They sit on dedicated NVMes that we use for the meta pools
> > (the listing I sent in Slack).
> >
> >> Do you have autoresharding disabled? Versioned objects? Can you do a
> >> bilog trim? Could you preshard a new bucket and move the objects?
> >
> > Autoresharding is not enabled. We wanted to test sharding in the
> > multisite setting before we enable it (and reshard all the buckets that
> > need it in a controlled way).
> > Yes, the bucket has versioned objects. We are in the process of deleting
> > it, but the deletion has been running for two days now and the bucket
> > index is still very large.
> > I can try to do a bilog trim. I need to read up on what it does and how
> > to do it.
> > I could move the data to a new presharded bucket, but it feels like the
> > bucket is somehow broken, because deleting is not working as I expect it to.
> >
> > I will try to reshard the bucket tonight and hope it will work out. The
> > explanation from Casey sounds promising.
> > As I have a lot more buckets with a lot more objects (according to the
> > bucket index), this needs to be done anyway.
> >
> > Cheers
> > Boris
> >
> > On Fri, May 9, 2025 at 19:21, Anthony D'Atri <anthony.da...@gmail.com> wrote:
> >
> >> Ensure the index pool is really on only SSDs. I've seen crush rules not
> >> specifying device class.
> >>
> >> Do you have autoresharding disabled? Versioned objects? Can you do a
> >> bilog trim? Could you preshard a new bucket and move the objects?
> >>
> >>> On May 9, 2025, at 12:54 PM, Boris <b...@kervyn.de> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I have a bucket that has >20M index entries but only 11 shards.
> >>>
> >>> When I try to run a radosgw-admin bucket check, the PGs that hold the
> >>> index start to become laggy after a couple of seconds. I need to stop
> >>> it because it kills the whole object storage.
> >>>
> >>> This is a latest Reef cluster and the master of a multisite setup which
> >>> only replicates the metadata (1 realm, multiple zonegroups, one zone
> >>> per zonegroup).
> >>>
> >>> Any ideas what I can do?
> >>> I'm hesitant to reshard the bucket, because I am not sure if I can stop
> >>> the resharding if the PGs become laggy.
> >>>
> >>> Cheers
> >>> Boris
>
> --
> Enrico Bocchi
> CERN European Laboratory for Particle Physics
> IT - Storage & Data Management - General Storage Services
> Mailbox: G20500 - Office: 31-2-010
> 1211 Genève 23
> Switzerland

--
The self-help group "UTF-8 problems" will, as an exception, meet in the large hall this time.
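For the archives, the checks suggested in the quoted mails map to roughly the following commands. INDEX_POOL, RULE_NAME and BUCKET are placeholders, and whether an in-flight reshard can actually be cancelled depends on the release, so verify this on your version before relying on it:

  # confirm the index pool's CRUSH rule really pins a device class
  ceph osd pool get INDEX_POOL crush_rule
  ceph osd crush rule dump RULE_NAME    # the 'take' step should reference e.g. default~ssd

  # check whether a reshard is queued or running for the bucket, and cancel it
  radosgw-admin reshard status --bucket BUCKET
  radosgw-admin reshard cancel --bucket BUCKET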