I've resharded the bucket index to 191 shards, which worked out of the box.

Running radosgw-admin bucket check --bucket BUCKET --fix --check-objects now worked and resulted in:

  "calculated_header": {
      "usage": {
          "rgw.none": {
              "size": 0,
              "size_actual": 0,
              "size_utilized": 0,
              "size_kb": 0,
              "size_kb_actual": 0,
              "size_kb_utilized": 0,
              "num_objects": 15872659
          }
      }
  }

We are in the process of deleting the bucket with

  radosgw-admin bucket rm --bucket BUCKET --purge-objects

Running radosgw-admin bucket check olh --bucket BUCKET --fix throws out a lot of

  2025-05-12T09:17:24.787+0000 73d0dcd3e980 -1 ERROR failed to update olh for: OBJECT update_olh(): (125) Operation canceled
  2025-05-12T09:17:24.788+0000 73d0dcd3e980 1 NOTICE: finished shard SHARDID (0 entries removed)

So I guess this is already pretty bad. For now I hope that the deletion goes through, but it is taking a very long time: two hours to get the calculated num_objects from 16053307 down to 15872659 (~180k entries), which extrapolates to about a week until it is finished.
On Tue, May 13, 2025 at 15:15, Enrico Bocchi <enrico.boc...@cern.ch> wrote:

> Hi Boris,
>
> We have experienced PGs going laggy in the past with a lower-level rados
> command to list omapkeys from the BI objects in the metadata pool.
>
> 20M objects in 11 shards gives ~1.8M objects per shard, which is indeed
> a lot.
> If you are manually resharding a versioned bucket, consider that the
> number of objects reported by radosgw-admin bucket stats may not be
> accurate. Namely, it does not take into account that versioned objects
> produce 4 (iirc...) entries in the bucket index instead of 1 entry for
> non-versioned buckets. Also, be careful when deleting objects from a
> versioned or versioning-suspended bucket, as you have to specify the
> version ID if you really want to get rid of the object. Otherwise, the
> object logical head (OLH) will point to a delete marker, but the object
> (and its entry in the index) will stay around, not cleaning up your BIs.
> More on this in the S3 protocol docs.
>
> Cheers,
> Enrico
>
> On 5/9/25 19:49, Boris wrote:
> >> resharding the bucket is indeed the solution. while resharding does
> >> have to read all of the keys from the source index objects, it doesn't
> >> read all of them at once. writing these keys to the target bucket
> >> index objects is the more expensive part, but those are different
> >> objects/pgs and should be better distributed
> >
> > Ah great. Will try that.
> >
> >> Ensure the index pool is really on only SSDs. I've seen crush rules not
> >> specifying device class.
> >
> > Yes, they are. They sit on dedicated NVMes that we use for the meta pools
> > (the listing I sent in Slack).
> >
> >> Do you have autoresharding disabled? Versioned objects? Can you do a
> >> bilog trim? Could you preshard a new bucket and move the objects?
> >
> > Autoresharding is not enabled. We wanted to test sharding in the
> > multisite setting before we enable it (and reshard all the buckets that
> > need it in a controlled way).
> > Yes, the bucket has versioned objects. We are in the process of deleting
> > it, but the deletion has been running for two days now and the bucket
> > index is still very large.
> > I can try to do a bilog trim. I need to read up on what it does and how
> > to do it.
> > I could move the data to a new presharded bucket, but it feels like the
> > bucket is somehow broken, because deleting is not working as I expect it to.
> >
> > I will try to reshard the bucket tonight and hope it will work out. The
> > explanation from Casey sounds promising.
> > As I have a lot more buckets with a lot more objects (according to the
> > bucket index), this needs to be done anyway.
> >
> > Cheers
> > Boris
> >
> > On Fri, May 9, 2025 at 19:21, Anthony D'Atri <anthony.da...@gmail.com> wrote:
> >
> >> Ensure the index pool is really on only SSDs. I've seen crush rules not
> >> specifying device class.
> >>
> >> Do you have autoresharding disabled? Versioned objects? Can you do a
> >> bilog trim? Could you preshard a new bucket and move the objects?
> >>
> >>> On May 9, 2025, at 12:54 PM, Boris <b...@kervyn.de> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I have a bucket that has >20M index entries but only 11 shards.
> >>>
> >>> When I try to run a radosgw-admin bucket check, the PGs that hold the
> >>> index start to become laggy after a couple of seconds. I need to stop
> >>> it because it kills the whole object storage.
> >>>
> >>> This is a latest Reef cluster and the master of a multisite setup which
> >>> only replicates the metadata (1 realm, multiple zonegroups, one zone
> >>> per zonegroup).
> >>>
> >>> Any ideas what I can do?
> >>> I'm hesitant to reshard the bucket, because I am not sure if I can stop
> >>> the resharding if the PGs become laggy.
> >>>
> >>> Cheers
> >>> Boris
>
> --
> Enrico Bocchi
> CERN European Laboratory for Particle Physics
> IT - Storage & Data Management - General Storage Services
> Mailbox: G20500 - Office: 31-2-010
> 1211 Genève 23
> Switzerland

--
The self-help group "UTF-8 problems" will, as an exception, meet in the large hall this time.
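For the archives, the checks suggested in the quoted mails map to roughly the following commands. INDEX_POOL, RULE_NAME and BUCKET are placeholders, and whether an in-flight reshard can actually be cancelled depends on the release, so verify this on your version before relying on it:

  # confirm the index pool's CRUSH rule really pins a device class
  ceph osd pool get INDEX_POOL crush_rule
  ceph osd crush rule dump RULE_NAME    # the 'take' step should reference e.g. default~ssd

  # check whether a reshard is queued or running for the bucket, and cancel it
  radosgw-admin reshard status --bucket BUCKET
  radosgw-admin reshard cancel --bucket BUCKET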