On Tue, Jul 9, 2024 at 12:41 PM Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote: > > Hi Casey, > > 1. > Regarding versioning, the user doesn't use versioning if I'm not mistaken: > https://gist.githubusercontent.com/Badb0yBadb0y/d80c1bdb8609088970413969826d2b7d/raw/baee46865178fff454c224040525b55b54e27218/gistfile1.txt > > 2. > Regarding multiparts, if there were multipart trash, it would be listed > here: > https://gist.githubusercontent.com/Badb0yBadb0y/d80c1bdb8609088970413969826d2b7d/raw/baee46865178fff454c224040525b55b54e27218/gistfile1.txt > as rgw.multimeta under the usage, right? > > 3. > Regarding the multisite idea, this bucket was a multisite bucket last > year, but we had to reshard (accepting to lose the replica on the 2nd site > and just keep it in the master site), and at that time, as expected, it > disappeared completely from the 2nd site (I guess the 40TB of trash is still there, > but I can't really find how to clean it 🙁 ). Now it is a single-site bucket. > Also, this is the index pool; multisite logs should go to the rgw.log pool, > shouldn't they?
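[Editor's note: for context on the numbers discussed in this thread, here is a short sketch of why the bucket limit check can report "OK" while individual index shards still trigger large-omap warnings. The figures come from the stats and OSD log lines quoted below; the 200k threshold is the default value of osd_deep_scrub_large_omap_object_key_threshold and is assumed unchanged on this cluster.]

```python
# Figures from `radosgw-admin bucket limit check` and the OSD cluster log
# warnings quoted later in this thread.
num_objects = 53_619_489       # "num_objects" in the limit check
num_shards = 1999              # "num_shards" in the limit check
key_threshold = 200_000        # default osd_deep_scrub_large_omap_object_key_threshold

# The limit check only reports the *average* fill per shard:
objects_per_shard = num_objects // num_shards
print(objects_per_shard)                    # 26823, matching "objects_per_shard": 26823
print(objects_per_shard < key_threshold)    # True -> "fill_status": "OK"

# But the large-omap warnings are per rados object: individual shards can
# clump far above the average (versioned overwrites, multipart parts, and
# bilog entries all land on the shard chosen by the object name).
observed_shard_keys = {"shard .151": 236_919, "shard .726": 236_495}
print(max(observed_shard_keys.values()) > key_threshold)  # True -> Large omap warning
```

So a healthy average per-shard count and a large-omap warning on specific shards are not contradictory; the clumping explanations in the replies account for the gap.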
some replication logs are in the log pool, but the per-object logs are stored in the bucket index objects. you can inspect these with `radosgw-admin bilog list --bucket=X`. by default, that will only list --max-entries=1000; you can add --shard-id=Y to look at specific 'large omap' objects.

even if your single-site bucket doesn't exist on the secondary zone, changes on the primary zone are probably still generating these bilog entries. you would need to do something like `radosgw-admin bucket sync disable --bucket=X` to make it stop.

because you don't expect these changes to replicate, it's safe to delete any of this bucket's bilog entries with `radosgw-admin bilog trim --end-marker 9 --bucket=X`. depending on ceph version, you may need to run this trim command in a loop until the `bilog list` output is empty.

radosgw does eventually trim bilogs in the background after they're processed, but the secondary zone isn't processing them in this case

> > Thank you > > > ________________________________ > From: Casey Bodley <cbod...@redhat.com> > Sent: Tuesday, July 9, 2024 10:39 PM > To: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> > Cc: Eugen Block <ebl...@nde.ag>; ceph-users@ceph.io <ceph-users@ceph.io> > Subject: Re: [ceph-users] Re: Large omap in index pool even if properly > sharded and not "OVER" > > Email received from the internet. If in doubt, don't click any link nor open > any attachment ! > ________________________________ > > in general, these omap entries should be evenly spread over the > bucket's index shard objects. but there are two features that may > cause entries to clump on a single shard: > > 1. for versioned buckets, multiple versions of the same object name > map to the same index shard. this can become an issue if an > application is repeatedly overwriting an object without cleaning up > old versions. lifecycle rules can help to manage these noncurrent > versions > > 2. 
during a multipart upload, all of the parts are tracked on the same > index shard as the final object name. if applications are leaving a > lot of incomplete multipart uploads behind (especially if they target > the same object name) this can lead to similar clumping. the S3 api > has operations to list and abort incomplete multipart uploads, along > with lifecycle rules to automate their cleanup > > separately, multisite clusters use these same index shards to store > replication logs. if sync gets far enough behind, these log entries > can also lead to large omap warnings > > On Tue, Jul 9, 2024 at 10:25 AM Szabo, Istvan (Agoda) > <istvan.sz...@agoda.com> wrote: > > > > It's the same bucket: > > https://gist.github.com/Badb0yBadb0y/d80c1bdb8609088970413969826d2b7d > > > > > > ________________________________ > > From: Eugen Block <ebl...@nde.ag> > > Sent: Tuesday, July 9, 2024 8:03 PM > > To: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> > > Cc: ceph-users@ceph.io <ceph-users@ceph.io> > > Subject: Re: [ceph-users] Re: Large omap in index pool even if properly > > sharded and not "OVER" > > > > Are those three different buckets? Could you share the stats for each of > > them? 
> > > > radosgw-admin bucket stats --bucket=<BUCKET> > > > > Zitat von "Szabo, Istvan (Agoda)" <istvan.sz...@agoda.com>: > > > > > Hello, > > > > > > Yeah, still: > > > > > > the .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151 | wc -l > > > 290005 > > > > > > and the > > > .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.726 | wc -l > > > 289378 > > > > > > And to make me even happier, I have one more: > > > .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.6 | wc -l > > > 181588 > > > > > > This is my crush tree (I'm using a host-based crush rule): > > > https://gist.githubusercontent.com/Badb0yBadb0y/9bea911701184a51575619bc99cca94d/raw/e5e4a918d327769bb874aaed279a8428fd7150d5/gistfile1.txt > > > > > > I'm wondering whether the issue could be that hosts 2s13-15 have fewer nvme > > > osds than the others (though size-wise the same as the other 12 hosts, which > > > have 8x nvme osds each). > > > But the pgs are located like this: > > > > > > pg26.427 > > > osd.261 host8 > > > osd.488 host13 > > > osd.276 host4 > > > > > > pg26.606 > > > osd.443 host12 > > > osd.197 host8 > > > osd.524 host14 > > > > > > pg26.78c > > > osd.89 host7 > > > osd.406 host11 > > > osd.254 host6 > > > > > > If pg26.78c weren't there, I'd say the nvme osd distribution > > > across hosts is 100% the issue; however, this pg is not located on any of > > > the 4x nvme osd nodes 😕 > > > > > > Ty > > > > > > ________________________________ > > > From: Eugen Block <ebl...@nde.ag> > > > Sent: Tuesday, July 9, 2024 6:02 PM > > > To: ceph-users@ceph.io <ceph-users@ceph.io> > > > Subject: [ceph-users] Re: Large omap in index pool even if properly > > > sharded and not "OVER" > > > > > > Hi, > > > > > > the number of shards looks fine, maybe this was just a temporary > > > burst? 
Did you check if the rados objects in the index pool still have > > > more than 200k omap keys? I would try something like > > > > > > rados -p <index_pool> listomapkeys > > > .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151 | wc -l > > > > > > Zitat von "Szabo, Istvan (Agoda)" <istvan.sz...@agoda.com>: > > > > > >> Hi, > > >> > > >> I have a pretty big bucket which is sharded with 1999 shards, so in > > >> theory it can hold close to 200m objects (199,900,000). > > >> Currently it has 54m objects. > > >> > > >> The bucket limit check also looks good: > > >> "bucket": "xyz", > > >> "tenant": "", > > >> "num_objects": 53619489, > > >> "num_shards": 1999, > > >> "objects_per_shard": 26823, > > >> "fill_status": "OK" > > >> > > >> This is the bucket id: > > >> "id": "9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1" > > >> > > >> These are the log lines: > > >> 2024-06-27T10:41:05.679870+0700 osd.261 (osd.261) 9643 : cluster > > >> [WRN] Large omap object found. Object: > > >> 26:e433e65c:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151:head > > >> PG: 26.3a67cc27 (26.427) Key count: 236919 Size > > >> (bytes): > > >> 89969920 > > >> > > >> 2024-06-27T10:43:35.557835+0700 osd.89 (osd.89) 9000 : cluster [WRN] > > >> Large omap object found. Object: > > >> 26:31ff4df1:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.726:head > > >> PG: 26.8fb2ff8c (26.78c) Key count: 236495 Size > > >> (bytes): > > >> 95560458 > > >> > > >> I tried to deep-scrub the affected pgs and the > > >> osds mentioned in the log, but it didn't help. > > >> Why? What am I missing? > > >> > > >> Thank you in advance for your help. > > >> > > >> ________________________________ > > >> This message is confidential and is for the sole use of the intended > > >> recipient(s). It may also be privileged or otherwise protected by > > >> copyright or other legal rules. 
If you have received it by mistake > > >> please let us know by reply email and delete it from your system. It > > >> is prohibited to copy this message or disclose its content to > > >> anyone. Any confidentiality or privilege is not waived or lost by > > >> any mistaken delivery or unauthorized disclosure of the message. All > > >> messages sent to and from Agoda may be monitored to ensure > > >> compliance with company policies, to protect the company's interests > > >> and to remove potential malware. Electronic messages may be > > >> intercepted, amended, lost or deleted, or contain viruses. > > >> _______________________________________________ > > >> ceph-users mailing list -- ceph-users@ceph.io > > >> To unsubscribe send an email to ceph-users-le...@ceph.io
_______________________________________________ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
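[Editor's note: the clumping mechanism described in Casey's replies above — every version of one object name, and all parts of one multipart upload, landing on the same index shard — can be illustrated with a stand-in hash. RGW's real shard placement uses its own name hash, not md5, so this sketch only demonstrates the property, not the actual placement.]

```python
import hashlib

NUM_SHARDS = 1999  # shard count of the bucket discussed in this thread

def shard_for(object_name: str) -> int:
    # Stand-in for RGW's name-based shard placement (RGW does not use md5);
    # the illustrated property is the same: the shard depends only on the
    # object name, not on the version or part being written.
    digest = hashlib.md5(object_name.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# 10k overwrites of one versioned object: every index entry lands on one shard.
version_entries = [("reports/latest.csv", f"v{i}") for i in range(10_000)]
shards_used = {shard_for(name) for name, _version in version_entries}
print(len(shards_used))  # 1

# 10k distinct object names: entries spread across most of the 1999 shards.
spread = {shard_for(f"obj-{i}") for i in range(10_000)}
print(len(spread) > 1000)  # True
```

This is why the limit check's per-shard average can look healthy while one shard accumulates hundreds of thousands of keys: the workload, not the shard count, determines the worst-case shard.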