Hi Frédéric,

Thank you for replying.

I followed the steps mentioned in https://tracker.ceph.com/issues/62845 and
was able to trim all the errors.
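
For reference, these are roughly the commands I ran per shard (shard 31 as
the example; the --marker value is the id of each entry from the error
list, and --marker=9 is the workaround suggested in the tracker):

radosgw-admin sync error list --shard-id=31
radosgw-admin sync error trim --shard-id=31 --marker=<log entry id>
radosgw-admin sync error trim --shard-id=31 --marker=9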

Everything seemed to be working fine until the same error appeared again.

I still suspect that the main culprit of this issue is a single missing
object; all the errors reference only this object.

I am able to list this object using the s3cmd tool, but I am unable to
perform any action on it: I cannot delete it, overwrite it, or get it.
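
For example (with <bucket> standing in for the affected bucket):

s3cmd ls s3://<bucket>/wp-content/plugins/plugins/yellow-pencil-visual-theme-customizer/images/   # the object is listed
s3cmd get s3://<bucket>/wp-content/plugins/plugins/yellow-pencil-visual-theme-customizer/images/cursor.png   # fails
s3cmd del s3://<bucket>/wp-content/plugins/plugins/yellow-pencil-visual-theme-customizer/images/cursor.png   # fails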

I tried stopping the RGWs one by one, and even with all of them stopped,
but the recovery still does not complete.

Meanwhile, the LARGE OMAP warning only keeps growing.
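
The growth is visible with the same listomapkeys loop from my earlier
mail, e.g.:

for obj in $(rados ls -p repstaas.rgw.log | grep data_log); do echo "$(rados listomapkeys -p repstaas.rgw.log $obj | wc -l) $obj"; done | sort -nr | head -10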

Is there a way I can delete it from the bucket index, or directly from the
pool on the Ceph side, so that the sync no longer tries to recover it?
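
For example, would something along these lines be a reasonable way to do
it, or is it too risky? (I am not sure these are the right commands for
this situation, so please correct me.)

# compare the bucket index with the actual objects, then fix
radosgw-admin bucket check --bucket=<bucket>
radosgw-admin bucket check --bucket=<bucket> --check-objects --fix
# or remove just this one object and its index entry
radosgw-admin object rm --bucket=<bucket> --object='wp-content/plugins/plugins/yellow-pencil-visual-theme-customizer/images/cursor.png'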

Regards,
Danish



On Tue, Mar 25, 2025 at 11:29 AM Frédéric Nass <
frederic.n...@univ-lorraine.fr> wrote:

> Hi Danish,
>
> While reviewing the backports for upcoming v18.2.5, I came across this
> [1]. Could be your issue.
>
> Can you try the suggested workaround (--marker=9) and report back?
>
> Regards,
> Frédéric.
>
> [1] https://tracker.ceph.com/issues/62845
>
> ------------------------------
> *From:* Danish Khan <danish52....@gmail.com>
> *Sent:* Friday, March 14, 2025 23:11
> *To:* Frédéric Nass
> *Cc:* ceph-users
> *Subject:* Re: [ceph-users] ceph-ansible LARGE OMAP in RGW pool
>
>
> Dear Frédéric,
>
> 1/ Identify the shards with the most sync error log entries:
>
> I have identified that the shard causing the issue is shard 31, but
> almost all of the errors reference a single object in one bucket. The
> object exists in the master zone, but I'm not sure why the replication
> site is unable to sync it.
>
> 2/ For each shard, list every sync error log entry along with their ids:
>
> radosgw-admin sync error list --shard-id=X
>
> The output of this command mostly shows the same shard and the same
> object (shard 31 and object
> /plugins/plugins/yellow-pencil-visual-theme-customizer/images/cursor.png).
>
> 3/ Remove them **except the last one** with:
>
> radosgw-admin sync error trim --shard-id=X --marker=1_1682101321.201434_8669.1
>
> Trimming did remove a few entries from the error log, but there are still
> many error log entries for the same object which I am unable to trim.
>
> The trim command now executes successfully but no longer removes anything.
>
> I am still getting errors in the radosgw log about the object that is
> not syncing:
>
> 2025-03-15T03:05:48.060+0530 7fee2affd700  0
> RGW-SYNC:data:sync:shard[80]:entry[mbackup:70134e66-872072ee2d32.2205852207.1:48]:bucket_sync_sources[target=:[]):source_bucket=:[]):source_zone=872072ee2d32]:bucket[mbackup:70134e66-872072ee2d32.2205852207.1:48<-mod-backup:70134e66-872072ee2d32.2205852207.1:48]:full_sync[mod-backup:70134e66-872072ee2d32.2205852207.1:48]:entry[wp-content/plugins/plugins/yellow-pencil-visual-theme-customizer/images/cursor.png]:
> ERROR: failed to sync object:
> mbackup:70134e66-872072ee2d32.2205852207.1:48/wp-content/plugins/plugins/yellow-pencil-visual-theme-customizer/images/cursor.png
>
> I have been getting this error for approximately two months, and if I
> remember correctly, the LARGE OMAP warnings started around the same time.
>
> I will try deleting this object from the master zone on Monday and will
> see if that fixes the issue.
>
> Do you have any other suggestions I should consider?
>
> Regards,
> Danish
>
>
>
>
>
>
> On Thu, Mar 13, 2025 at 6:07 PM Frédéric Nass <
> frederic.n...@univ-lorraine.fr> wrote:
>
>> Hi Danish,
>>
>> Can you access this KB article [1]? A free developer account should allow
>> you to.
>>
>> It pretty much describes what you're facing and suggests trimming the
>> sync error log of recovering shards, specifically every log entry
>> **except the last one**.
>>
>> 1/ Identify the shards with the most sync error log entries:
>>
>> radosgw-admin sync error list --max-entries=1000000 | grep shard_id | sort -n | uniq -c | sort -h
>>
>> 2/ For each shard, list every sync error log entry along with their ids:
>>
>> radosgw-admin sync error list --shard-id=X
>>
>> 3/ Remove them **except the last one** with:
>>
>> radosgw-admin sync error trim --shard-id=X --marker=1_1682101321.201434_8669.1
>>
>> The --marker above is the log entry id.
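>>
>> For example (assuming jq is available and the JSON layout of the error
>> list hasn't changed), the entry ids can be pulled out with something
>> like:
>>
>> radosgw-admin sync error list --shard-id=X | jq -r '.[].entries[].id'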
>>
>> Are the replication threads running on the same RGWs that S3 clients are
>> using?
>>
>> If so, using dedicated RGWs for the sync job might help you avoid
>> non-recovering shards in the future, as described in Matthew's post [2].
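>>
>> For instance (just a sketch; double-check the exact option name and
>> syntax for your release), the client-facing RGWs can run with the sync
>> thread disabled while only the dedicated sync RGWs keep it enabled:
>>
>> # client-facing RGWs: do not run the replication threads
>> ceph config set client.rgw.<client-facing-id> rgw_run_sync_thread false
>> # dedicated sync RGWs: keep the default (true)
>> ceph config get client.rgw.<sync-id> rgw_run_sync_thread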
>>
>> Regards,
>> Frédéric.
>>
>> [1] https://access.redhat.com/solutions/7023912
>> [2] https://www.spinics.net/lists/ceph-users/msg83988.html
>>
>> ----- On 12 Mar 25, at 11:15, Danish Khan danish52....@gmail.com wrote:
>>
>> > Dear All,
>> >
>> > My Ceph cluster has been giving a Large OMAP warning for approximately
>> > 2-3 months. I tried a few things like:
>> > *Deep scrub of PGs*
>> > *Compact OSDs*
>> > *Trim log*
>> > But these didn't work.
>> >
>> > I think the main issue is that 4 shards on the replication site have
>> > been stuck recovering for 2-3 months.
>> >
>> > Any suggestions are highly appreciated.
>> >
>> > Sync status:
>> > root@drhost1:~# radosgw-admin sync status
>> >          realm e259e0a92 (object-storage)
>> >      zonegroup 7a8606d2 (staas)
>> >           zone c8022ad1 (repstaas)
>> >  metadata sync syncing
>> >                full sync: 0/64 shards
>> >                incremental sync: 64/64 shards
>> >                metadata is caught up with master
>> >      data sync source: 2072ee2d32 (masterstaas)
>> >                        syncing
>> >                        full sync: 0/128 shards
>> >                        incremental sync: 128/128 shards
>> >                        data is behind on 3 shards
>> >                        behind shards: [7,90,100]
>> >                        oldest incremental change not applied:
>> > 2025-03-12T13:14:10.268469+0530 [7]
>> >                        4 shards are recovering
>> >                        recovering shards: [31,41,55,80]
>> >
>> >
>> > Master site:
>> > 1. *root@master1:~# for obj in $(rados ls -p masterstaas.rgw.log); do echo "$(rados listomapkeys -p masterstaas.rgw.log $obj | wc -l) $obj"; done | sort -nr | head -10*
>> > 1225387 data_log.91
>> > 1225065 data_log.86
>> > 1224662 data_log.87
>> > 1224448 data_log.92
>> > 1224018 data_log.89
>> > 1222156 data_log.93
>> > 1201489 data_log.83
>> > 1174125 data_log.90
>> > 363498 data_log.84
>> > 258709 data_log.6
>> >
>> >
>> > 2. *root@master1:~# for obj in data_log.91 data_log.86 data_log.87 data_log.92 data_log.89 data_log.93 data_log.83 data_log.90; do rados stat -p masterstaas.rgw.log $obj; done*
>> > masterstaas.rgw.log/data_log.91 mtime 2025-02-24T15:09:25.000000+0530, size 0
>> > masterstaas.rgw.log/data_log.86 mtime 2025-02-24T15:01:25.000000+0530, size 0
>> > masterstaas.rgw.log/data_log.87 mtime 2025-02-24T15:02:25.000000+0530, size 0
>> > masterstaas.rgw.log/data_log.92 mtime 2025-02-24T15:11:01.000000+0530, size 0
>> > masterstaas.rgw.log/data_log.89 mtime 2025-02-24T14:54:55.000000+0530, size 0
>> > masterstaas.rgw.log/data_log.93 mtime 2025-02-24T14:53:25.000000+0530, size 0
>> > masterstaas.rgw.log/data_log.83 mtime 2025-02-24T14:16:21.000000+0530, size 0
>> > masterstaas.rgw.log/data_log.90 mtime 2025-02-24T15:05:25.000000+0530, size 0
>> >
>> > *3. ceph cluster log:*
>> > 2025-02-22T04:18:27.324886+0530 osd.173 (osd.173) 19 : cluster [WRN] Large omap object found. Object: 124:b2ddf551:::data_log.93:head PG: 124.8aafbb4d (124.d) Key count: 1218170 Size (bytes): 297085860
>> > 2025-02-22T04:18:28.735886+0530 osd.65 (osd.65) 308 : cluster [WRN] Large omap object found. Object: 124:f2081d70:::data_log.92:head PG: 124.eb8104f (124.f) Key count: 1220420 Size (bytes): 295240028
>> > 2025-02-22T04:18:30.668884+0530 mon.master1 (mon.0) 7974038 : cluster [WRN] Health check update: 3 large omap objects (LARGE_OMAP_OBJECTS)
>> > 2025-02-22T04:18:31.127585+0530 osd.18 (osd.18) 224 : cluster [WRN] Large omap object found. Object: 124:d1061236:::data_log.86:head PG: 124.6c48608b (124.b) Key count: 1221047 Size (bytes): 295398274
>> > 2025-02-22T04:18:33.189561+0530 osd.37 (osd.37) 32665 : cluster [WRN] Large omap object found. Object: 124:9a2e04b7:::data_log.87:head PG: 124.ed207459 (124.19) Key count: 1220584 Size (bytes): 295290366
>> > 2025-02-22T04:18:35.007117+0530 osd.77 (osd.77) 135 : cluster [WRN] Large omap object found. Object: 124:6b9e929a:::data_log.89:head PG: 124.594979d6 (124.16) Key count: 1219913 Size (bytes): 295127816
>> > 2025-02-22T04:18:36.189141+0530 mon.master1 (mon.0) 7974039 : cluster [WRN] Health check update: 5 large omap objects (LARGE_OMAP_OBJECTS)
>> > 2025-02-22T04:18:36.340247+0530 osd.112 (osd.112) 259 : cluster [WRN] Large omap object found. Object: 124:0958bece:::data_log.83:head PG: 124.737d1a90 (124.10) Key count: 1200406 Size (bytes): 290280292
>> > 2025-02-22T04:18:38.523766+0530 osd.73 (osd.73) 1064 : cluster [WRN] Large omap object found. Object: 124:fddd971f:::data_log.91:head PG: 124.f8e9bbbf (124.3f) Key count: 1221183 Size (bytes): 295425320
>> > 2025-02-22T04:18:42.619926+0530 osd.92 (osd.92) 285 : cluster [WRN] Large omap object found. Object: 124:7dc404fa:::data_log.90:head PG: 124.5f2023be (124.3e) Key count: 1169895 Size (bytes): 283025576
>> > 2025-02-22T04:18:44.242655+0530 mon.master1 (mon.0) 7974043 : cluster [WRN] Health check update: 8 large omap objects (LARGE_OMAP_OBJECTS)
>> >
>> > Replica site:
>> > 1. *for obj in $(rados ls -p repstaas.rgw.log); do echo "$(rados listomapkeys -p repstaas.rgw.log $obj | wc -l) $obj"; done | sort -nr | head -10*
>> >
>> > 432850 data_log.91
>> > 432384 data_log.87
>> > 432323 data_log.93
>> > 431783 data_log.86
>> > 431510 data_log.92
>> > 427959 data_log.89
>> > 414522 data_log.90
>> > 407571 data_log.83
>> > 151015 data_log.84
>> > 109790 data_log.4
>> >
>> >
>> > 2. *ceph cluster log:*
>> > grep -ir "Large omap object found" /var/log/ceph/
>> > /var/log/ceph/ceph-mon.drhost1.log:2025-03-12T14:49:59.997+0530 7fc4ad544700  0 log_channel(cluster) log [WRN] :     Search the cluster log for 'Large omap object found' for more details.
>> > /var/log/ceph/ceph.log:2025-03-12T14:49:02.078108+0530 osd.10 (osd.10) 21 : cluster [WRN] Large omap object found. Object: 6:b2ddf551:::data_log.93:head PG: 6.8aafbb4d (6.d) Key count: 432323 Size (bytes): 105505884
>> > /var/log/ceph/ceph.log:2025-03-12T14:49:02.389288+0530 osd.48 (osd.48) 37 : cluster [WRN] Large omap object found. Object: 6:d1061236:::data_log.86:head PG: 6.6c48608b (6.b) Key count: 431782 Size (bytes): 104564674
>> > /var/log/ceph/ceph.log:2025-03-12T14:49:07.166954+0530 osd.24 (osd.24) 24 : cluster [WRN] Large omap object found. Object: 6:0958bece:::data_log.83:head PG: 6.737d1a90 (6.10) Key count: 407571 Size (bytes): 98635522
>> > /var/log/ceph/ceph.log:2025-03-12T14:49:09.100110+0530 osd.63 (osd.63) 5 : cluster [WRN] Large omap object found. Object: 6:9a2e04b7:::data_log.87:head PG: 6.ed207459 (6.19) Key count: 432384 Size (bytes): 104712350
>> > /var/log/ceph/ceph.log:2025-03-12T14:49:08.703760+0530 osd.59 (osd.59) 11 : cluster [WRN] Large omap object found. Object: 6:6b9e929a:::data_log.89:head PG: 6.594979d6 (6.16) Key count: 427959 Size (bytes): 103773777
>> > /var/log/ceph/ceph.log:2025-03-12T14:49:11.126132+0530 osd.40 (osd.40) 24 : cluster [WRN] Large omap object found. Object: 6:f2081d70:::data_log.92:head PG: 6.eb8104f (6.f) Key count: 431508 Size (bytes): 104520406
>> > /var/log/ceph/ceph.log:2025-03-12T14:49:13.799473+0530 osd.43 (osd.43) 61 : cluster [WRN] Large omap object found. Object: 6:fddd971f:::data_log.91:head PG: 6.f8e9bbbf (6.1f) Key count: 432850 Size (bytes): 104418869
>> > /var/log/ceph/ceph.log:2025-03-12T14:49:14.398480+0530 osd.3 (osd.3) 55 : cluster [WRN] Large omap object found. Object: 6:7dc404fa:::data_log.90:head PG: 6.5f2023be (6.1e) Key count: 414521 Size (bytes): 100396561
>> > /var/log/ceph/ceph.log:2025-03-12T14:50:00.000484+0530 mon.drhost1 (mon.0) 207423 : cluster [WRN]     Search the cluster log for 'Large omap object found' for more details.
>> >
>> > Regards,
>> > Danish
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
