Hi Danish, 

The "unable to find head object data pool for..." could be an incorrect warning 
since it pops out for 'most of the objects'. [1] 

Regarding 'cursor.png', the only object that fails to sync: since you can't delete 
it with an s3 client, one thing you could try is to rewrite it with an s3 client 
(copy) and then retry the delete. 
If that still fails with an s3 client, you could try 'radosgw-admin object put' 
and/or 'radosgw-admin object rm', along the lines of the sketch below. 
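
For illustration only (untested on your Octopus cluster; replace <bucket> with the 
actual bucket name and keep the full object key from the sync error log): 

# rewrite the object with s3cmd (any local copy of the file will do), then retry the delete 
$ s3cmd put ./cursor.png s3://<bucket>/wp-content/plugins/plugins/yellow-pencil-visual-theme-customizer/images/cursor.png 
$ s3cmd del s3://<bucket>/wp-content/plugins/plugins/yellow-pencil-visual-theme-customizer/images/cursor.png 

# if s3cmd still fails, try removing it at the RGW admin level 
$ radosgw-admin object rm --bucket=<bucket> --object='wp-content/plugins/plugins/yellow-pencil-visual-theme-customizer/images/cursor.png' 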

If that still fails, then here's what you can do to remove this object from the 
index (to at least deal with the bucket synchronization issue): 

1/ Set bucket_name, index_pool_name, and bucket_id (jq command is required) 

$ bucket_name="bucket-test" 
$ index_pool_name=".rgw.sme.index" 
$ bucket_id=$(radosgw-admin bucket stats --bucket=${bucket_name} | jq -r .id) 

2/ Retrieve all index shards along with their omap keys 

$ mkdir "$bucket_id" 
$ for i in $(rados -p $index_pool_name ls | grep "$bucket_id"); do echo $i ; 
rados -p $index_pool_name listomapkeys $i > "${bucket_id}/${i}" ; done 

3/ Identify in which shard the 'cursor.png' object is listed (be sure to identify 
the right object; you may have several WordPress sites using the same image...) 

$ grep 'cursor.png' ${bucket_id}/.dir.${bucket_id}* | sed -e "s/^${bucket_id}\///g" > remove_from_index.txt 

4/ Make sure the remove_from_index.txt file only has one line, corresponding to 
the object you want to remove from the index: 

$ cat remove_from_index.txt 
.dir.0f448533-3c6c-4cb8-bde9-c9763ac17751.738183.1.6:48/wp-content/plugins/plugins/yellow-pencil-visual-theme-customizer/images/cursor.png
 

5/ Remove the object from the index shard 

$ while IFS=':' read -r object key ; do echo "Removing Key ${key}" ; rados -p ${index_pool_name} rmomapkey "${object}" "${key}" ; done < remove_from_index.txt 
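
Optionally, as a quick sanity check (same variables as above; the grep should 
return nothing once the key is gone): 

$ object=$(cut -d':' -f1 remove_from_index.txt) 
$ rados -p ${index_pool_name} listomapkeys "${object}" | grep 'cursor.png' 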

Restart both RGWs and check the sync state again. 
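
For example (both commands should be available on Octopus, but double-check on 
your version): 

$ radosgw-admin sync status 
$ radosgw-admin bucket sync status --bucket=${bucket_name} 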

Next, you might want to check for inconsistencies between the index and the 
actual data. You could use the rgw-orphan-list script for this [2]; a sketch is 
below. And of course, plan the upgrade of your cluster. 
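
If I remember correctly, the script just takes the RGW data pool as argument 
(placeholder below; check the availability and exact usage of the script on your 
Octopus build): 

$ rgw-orphan-list <your-rgw-data-pool> 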

Hope this helps, 

Regards, 
Frédéric. 

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2126787 
[2] https://www.ibm.com/docs/en/storage-ceph/8.0?topic=management-finding-orphan-leaky-objects 

----- On 26 Mar 25, at 6:12, Danish Khan <danish52....@gmail.com> wrote: 



Dear Frédéric, 

Unfortunately, I am still using Octopus version and these commands are showing 
unrecognized. 

Versioning is also not enabled on the bucket. 
I tried running : 
radosgw-admin bucket check --bucket=<bucket> --fix 

which run for few minutes giving lot of output, which contained below lines for 
most of the objects: 
WARNING: unable to find head object data pool for 
"<bucket>:wp-content/uploads/sites/74/2025/03/mutation-house-no-629.pdf", not 
updating version pool/epoch 

Is this issue fixable in octopus or should I plan to upgrade ceph cluster till 
Quincy version? 

Regards, 
Danish 


On Wed, Mar 26, 2025 at 2:41 AM Frédéric Nass <frederic.n...@univ-lorraine.fr> wrote: 


Hi Danish, 

Can you specify the version of Ceph used and whether versioning is enabled on 
this bucket? 

There are 2 ways to clean up orphan entries in a bucket index that I'm aware of 
: 

- One (the preferable way) is to rely on radosgw-admin command to check and 
hopefully fix the issue, cleaning up the index from orphan entries or even 
rebuilding the index entirely if necessary. 

There's been new radosgw-admin commands coded recently [1] to cleanup leftover 
OLH index entries and unlinked instance objects within versioned buckets. 

If this bucket is versioned, I would advise you try and run the new check / fix 
commands mentioned in this [2] release note : 
radosgw-admin bucket check unlinked [--fix] 

radosgw-admin bucket check olh [--fix] 
- Another one (as a second chance) is to act at the rados layer, identifying in 
which shard the orphan index entry is listed (listomapkeys) and remove it from 
the specified shard (rmomapkey). I could elaborate on that later if needed. 

Regards, 
Frédéric. 

[1] https://tracker.ceph.com/issues/62075 
[2] https://ceph.io/en/news/blog/2023/v18-2-1-reef-released/ 




From: Danish Khan <danish52....@gmail.com> 
Sent: Tuesday, March 25, 2025 17:16 
To: Frédéric Nass 
Cc: ceph-users 
Subject: Re: [ceph-users] ceph-ansible LARGE OMAP in RGW pool 

Hi Frédéric, 

Thank you for replying. 

I followed the steps mentioned in https://tracker.ceph.com/issues/62845 and was 
able to trim all the errors. 

Everything seemed to be working fine until the same error appeared again. 

I am still assuming the main culprit of this issue is one missing object and 
all the errors are showing this object only. 

I am able to list this object using s3cmd tool but I am unable to perform any 
action on this object, I am even unable to delete it, overwrite it or get it. 

I tried stopping all RGWs one by one and even tried after stopping all the RGWS 
but recovery is still not getting completed. 

And the LARGE OMAP is now only increasing. 

Is there a way I can delete it from index or from ceph end directly from pool 
so that it don't try to recover it? 

Regards, 
Danish 



On Tue, Mar 25, 2025 at 11:29 AM Frédéric Nass <frederic.n...@univ-lorraine.fr> wrote: 


Hi Danish, 

While reviewing the backports for upcoming v18.2.5, I came across this [1]. 
Could be your issue. 

Can you try the suggested workaround (--marker=9) and report back? 

Regards, 
Frédéric. 

[1] https://tracker.ceph.com/issues/62845 


From: Danish Khan <danish52....@gmail.com> 
Sent: Friday, March 14, 2025 23:11 
To: Frédéric Nass 
Cc: ceph-users 
Subject: Re: [ceph-users] ceph-ansible LARGE OMAP in RGW pool 

Dear Frédéric, 

1/ Identify the shards with the most sync errors log entries: 

I have identified the shard which is causing the issue is shard 31, but almost 
all the error shows only one object of a bucket. And the object exists in the 
master zone. but I'm not sure why the replication site is unable to sync it. 

2/ For each shard, list every sync error log entry along with their ids: 

radosgw-admin sync error list --shard-id=X 

The output of this command shows same shard and same objects mostly (shard 31 
and object 
/plugins/plugins/yellow-pencil-visual-theme-customizer/images/cursor.png) 

3/ Remove them **except the last one** with: 

radosgw-admin sync error trim --shard-id=X --marker=1_1682101321.201434_8669.1 
Trimming did remove a few entries from the error log. But still there are many 
error logs for the same object which I am unable to trim. 

Now the trim command is executing successfully but not doing anything. 

I am still getting error about the object which is not syncing in radosgw log: 

2025-03-15T03:05:48.060+0530 7fee2affd700 0 
RGW-SYNC:data:sync:shard[80]:entry[mbackup:70134e66-872072ee2d32.2205852207.1:48]:bucket_sync_sources[target=:[]):source_bucket=:[]):source_zone=872072ee2d32]:bucket[mbackup:70134e66-872072ee2d32.2205852207.1:48<-mod-backup:70134e66-872072ee2d32.2205852207.1:48]:full_sync[mod-backup:70134e66-872072ee2d32.2205852207.1:48]:entry[wp-content/plugins/plugins/yellow-pencil-visual-theme-customizer/images/cursor.png]:
 ERROR: failed to sync object: 
mbackup:70134e66-872072ee2d32.2205852207.1:48/wp-content/plugins/plugins/yellow-pencil-visual-theme-customizer/images/cursor.png
 

I am getting this error from appox two months, And if I remember correctly, we 
are getting LARGE OMAP warning from then only. 

I will try to delete this object from the Master zone on Monday and will see if 
this fixes the issue. 

Do you have any other suggestions on this, which I should consider? 

Regards, 
Danish 






On Thu, Mar 13, 2025 at 6:07 PM Frédéric Nass <frederic.n...@univ-lorraine.fr> wrote: 

Hi Danish, 

Can you access this KB article [1]? A free developer account should allow you 
to. 

It pretty much describes what you're facing and suggests to trim the sync error 
log of recovering shards. Actually, every log entry **except the last one**. 

1/ Identify the shards with the most sync errors log entries: 

radosgw-admin sync error list --max-entries=1000000 | grep shard_id | sort -n | 
uniq -c | sort -h 

2/ For each shard, list every sync error log entry along with their ids: 

radosgw-admin sync error list --shard-id=X 

3/ Remove them **except the last one** with: 

radosgw-admin sync error trim --shard-id=X --marker=1_1682101321.201434_8669.1 

the --marker above being the log entry id. 

Are the replication threads running on the same RGWs that S3 clients are using? 

If so, using dedicated RGWs for the sync job might help you avoid 
non-recovering shards in the future, as described in Matthew's post [2] 

Regards, 
Frédéric. 

[1] https://access.redhat.com/solutions/7023912 
[2] https://www.spinics.net/lists/ceph-users/msg83988.html 

----- On 12 Mar 25, at 11:15, Danish Khan <danish52....@gmail.com> wrote: 

> Dear All, 
> 
> My ceph cluster is giving Large OMAP warning from approx 2-3 Months. I 
> tried a few things like : 
> *Deep scrub of PGs* 
> *Compact OSDs* 
> *Trim log* 
> But these didn't work out. 
> 
> I guess the main issue is that 4 shards in replication site are always 
> recovering from 2-3 months. 
> 
> Any suggestions are highly appreciated. 
> 
> Sync status: 
> root@drhost1:~# radosgw-admin sync status 
> realm e259e0a92 (object-storage) 
> zonegroup 7a8606d2 (staas) 
> zone c8022ad1 (repstaas) 
> metadata sync syncing 
> full sync: 0/64 shards 
> incremental sync: 64/64 shards 
> metadata is caught up with master 
> data sync source: 2072ee2d32 (masterstaas) 
> syncing 
> full sync: 0/128 shards 
> incremental sync: 128/128 shards 
> data is behind on 3 shards 
> behind shards: [7,90,100] 
> oldest incremental change not applied: 
> 2025-03-12T13:14:10.268469+0530 [7] 
> 4 shards are recovering 
> recovering shards: [31,41,55,80] 
> 
> 
> Master site: 
> 1. *root@master1:~# for obj in $(rados ls -p masterstaas.rgw.log); do echo 
> "$(rados listomapkeys -p masterstaas.rgw.log $obj | wc -l) $obj";done | 
> sort -nr | head -10* 
> 1225387 data_log.91 
> 1225065 data_log.86 
> 1224662 data_log.87 
> 1224448 data_log.92 
> 1224018 data_log.89 
> 1222156 data_log.93 
> 1201489 data_log.83 
> 1174125 data_log.90 
> 363498 data_log.84 
> 258709 data_log.6 
> 
> 
> 2. *root@master1:~# for obj in data_log.91 data_log.86 data_log.87 
> data_log.92 data_log.89 data_log.93 data_log.83 data_log.90; do rados stat 
> -p masterstaas.rgw.log $obj; done* 
> masterstaas.rgw.log/data_log.91 mtime 2025-02-24T15:09:25.000000+0530, size 
> 0 
> masterstaas.rgw.log/data_log.86 mtime 2025-02-24T15:01:25.000000+0530, size 
> 0 
> masterstaas.rgw.log/data_log.87 mtime 2025-02-24T15:02:25.000000+0530, size 
> 0 
> masterstaas.rgw.log/data_log.92 mtime 2025-02-24T15:11:01.000000+0530, size 
> 0 
> masterstaas.rgw.log/data_log.89 mtime 2025-02-24T14:54:55.000000+0530, size 
> 0 
> masterstaas.rgw.log/data_log.93 mtime 2025-02-24T14:53:25.000000+0530, size 
> 0 
> masterstaas.rgw.log/data_log.83 mtime 2025-02-24T14:16:21.000000+0530, size 
> 0 
> masterstaas.rgw.log/data_log.90 mtime 2025-02-24T15:05:25.000000+0530, size 
> 0 
> 
> *3. ceph cluster log :* 
> 2025-02-22T04:18:27.324886+0530 osd.173 (osd.173) 19 : cluster [WRN] Large 
> omap object found. Object: 124:b2ddf551:::data_log.93:head PG: 124.8aafbb4d 
> (124.d) Key count: 1218170 Size (bytes): 297085860 
> 2025-02-22T04:18:28.735886+0530 osd.65 (osd.65) 308 : cluster [WRN] Large 
> omap object found. Object: 124:f2081d70:::data_log.92:head PG: 124.eb8104f 
> (124.f) Key count: 1220420 Size (bytes): 295240028 
> 2025-02-22T04:18:30.668884+0530 mon.master1 (mon.0) 7974038 : cluster [WRN] 
> Health check update: 3 large omap objects (LARGE_OMAP_OBJECTS) 
> 2025-02-22T04:18:31.127585+0530 osd.18 (osd.18) 224 : cluster [WRN] Large 
> omap object found. Object: 124:d1061236:::data_log.86:head PG: 124.6c48608b 
> (124.b) Key count: 1221047 Size (bytes): 295398274 
> 2025-02-22T04:18:33.189561+0530 osd.37 (osd.37) 32665 : cluster [WRN] Large 
> omap object found. Object: 124:9a2e04b7:::data_log.87:head PG: 124.ed207459 
> (124.19) Key count: 1220584 Size (bytes): 295290366 
> 2025-02-22T04:18:35.007117+0530 osd.77 (osd.77) 135 : cluster [WRN] Large 
> omap object found. Object: 124:6b9e929a:::data_log.89:head PG: 124.594979d6 
> (124.16) Key count: 1219913 Size (bytes): 295127816 
> 2025-02-22T04:18:36.189141+0530 mon.master1 (mon.0) 7974039 : cluster [WRN] 
> Health check update: 5 large omap objects (LARGE_OMAP_OBJECTS) 
> 2025-02-22T04:18:36.340247+0530 osd.112 (osd.112) 259 : cluster [WRN] Large 
> omap object found. Object: 124:0958bece:::data_log.83:head PG: 124.737d1a90 
> (124.10) Key count: 1200406 Size (bytes): 290280292 
> 2025-02-22T04:18:38.523766+0530 osd.73 (osd.73) 1064 : cluster [WRN] Large 
> omap object found. Object: 124:fddd971f:::data_log.91:head PG: 124.f8e9bbbf 
> (124.3f) Key count: 1221183 Size (bytes): 295425320 
> 2025-02-22T04:18:42.619926+0530 osd.92 (osd.92) 285 : cluster [WRN] Large 
> omap object found. Object: 124:7dc404fa:::data_log.90:head PG: 124.5f2023be 
> (124.3e) Key count: 1169895 Size (bytes): 283025576 
> 2025-02-22T04:18:44.242655+0530 mon.master1 (mon.0) 7974043 : cluster [WRN] 
> Health check update: 8 large omap objects (LARGE_OMAP_OBJECTS) 
> 
> Replica site: 
> 1. *for obj in $(rados ls -p repstaas.rgw.log); do echo "$(rados 
> listomapkeys -p repstaas.rgw.log $obj | wc -l) $obj";done | sort -nr | head 
> -10* 
> 
> 432850 data_log.91 
> 432384 data_log.87 
> 432323 data_log.93 
> 431783 data_log.86 
> 431510 data_log.92 
> 427959 data_log.89 
> 414522 data_log.90 
> 407571 data_log.83 
> 151015 data_log.84 
> 109790 data_log.4 
> 
> 
> 2. *ceph cluster log:* 
> grep -ir "Large omap object found" /var/log/ceph/ 
> /var/log/ceph/ceph-mon.drhost1.log:2025-03-12T14:49:59.997+0530 
> 7fc4ad544700 0 log_channel(cluster) log [WRN] : Search the cluster log 
> for 'Large omap object found' for more details. 
> /var/log/ceph/ceph.log:2025-03-12T14:49:02.078108+0530 osd.10 (osd.10) 21 : 
> cluster [WRN] Large omap object found. Object: 
> 6:b2ddf551:::data_log.93:head PG: 6.8aafbb4d (6.d) Key count: 432323 Size 
> (bytes): 105505884 
> /var/log/ceph/ceph.log:2025-03-12T14:49:02.389288+0530 osd.48 (osd.48) 37 : 
> cluster [WRN] Large omap object found. Object: 
> 6:d1061236:::data_log.86:head PG: 6.6c48608b (6.b) Key count: 431782 Size 
> (bytes): 104564674 
> /var/log/ceph/ceph.log:2025-03-12T14:49:07.166954+0530 osd.24 (osd.24) 24 : 
> cluster [WRN] Large omap object found. Object: 
> 6:0958bece:::data_log.83:head PG: 6.737d1a90 (6.10) Key count: 407571 Size 
> (bytes): 98635522 
> /var/log/ceph/ceph.log:2025-03-12T14:49:09.100110+0530 osd.63 (osd.63) 5 : 
> cluster [WRN] Large omap object found. Object: 
> 6:9a2e04b7:::data_log.87:head PG: 6.ed207459 (6.19) Key count: 432384 Size 
> (bytes): 104712350 
> /var/log/ceph/ceph.log:2025-03-12T14:49:08.703760+0530 osd.59 (osd.59) 11 : 
> cluster [WRN] Large omap object found. Object: 
> 6:6b9e929a:::data_log.89:head PG: 6.594979d6 (6.16) Key count: 427959 Size 
> (bytes): 103773777 
> /var/log/ceph/ceph.log:2025-03-12T14:49:11.126132+0530 osd.40 (osd.40) 24 : 
> cluster [WRN] Large omap object found. Object: 
> 6:f2081d70:::data_log.92:head PG: 6.eb8104f (6.f) Key count: 431508 Size 
> (bytes): 104520406 
> /var/log/ceph/ceph.log:2025-03-12T14:49:13.799473+0530 osd.43 (osd.43) 61 : 
> cluster [WRN] Large omap object found. Object: 
> 6:fddd971f:::data_log.91:head PG: 6.f8e9bbbf (6.1f) Key count: 432850 Size 
> (bytes): 104418869 
> /var/log/ceph/ceph.log:2025-03-12T14:49:14.398480+0530 osd.3 (osd.3) 55 : 
> cluster [WRN] Large omap object found. Object: 
> 6:7dc404fa:::data_log.90:head PG: 6.5f2023be (6.1e) Key count: 414521 Size 
> (bytes): 100396561 
> /var/log/ceph/ceph.log:2025-03-12T14:50:00.000484+0530 mon.drhost1 (mon.0) 
> 207423 : cluster [WRN] Search the cluster log for 'Large omap object 
> found' for more details. 
> 
> Regards, 
> Danish 
> _______________________________________________ 
> ceph-users mailing list -- ceph-users@ceph.io 
> To unsubscribe send an email to ceph-users-le...@ceph.io 




_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
