Dear all,
Last year, we were forced to migrate all data off one cluster due to a
critical issue with snaptrimq length, which caused the cluster to become
unstable. Unfortunately, we are now experiencing the same problem again.
We have been flooded with a massive queue of snaptrims even though no
large-scale snapshot deletion took place. Within seconds, every PG received
over 2 million snaps to delete. It looks like some kind of database corruption.
e.g. PG 1.1: SNAPTRIMQ_LEN 2264067
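For reference, the per-PG queue length above comes from "ceph pg dump"; below
is a minimal Python sketch of how the worst PGs can be listed from its JSON
output (it assumes the pg_stats entries expose a snaptrimq_len field, as they
do on recent releases):

  #!/usr/bin/env python3
  # List PGs whose snap trim queue exceeds a threshold (sketch only).
  import json
  import subprocess

  THRESHOLD = 1024  # matches the default mon_osd_snap_trim_queue_warn_on

  raw = subprocess.check_output(["ceph", "pg", "dump", "--format", "json"])
  dump = json.loads(raw)
  # Newer releases nest the stats under "pg_map"; fall back to the top level.
  stats = dump.get("pg_map", dump).get("pg_stats", [])

  over = sorted(
      ((s["pgid"], s.get("snaptrimq_len", 0)) for s in stats),
      key=lambda t: t[1],
      reverse=True,
  )
  print(f"{sum(1 for _, q in over if q >= THRESHOLD)} PGs over {THRESHOLD}")
  for pgid, qlen in over[:20]:
      print(f"{pgid}  snaptrimq_len={qlen}")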
Last year it was an Octopus cluster (15.2.17). Now it is a Quincy cluster
(17.2.7). The cluster was originally created on 14.2.16 and upgraded
14.2.16 -> 15.2.17 -> 17.2.7.
Below I have attached the background/history in chronological order, starting
from the earliest event. I have also attached logs from a snaptrimming OSD in a
text file.
----------
BACKGROUND:
All was good:
> 2025-10-16T21:30:09.776167+0000 mgr.mgr-01-<cluster-fsid> (mgr.8232603159)
> 47659 : cluster [DBG] pgmap v22191: 8193 pgs: 2 active+clean+scrubbing+deep,
> 8191 active+clean; 183 TiB data, 542 TiB used, 419 TiB / 960 TiB avail; 491
> MiB/s rd, 359 MiB/s wr, 20.99k op/s
First laggy pg:
> 2025-10-16T21:30:11.789813+0000 mgr.mgr-01-<cluster-fsid> (mgr.8232603159)
> 47661 : cluster [DBG] pgmap v22192: 8193 pgs: 4 active+clean+laggy, 2
> active+clean+scrubbing+deep, 8187 active+clean; 183 TiB data, 542 TiB used,
> 419 TiB / 960 TiB avail; 328 MiB/s rd, 247 MiB/s wr, 14.01k op/s
First heartbeat problem:
> 2025-10-16T21:30:21.893+0000 7f578efd7700 -1 osd.yyy 602287 heartbeat_check:
> no reply from xyx.xyz.xyz.xyz:6802 osd.xxx since back
> 2025-10-16T21:29:57.827954+0000 front 2025-10-16T21:29:57.828552+0000 (oldest
> deadline 2025-10-16T21:30:21.328590+0000)
First report of failed osd:
> 2025-10-16T21:30:21.841714+0000 mon.mon-01-<cluster-fsid> (mon.0) 5235394 :
> cluster [DBG] osd.16 reported failed by osd.182
First failed osd:
> 2025-10-16T21:30:21.841681+0000 mon.mon-01-<cluster-fsid> (mon.0) 5235393 :
> cluster [INF] osd.15 failed (root=default,rack=rack123,host=host-123) (2
> reporters from different host after 23.000420 >= grace 20.000000)
Host down:
> 2025-10-16T21:30:22.914804+0000 mon.mon-01-<cluster-fsid> (mon.0) 5235451 :
> cluster [WRN] Health check failed: 1 host (24 osds) down (OSD_HOST_DOWN)
Some OSDs up again:
> 2025-10-16T21:30:22.892201+0000 mon.mon-01-<cluster-fsid> (mon.0) 5235450 :
> cluster [DBG] osd.22 failure report canceled by osd.64
> 2025-10-16T21:30:21.904+0000 7f4c358ca700 1
> mon.mon-04-<cluster-fsid>@1(peon).osd e602288 e602288: 283 total, 266 up, 283
> in
Cluster during recovery:
> 2025-10-16T21:30:23.865481+0000 mgr.mgr-01-<cluster-fsid> (mgr.8232603159)
> 47668 : cluster [DBG] pgmap v22200: 8193 pgs: 34 peering, 648
> stale+active+clean, 1 active+clean+scrubbing+deep, 1 active+clean+scrubbing,
> 676 active+clean+laggy, 60 active+undersized+degraded, 6773 active+clean; 183
> TiB data, 542 TiB used, 419 TiB / 960 TiB avail; 4.7 MiB/s rd, 45 MiB/s wr,
> 1.28k op/s; 372351/152450688 objects degraded (0.244%)
OSDs down again:
> 2025-10-16T21:30:23.936587+0000 mon.mon-01-<cluster-fsid> (mon.0) 5235453 :
> cluster [DBG] osdmap e602290: 283 total, 259 up, 283 in
...
a few OSDs were flapping up and down several times (about 5) (259->266->259...->259)
...
Finally the problem was born:
> 2025-10-16T21:30:34.405+0000 7fe5613fc700 0 log_channel(cluster) log [WRN] :
> Health check failed: snap trim queue for 4133 pg(s) >= 1024
> (mon_osd_snap_trim_queue_warn_on) (PG_SLOW_SNAP_TRIMMING)
During the problem our monitors were happily compacting RocksDB:
> 2025-10-16T21:30:26.224+0000 7f4c310c1700 4 rocksdb:
> [db/db_impl/db_impl_compaction_flush.cc:1615] [default] Manual compaction
> starting (mon-04-<cluster-fsid>.caas.ovh.net)
> 2025-10-16T21:30:26.820+0000 7f0fefa75700 4 rocksdb:
> [db/db_impl/db_impl_compaction_flush.cc:1615] [default] Manual compaction
> starting (mon-05-<cluster-fsid>.caas.ovh.net)
----------
Of course, we ended up with 2M+ snaps to remove on every PG. Additionally, a
few minutes later we saw this operation on the mon:
> 2025-10-16T21:37:11.434+0000 7fe5613fc700 -1
> mon.mon-01-<cluster-fsid>@0(leader) e10 get_health_metrics reporting 1 slow
> ops, oldest is pool_op(delete unmanaged snap pool 1 tid 7113677 name v602484)
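In case it is useful, here is a rough Python sketch of how the per-pool
removed-snap interval queue can be inspected (it assumes "ceph osd pool ls
detail" prints a removed_snaps_queue line for pools with pending removals, as
recent releases do; the parsing may need adjusting):

  #!/usr/bin/env python3
  # Rough count of removed-snap intervals still queued per pool (sketch only).
  import re
  import subprocess

  out = subprocess.check_output(
      ["ceph", "osd", "pool", "ls", "detail"], text=True
  )
  for line in out.splitlines():
      if "removed_snaps_queue" in line:
          intervals = re.findall(r"[0-9a-f]+~[0-9a-f]+", line)
          print(f"{len(intervals)} queued interval(s): {line.strip()[:120]}")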
Do you have any idea what might be causing this issue and how to address it?
Have you encountered a similar problem elsewhere? Currently we are running
snaptrim, but it may take several weeks to complete, and during this time the
cluster is severely impacted. Do you perhaps have any idea what to check next,
or where we could look for a clue?
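For completeness, here is an illustrative Python sketch of the throttling knobs
that can be tuned while the queue drains (the values are examples only, not
recommendations, and under the mClock scheduler in Quincy the sleep-based
option may be ignored):

  #!/usr/bin/env python3
  # Illustrative throttling of snap trimming while the queue drains.
  import subprocess

  settings = {
      "osd_snap_trim_sleep": "2.0",             # pause between trim operations
      "osd_pg_max_concurrent_snap_trims": "1",  # one in-flight trim per PG
      "osd_snap_trim_priority": "1",            # keep trimming below client I/O
  }
  for option, value in settings.items():
      subprocess.run(["ceph", "config", "set", "osd", option, value], check=True)

  # To pause trimming entirely (e.g. during peak hours):
  #   subprocess.run(["ceph", "osd", "set", "nosnaptrim"], check=True)
  # and unset it again later with "ceph osd unset nosnaptrim".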
Thank you for any help,
Best Regards,
Lukasz Lucki Gomulka
----------
ATTACHED LOG (snaptrimming osd.273):
3 les/c/f=607984/607984/0 sis=607983) [273,270,52] r=0 lpr=607983 crt=638254'1212244328 lcod 638254'1212244327 mlcod 638254'1212244327 active+clean+snaptrim trimq=1626528 ps=250177] SnapTrimmer state<Trimming/AwaitAsyncWork>: AwaitAsyncWork: trimming snap 10c744
2025-10-27T09:04:30.426+0000 7f64e46fd700 10 osd.273 pg_epoch: 638254 pg[1.107c( v 638254'113190002 (638250'113186267,638254'113190002] local-lis/les=607904/607905 n=6486 ec=1086/118 lis/c=607904/607904 les/c/f=607905/607989/0 sis=607904) [273,29,156] r=0 lpr=607904 crt=638254'113190002 lcod 638254'113190001 mlcod 638254'113190001 active+clean+snaptrim_wait trimq=1624919 ps=251786] sending reply on osd_op(client.8199380052.0:1732813394 1.107c 1:3e0a7409:::rbd_data.961cf669eba3cc.000000000000d124:head [write 4177920~8192 in=8192b] snapc 0=[] ondisk+write+known_if_redirected e638254) v8 0x564745c8a900
2025-10-27T09:04:30.426+0000 7f64e56ff700 20 osd.273 op_wq(2) _process empty q, waiting
2025-10-27T09:04:30.426+0000 7f64e9707700 10 osd.273 pg_epoch: 638254 pg[1.14f2( v 638254'75689419 (638161'75685920,638254'75689419] local-lis/les=607935/607936 n=6452 ec=1121/118 lis/c=607935/607935 les/c/f=607936/607978/0 sis=607935) [104,99,273] r=2 lpr=607935 luod=0'0 lua=614430'74931019 crt=638254'75689419 lcod 638254'75689417 mlcod 638254'75689416 active mbc={} ps=209461] repop_commit on op osd_repop(client.8245286537.0:147151 1.14f2 e638254/607935 1:4f2a7df5:::rbd_data.73044b44141b6a.0000000000000600:head v 638254'75689419, mlcod=638254'75689416) v3, sending commit to osd.104
2025-10-27T09:04:30.426+0000 7f64e9707700 20 osd.273 638254 send_message_osd_cluster osd_repop_reply to osd.104 from_epoch 638254
2025-10-27T09:04:30.426+0000 7f64e9707700 20 osd.273 638254 get_nextmap_reserved map_reservations: {638254=1}
2025-10-27T09:04:30.426+0000 7f64e9707700 20 osd.273 638254 release_map epoch: 638254
2025-10-27T09:04:30.426+0000 7f64e9707700 20 osd.273 op_wq(2) _process empty q, waiting
2025-10-27T09:04:30.426+0000 7f64e2efa700 20 snap_mapper.get_next_objects_to_trim get_next(SNA_1_000000000010C744_0000000000000001.F7A1) returns 0 SNA_1_00000000004C2920_0000000000000001.07D11DF6.4c2920.rbd%udata%e1deca847dbf1fa%e00000000000000b7..,buffer::list(len=93,
buffer::ptr(0~93 0x56476e1ec0c0 in raw 0x56476e1ec0c0 len 93 nref 1)
)
2025-10-27T09:04:30.426+0000 7f64e2efa700 20 snap_mapper.get_next_objects_to_trim get_next(SNA_1_000000000010C744_0000000000000001.F7A3) returns 0 SNA_1_00000000004C2920_0000000000000001.07D11DF6.4c2920.rbd%udata%e1deca847dbf1fa%e00000000000000b7..,buffer::list(len=93,
buffer::ptr(0~93 0x56476e1ec0c0 in raw 0x56476e1ec0c0 len 93 nref 1)
)
2025-10-27T09:04:30.426+0000 7f64e2efa700 20 snap_mapper.get_next_objects_to_trim get_next(SNA_1_000000000010C744_0000000000000001.F7A5) returns 0 SNA_1_00000000004C2920_0000000000000001.07D11DF6.4c2920.rbd%udata%e1deca847dbf1fa%e00000000000000b7..,buffer::list(len=93,
buffer::ptr(0~93 0x56476e1ec0c0 in raw 0x56476e1ec0c0 len 93 nref 1)
)
2025-10-27T09:04:30.426+0000 7f64e2efa700 20 snap_mapper.get_next_objects_to_trim get_next(SNA_1_000000000010C744_0000000000000001.F7A7) returns 0 SNA_1_00000000004C2920_0000000000000001.07D11DF6.4c2920.rbd%udata%e1deca847dbf1fa%e00000000000000b7..,buffer::list(len=93,
buffer::ptr(0~93 0x56476e1ec0c0 in raw 0x56476e1ec0c0 len 93 nref 1)
)
2025-10-27T09:04:30.426+0000 7f64e2efa700 20 snap_mapper.get_next_objects_to_trim get_next(SNA_1_000000000010C744_0000000000000001.F7A9) returns 0 SNA_1_00000000004C2920_0000000000000001.07D11DF6.4c2920.rbd%udata%e1deca847dbf1fa%e00000000000000b7..,buffer::list(len=93,
buffer::ptr(0~93 0x56476e1ec0c0 in raw 0x56476e1ec0c0 len 93 nref 1)
)
2025-10-27T09:04:30.426+0000 7f64e46fd700 15 osd.273 pg_epoch: 638254 pg[1.107c( v 638254'113190002 (638250'113186267,638254'113190002] local-lis/les=607904/607905 n=6486 ec=1086/118 lis/c=607904/607904 les/c/f=607905/607989/0 sis=607904) [273,29,156] r=0 lpr=607904 crt=638254'113190002 lcod 638254'113190001 mlcod 638254'113190001 active+clean+snaptrim_wait trimq=1624919 ps=251786] publish_stats_to_osd 638254:291629753
2025-10-27T09:04:30.426+0000 7f64e2efa700 20 snap_mapper.get_next_objects_to_trim get_next(SNA_1_000000000010C744_0000000000000001.F7AB) returns 0 SNA_1_00000000004C2920_0000000000000001.07D11DF6.4c2920.rbd%udata%e1deca847dbf1fa%e00000000000000b7..,buffer::list(len=93,
buffer::ptr(0~93 0x56476e1ec0c0 in raw 0x56476e1ec0c0 len 93 nref 1)
)
2025-10-27T09:04:30.426+0000 7f64e2efa700 20 snap_mapper.get_next_objects_to_trim get_next(SNA_1_000000000010C744_0000000000000001.F7AD) returns 0 SNA_1_00000000004C2920_0000000000000001.07D11DF6.4c2920.rbd%udata%e1deca847dbf1fa%e00000000000000b7..,buffer::list(len=93,
buffer::ptr(0~93 0x56476e1ec0c0 in raw 0x56476e1ec0c0 len 93 nref 1)
)
2025-10-27T09:04:30.426+0000 7f64e2efa700 20 snap_mapper.get_next_objects_to_trim get_next(SNA_1_000000000010C744_0000000000000001.F7AF) returns 0 SNA_1_00000000004C2920_0000000000000001.07D11DF6.4c2920.rbd%udata%e1deca847dbf1fa%e00000000000000b7..,buffer::list(len=93,
buffer::ptr(0~93 0x56476e1ec0c0 in raw 0x56476e1ec0c0 len 93 nref 1)
)
2025-10-27T09:04:30.426+0000 7f64e46fd700 10 osd.273 pg_epoch: 638254 pg[1.107c( v 638254'113190002 (638250'113186267,638254'113190002] local-lis/les=607904/607905 n=6486 ec=1086/118 lis/c=607904/607904 les/c/f=607905/607989/0 sis=607904) [273,29,156] r=0 lpr=607904 crt=638254'113190002 lcod 638254'113190001 mlcod 638254'113190001 active+clean+snaptrim_wait trimq=1624919 ps=251786] removing repgather(0x56476ef1de00 638254'113190002 rep_tid=32818178 committed?=1 r=0)
2025-10-27T09:04:30.426+0000 7f64e2efa700 10 osd.273 pg_epoch: 638254 pg[1.1a7f( v 638254'1212244328 (638216'1212240604,638254'1212244328] local-lis/les=607983/607984 n=6721 ec=1163/118 lis/c=607983/607983 les/c/f=607984/607984/0 sis=607983) [273,270,52] r=0 lpr=607983 crt=638254'1212244328 lcod 638254'1212244327 mlcod 638254'1212244327 active+clean+snaptrim trimq=1626528 ps=250177] SnapTrimmer state<Trimming/AwaitAsyncWork>: got ENOENT
2025-10-27T09:04:30.426+0000 7f64e46fd700 20 osd.273 pg_epoch: 638254 pg[1.107c( v 638254'113190002 (638250'113186267,638254'113190002] local-lis/les=607904/607905 n=6486 ec=1086/118 lis/c=607904/607904 les/c/f=607905/607989/0 sis=607904) [273,29,156] r=0 lpr=607904 crt=638254'113190002 lcod 638254'113190001 mlcod 638254'113190001 active+clean+snaptrim_wait trimq=1624919 ps=251786] q front is repgather(0x56476ef1de00 638254'113190002 rep_tid=32818178 committed?=1 r=0)
2025-10-27T09:04:30.426+0000 7f64e46fd700 15 osd.273 pg_epoch: 638254 pg[1.107c( v 638254'113190002 (638250'113186267,638254'113190002] local-lis/les=607904/607905 n=6486 ec=1086/118 lis/c=607904/607904 les/c/f=607905/607989/0 sis=607904) [273,29,156] r=0 lpr=607904 crt=638254'113190002 lcod 638254'113190001 mlcod 638254'113190001 active+clean+snaptrim_wait trimq=1624919 ps=251786] do_osd_op_effects client.8199380052 con 0x5647749e6000
2025-10-27T09:04:30.426+0000 7f64e2efa700 10 osd.273 pg_epoch: 638254 pg[1.1a7f( v 638254'1212244328 (638216'1212240604,638254'1212244328] local-lis/les=607983/607984 n=6721 ec=1163/118 lis/c=607983/607983 les/c/f=607984/607984/0 sis=607983) [273,270,52] r=0 lpr=607983 crt=638254'1212244328 lcod 638254'1212244327 mlcod 638254'1212244327 active+clean+snaptrim trimq=1626527 ps=250177] SnapTrimmer state<Trimming/AwaitAsyncWork>: adding snap 10c744 to purged_snaps
2025-10-27T09:04:30.426+0000 7f64e46fd700 20 osd.273 pg_epoch: 638254 pg[1.107c( v 638254'113190002 (638250'113186267,638254'113190002] local-lis/les=607904/607905 n=6486 ec=1086/118 lis/c=607904/607904 les/c/f=607905/607989/0 sis=607904) [273,29,156] r=0 lpr=607904 crt=638254'113190002 lcod 638254'113190001 mlcod 638254'113190001 active+clean+snaptrim_wait trimq=1624919 ps=251786] remove_repop repgather(0x56476ef1de00 638254'113190002 rep_tid=32818178 committed?=1 r=0)
2025-10-27T09:04:30.426+0000 7f64e46fd700 10 osd.273 638254 dequeue_op 0x56477c5d6480 finish