I think there's some more investigation required to get to the bottom of this. Do you by any chance have a snap schedule enabled which would create snapshots automatically? Do you see that many snapshots in the 'rbd -p <pool> ls --long' output?
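If you just want a per-image snapshot count rather than the full listing, something along these lines should do as a rough sketch (assuming the pool is called 'volumes'; adjust to your pool names):

  # count snapshots per image; tail strips the header line of 'rbd snap ls'
  for img in $(rbd -p volumes ls); do
      printf '%s: ' "$img"
      rbd snap ls "volumes/$img" | tail -n +2 | wc -l
  done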

By the way, having millions of purged snapshots can have quite a heavy impact (https://lists.ceph.io/hyperkitty/list/[email protected]/thread/YRY2CGWSFHTEMXYPYL2CUGK6XOQDG3Z2/).

I'm still not sure what to think about the 100% used output, though.
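One way to cross-check is to compare the plain and machine-readable views of the pool stats; just a sketch, the exact JSON field names may differ slightly between releases:

  ceph df detail
  # per-pool numbers in JSON, e.g. percent_used and max_avail
  ceph df -f json | jq '.pools[] | {name: .name, percent_used: .stats.percent_used, max_avail: .stats.max_avail}'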


Quoting Eugen Block <[email protected]>:

I don’t have much time right now to look deeper, but I agree, the 100% used column is something to look into. That might either be the root cause of the snaptrims not happening, or at least play some role in it.

Quoting Lukasz Gomulka <[email protected]>:

Hello!
Thank you for your answer.
Right now we have the problem on a cluster with NVMe OSDs, so the DB is naturally also on NVMe. Last time we had the issue on HDD+NVMe (NVMe for DB/WAL). We are now on the mClock scheduler. Generally we are able to snaptrim, but it costs CPU resources; we can snaptrim thousands of snapshots per day (a rough way to watch the drain rate is sketched below). The main problem and question still remain: why did we get 2.5M snaps to remove without any apparent reason, what should we do, and how can we prevent it? If the 2.5M snaps appeared within seconds, there is a chance they could also be removed within a few seconds, e.g. if it is some DB corruption that is easy to clean up.
Current cluster:

version - 17.2.7
mclock scheduler
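For reference, a rough way to watch how fast the trimming actually progresses is to look at the PG states (just a sketch):

  # PGs currently trimming vs. still waiting to trim (output includes a header line)
  ceph pg ls snaptrim | wc -l
  ceph pg ls snaptrim_wait | wc -l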

I have also attached the outputs of:

ceph -s
ceph osd df tree
ceph osd pool ls detail
ceph df


I have to add that in the JSON output there is no entry under removed_snaps, but in the normal output we can see this under removed_snaps_queue:

pool 1 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8192 pgp_num 8192 autoscale_mode off last_change 667587 lfor 0/0/1211 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_ratio 0.5 application rbd
    removed_snaps_queue [58ae2~3,58af4~3,58cd5~5,58cdd~3
   ... million+ ...
c59ae~3,4c59b6~3,4c59bc~3]

Similarly malformed output is in the attached `ceph df`. The pool of course still has available space, but the command shows `0%`.
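To get an approximate feel for how much is still queued, a very crude sketch is to count the interval entries across all pools (each entry looks like 58ae2~3 and can cover more than one snap id):

  ceph osd pool ls detail | tr ',[]' '\n' | grep -c '~'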

Best Regards,
Lukasz Lucki Gomulka

________________________________
From: Eugen Block <[email protected]>
Sent: 03 November 2025 11:53:06
To: [email protected]
Subject: [ceph-users] Re: Snaptrim flood

Hi,

are you using the mclock scheduler (default in Quincy)? Until Reef 18.2.4
there was a default value set for osd_snap_trim_cost (1M bytes) which
blocked snaptrims [0]. This was fixed in [1] and backported to Reef
(a quick check of the effective value is sketched below). But it's
unlikely that this was your issue on Octopus, as mclock only became
the default in Quincy, IIRC. Since Quincy is also EOL, I'd recommend
updating further, if possible.
Were you able to avoid OSD flapping with the nodown flag (ceph osd set
nodown)? This can help to keep the cluster more stable in such
situations.
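A quick way to check the effective snap trim cost and to handle the
nodown flag, just as a sketch:

  # value from the config db/defaults vs. what a running OSD actually uses
  ceph config get osd osd_snap_trim_cost
  ceph config show osd.0 osd_snap_trim_cost    # pick any OSD id

  # keep OSDs from being marked down while they are busy, clear it afterwards
  ceph osd set nodown
  ceph osd unset nodown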
Can you add some more details about your setup like:

ceph -s
ceph osd df tree
ceph osd pool ls detail
ceph df

Are you using HDD OSDs or HDDs with dedicated DB/WAL? How many
snapshots are you generating?
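A quick way to answer that per OSD is the daemon metadata (rough
sketch; field names can vary a bit between releases):

  # rotational flags and whether a dedicated DB device is in use
  ceph osd metadata <osd-id> | grep -E 'rotational|dedicated|bluefs_db'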

Regards,
Eugen

[0] https://tracker.ceph.com/issues/67702
[1] https://tracker.ceph.com/issues/63604
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]

