Hi,

this is kind of a follow-up to a two-year-old thread [0], and I wanted to raise some awareness for the corresponding tracker [1].

Back then we managed to limit the impact on mon store performance with some paxos configs, but now newly created OSDs are impacted as well:

Each newly created OSD process grows to around 140 GB of RAM within a few minutes, easily triggering the OOM killer on hosts if multiple OSDs are created at once. The resident RAM usage drops back to the memory target once the OSD has successfully booted. The reason is the purged_snaps that are loaded during OSD boot (snap_mapper.record_purged_snaps purged_snaps); two years ago the customer had more than 42 million purged_snap entries. I don't know how many there are today since I don't have access myself, but I'll try to get a current number (a sketch of how one might count them is below).

Anyway, the only way to safely create OSDs is one by one, maybe two at a time depending on the host's RAM capacity (roughly what the second sketch below does), so an automated (unattended) OSD deployment is currently not possible. Unfortunately, the new tool [2] doesn't seem to work as expected; at least in my test cluster it didn't have any impact on the number of purged_snaps in the mon store. That's why we haven't tried it on the customer cluster(s) yet.
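For reference, here is a minimal sketch of how one might count the purged_snap entries in a mon store. It assumes (not confirmed anywhere in this thread) that the keys live under the "osd_snap" prefix with names starting with "purged_snap_", that ceph-kvstore-tool is available, and it must be run against a stopped mon or a copy of its store.db; the store path is just an example:

#!/usr/bin/env python3
"""Count purged_snap keys in a mon store (rough sketch).

Assumptions: purged snaps are stored under the 'osd_snap' prefix
with key names starting 'purged_snap_'. Run only against a
*stopped* mon or a copy of its store.db.
"""
import subprocess
import sys

# Example path; pass your own store.db as the first argument.
STORE = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/ceph/mon/ceph-a/store.db"

# 'list <prefix>' prints one "<prefix>\t<key>" pair per line.
out = subprocess.run(
    ["ceph-kvstore-tool", "rocksdb", STORE, "list", "osd_snap"],
    check=True, capture_output=True, text=True,
).stdout

count = sum(1 for line in out.splitlines() if "purged_snap_" in line)
print(f"purged_snap keys: {count}")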
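And in case it helps others hitting the same thing, this is roughly what "one by one" could look like in practice. It's a sketch assuming a cephadm cluster with the ceph CLI available; the host/device inventory and the timeout are placeholders, and the up-check just parses 'ceph osd tree' JSON:

#!/usr/bin/env python3
"""Create OSDs strictly one at a time (sketch, not a supported tool).

Each OSD must finish booting (status 'up') before the next one is
created, so at most one process at a time goes through the huge
purged_snaps load.
"""
import json
import subprocess
import time

# Placeholder inventory: adjust to your cluster.
DEVICES = [("host1", "/dev/sdb"), ("host1", "/dev/sdc"), ("host2", "/dev/sdb")]
BOOT_TIMEOUT = 1800  # seconds; booting can take a while with many purged_snaps


def up_osd_ids():
    """Return the set of OSD ids currently reported 'up' by the cluster."""
    out = subprocess.run(["ceph", "osd", "tree", "-f", "json"],
                         check=True, capture_output=True, text=True).stdout
    return {n["id"] for n in json.loads(out)["nodes"]
            if n.get("type") == "osd" and n.get("status") == "up"}


for host, dev in DEVICES:
    before = up_osd_ids()
    subprocess.run(["ceph", "orch", "daemon", "add", "osd", f"{host}:{dev}"],
                   check=True)
    deadline = time.time() + BOOT_TIMEOUT
    # Wait until this new OSD has booted before adding the next one.
    while time.time() < deadline:
        if up_osd_ids() - before:
            break
        time.sleep(10)
    else:
        raise RuntimeError(f"OSD on {host}:{dev} did not come up in time")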

How do other operators/admins/users deal with this kind of scenario? Having many snapshots can't be a corner case, but I can't remember reading anything like this on the list(s). I'd appreciate any comments, although I'm aware that everybody is probably busy travelling to Cephalocon. ;-)

Thanks!
Eugen

[0] https://lists.ceph.io/hyperkitty/list/[email protected]/thread/ZEMGKBLMEREBZB7SWOLDA6QZX3S7FLL3/#ZEMGKBLMEREBZB7SWOLDA6QZX3S7FLL3
[1] https://tracker.ceph.com/issues/64519
[2] https://github.com/ceph/ceph/pull/57548