Just in case, make sure the Ceph builds you use actually have tcmalloc
enabled in the first place. The only time I've seen OSDs exceed their
memory targets so far was on a Pacific cluster running the packages
provided by Debian 12; I eventually figured out that those had Crimson
enabled, which comes with its own allocator and completely disables
tcmalloc, so osd_memory_target is ignored entirely.
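A quick way to double-check (osd.0 is just an example ID, the binary path
may differ between distros, and on cephadm deployments you would run this
from 'cephadm shell' or inside an OSD container):

$ ldd /usr/bin/ceph-osd | grep -i tcmalloc
$ ceph tell osd.0 heap stats

With a tcmalloc-enabled build, ldd should list a libtcmalloc entry and the
heap stats call should return actual statistics rather than complain about
the allocator.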

> On 9. Apr 2025, at 15:54, Eugen Block <ebl...@nde.ag> wrote:
>
> Then I suggest doing the usual troubleshooting [0], not necessarily in
> this order:
>
> - OSD logs
> - ceph tell osd.X heap stats
> - ceph osd df tree (to look for an unbalanced PG distribution)
> - check tracker.ceph.com for existing issues
> - How are the nodes equipped RAM-wise?
> - Are the OOM kills happening across all OSDs, only a subset, or even
>   always the same ones?
> - Is the cluster healthy? 'ceph -s' output could be useful.
>
> Squid has the osd_memory_target_autotune feature enabled by default;
> can you check 'ceph config dump' and look for the osd memory entries?
>
> [0] https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-osd/
>
> Quoting Jonas Schwab <jonas.sch...@uni-wuerzburg.de>:
>
>> Yes, it's the ceph-osd processes filling up the RAM.
>>
>> On 2025-04-09 15:13, Eugen Block wrote:
>>> I noticed the quite high reported memory stats for OSDs as well on a
>>> recently upgraded customer cluster, now running 18.2.4. But checking
>>> the top output etc. doesn't confirm those values. I don't really know
>>> where they come from, to be honest.
>>> Can you confirm that those are actually OSD processes filling up the RAM?
>>>
>>> Quoting Jonas Schwab <jonas.sch...@uni-wuerzburg.de>:
>>>
>>>> Hello everyone,
>>>>
>>>> I recently have had many problems with OSDs using much more memory
>>>> than they are supposed to (> 10 GB), leading to the node running out
>>>> of memory and killing processes. Does someone have an idea why the
>>>> daemons seem to completely ignore the set memory limits?
>>>>
>>>> See e.g. the following:
>>>>
>>>> $ ceph orch ps ceph2-03
>>>> NAME                    HOST      PORTS   STATUS          REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
>>>> mon.ceph2-03            ceph2-03          running (3h)    1s ago     2y   501M     2048M    19.2.1   f2efb0401a30  d876fc30f741
>>>> node-exporter.ceph2-03  ceph2-03  *:9100  running (3h)    1s ago     17M  46.5M    -        1.7.0    72c9c2088986  d32ec4d266ea
>>>> osd.4                   ceph2-03          running (26m)   1s ago     2y   10.2G    3310M    19.2.1   f2efb0401a30  b712a86dacb2
>>>> osd.11                  ceph2-03          running (5m)    1s ago     2y   3458M    3310M    19.2.1   f2efb0401a30  f3d7705325b4
>>>> osd.13                  ceph2-03          running (3h)    1s ago     6d   2059M    3310M    19.2.1   f2efb0401a30  980ee7e11252
>>>> osd.17                  ceph2-03          running (114s)  1s ago     2y   3431M    3310M    19.2.1   f2efb0401a30  be7319fda00b
>>>> osd.23                  ceph2-03          running (30m)   1s ago     2y   10.4G    3310M    19.2.1   f2efb0401a30  9cfb86c4b34a
>>>> osd.29                  ceph2-03          running (8m)    1s ago     2y   4923M    3310M    19.2.1   f2efb0401a30  d764930bb557
>>>> osd.35                  ceph2-03          running (14m)   1s ago     2y   7029M    3310M    19.2.1   f2efb0401a30  6a4113adca65
>>>> osd.59                  ceph2-03          running (2m)    1s ago     2y   2821M    3310M    19.2.1   f2efb0401a30  8871d6d4f50a
>>>> osd.61                  ceph2-03          running (49s)   1s ago     2y   1090M    3310M    19.2.1   f2efb0401a30  3f7a0ed17ac2
>>>> osd.67                  ceph2-03          running (7m)    1s ago     2y   4541M    3310M    19.2.1   f2efb0401a30  eea0a6bcefec
>>>> osd.75                  ceph2-03          running (3h)    1s ago     2y   1239M    3310M    19.2.1   f2efb0401a30  5a801902340d
>>>>
>>>> Best regards,
>>>> Jonas
>>>>
>>>> --
>>>> Jonas Schwab
>>>>
>>>> Research Data Management, Cluster of Excellence ct.qmat
>>>> https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de
>>>> Email: jonas.sch...@uni-wuerzburg.de
>>>> Tel: +49 931 31-84460
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io