Just in case, make sure the Ceph builds you use do have tcmalloc enabled in the 
first place.

The only time I’ve seen OSDs exceed their memory targets so far was on a 
Pacific cluster that used the packages shipped with Debian 12, and I eventually 
figured out that those had Crimson enabled - which comes with its own allocator 
and disables tcmalloc entirely, so osd_memory_target is effectively ignored.
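
A quick way to check whether a build actually links tcmalloc (the path below is 
just the usual default; for cephadm/container deployments run it inside the OSD 
container):

$ ldd /usr/bin/ceph-osd | grep -i tcmalloc

If nothing shows up, the OSD is using a different allocator and 
osd_memory_target won't be honoured reliably.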

> On 9. Apr 2025, at 15:54, Eugen Block <ebl...@nde.ag> wrote:
> 
> Then I suggest doing the usual troubleshooting [0], not necessarily in this order:
> 
> - osd logs
> - ceph tell osd.X heap stats (see the example below)
> - ceph osd df tree (to look for unbalanced PG distribution)
> - check tracker.ceph.com for existing issues
> - How are the nodes equipped RAM-wise?
> - Are the OOM kills happening across all OSDs, only a subset, or even
> always the same ones?
> - Is the cluster healthy? 'ceph -s' output could be useful.
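> 
> For the heap stats item, something like this should work (osd.4 is just an
> example ID):
> 
> $ ceph tell osd.4 heap stats
> 
> That prints the tcmalloc heap summary; 'ceph tell osd.4 heap release'
> additionally asks tcmalloc to hand freed pages back to the OS, which can
> explain a gap between reported and actual usage.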
> 
> Squid has the osd_memory_target_autotune feature enabled by default; can you
> check 'ceph config dump' and look for osd memory entries?
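> 
> For example (osd.4 again just as a placeholder):
> 
> $ ceph config dump | grep osd_memory
> $ ceph config get osd.4 osd_memory_target
> $ ceph config show osd.4 osd_memory_target
> 
> The last command shows the value the running daemon actually uses. With the
> autotuner active, cephadm should set osd_memory_target per OSD/host based on
> the available RAM.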
> 
> [0] https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-osd/
> 
> Zitat von Jonas Schwab <jonas.sch...@uni-wuerzburg.de>:
> 
>> Yes, it's the ceph-osd processes filling up the RAM.
>> 
>> On 2025-04-09 15:13, Eugen Block wrote:
>>> I noticed the quite high memory stats reported for OSDs as well on a
>>> recently upgraded customer cluster, now running 18.2.4. But checking
>>> the top output etc. doesn't confirm those values. I don't really know
>>> where they come from, tbh.
>>> Can you confirm that it is actually the OSD processes filling up the RAM?
>>> 
>>> Zitat von Jonas Schwab <jonas.sch...@uni-wuerzburg.de>:
>>> 
>>>> Hello everyone,
>>>> 
>>>> I have recently been having problems with OSDs using much more memory than
>>>> they are supposed to (> 10 GB), leading to the nodes running out of memory
>>>> and killing processes. Does anyone have an idea why the daemons seem
>>>> to completely ignore the configured memory limits?
>>>> 
>>>> See e.g. the following:
>>>> 
>>>> $ ceph orch ps ceph2-03
>>>> NAME                    HOST      PORTS   STATUS          REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
>>>> mon.ceph2-03            ceph2-03          running (3h)    1s ago     2y   501M     2048M    19.2.1   f2efb0401a30  d876fc30f741
>>>> node-exporter.ceph2-03  ceph2-03  *:9100  running (3h)    1s ago     17M  46.5M    -        1.7.0    72c9c2088986  d32ec4d266ea
>>>> osd.4                   ceph2-03          running (26m)   1s ago     2y   10.2G    3310M    19.2.1   f2efb0401a30  b712a86dacb2
>>>> osd.11                  ceph2-03          running (5m)    1s ago     2y   3458M    3310M    19.2.1   f2efb0401a30  f3d7705325b4
>>>> osd.13                  ceph2-03          running (3h)    1s ago     6d   2059M    3310M    19.2.1   f2efb0401a30  980ee7e11252
>>>> osd.17                  ceph2-03          running (114s)  1s ago     2y   3431M    3310M    19.2.1   f2efb0401a30  be7319fda00b
>>>> osd.23                  ceph2-03          running (30m)   1s ago     2y   10.4G    3310M    19.2.1   f2efb0401a30  9cfb86c4b34a
>>>> osd.29                  ceph2-03          running (8m)    1s ago     2y   4923M    3310M    19.2.1   f2efb0401a30  d764930bb557
>>>> osd.35                  ceph2-03          running (14m)   1s ago     2y   7029M    3310M    19.2.1   f2efb0401a30  6a4113adca65
>>>> osd.59                  ceph2-03          running (2m)    1s ago     2y   2821M    3310M    19.2.1   f2efb0401a30  8871d6d4f50a
>>>> osd.61                  ceph2-03          running (49s)   1s ago     2y   1090M    3310M    19.2.1   f2efb0401a30  3f7a0ed17ac2
>>>> osd.67                  ceph2-03          running (7m)    1s ago     2y   4541M    3310M    19.2.1   f2efb0401a30  eea0a6bcefec
>>>> osd.75                   ceph2-03          running (3h)    1s ago     2y   1239M    3310M    19.2.1   f2efb0401a30  5a801902340d
>>>> 
>>>> Best regards,
>>>> Jonas
>>>> 
>>>> --
>>>> Jonas Schwab
>>>> 
>>>> Research Data Management, Cluster of Excellence ct.qmat
>>>> https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de
>>>> Email: jonas.sch...@uni-wuerzburg.de
>>>> Tel: +49 931 31-84460
>>> 
>>> 
>> 
>> --
>> Jonas Schwab
>> 
>> Research Data Management, Cluster of Excellence ct.qmat
>> https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de
>> Email: jonas.sch...@uni-wuerzburg.de
>> Tel: +49 931 31-84460
> 
> 
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
