Hi Jonas,

Anthony gave some good advice on things to check.  You can also dump the mempool statistics for any OSDs you identify as being over their memory target using: "ceph daemon osd.NNN dump_mempools"
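For example, something along these lines should surface the biggest mempool consumers (a rough sketch only; osd.4 is just picked from your listing below, and the jq filter assumes the usual dump_mempools JSON layout plus jq being installed on the host):

$ ceph daemon osd.4 dump_mempools | jq '.mempool.by_pool | to_entries | sort_by(-.value.bytes) | .[0:5]'

Since you're on cephadm, the admin socket lives inside the container, so you may need to run that from "cephadm shell" (or "cephadm enter --name osd.4") on the node hosting the OSD.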

The osd_memory_target code basically looks at the memory usage of the process and then periodically grows or shrinks the aggregate memory for caches based on how far off the process usage is from the target.  It's not perfect, but it generally keeps memory close to the target size.  It can't do anything if there is a memory leak or some other component driving overall memory usage higher than the target, though.  One example: in erasure-coded pools, huge xattrs on objects can drive pglog memory usage extremely high, and the osd memory autotuning may not be able to compensate for it.
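If you want to see how the tuner views a given OSD, a quick sketch (section and counter names from memory, so double-check them on your build; jq is assumed to be available):

$ ceph daemon osd.4 perf dump | jq '.prioritycache'

The prioritycache counters should show the current target and the mapped/unmapped heap bytes, which is roughly what gets compared against osd_memory_target when deciding whether to grow or shrink the caches.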

Having said this, I'd suggest looking at the actual targets and the mempools to see if you can figure out where the memory is going and whether it's truly over the target.  The targets themselves can be autotuned higher up in the stack in some cases.
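If cephadm is managing the targets for you, roughly the following should show whether autotuning is enabled, what ratio it uses, and any per-host targets it has set (option names as I remember them, worth verifying against the 19.2 docs):

$ ceph config get osd osd_memory_target_autotune
$ ceph config get mgr mgr/cephadm/autotune_memory_target_ratio
$ ceph config dump | grep osd_memory_target

And to see the value a specific daemon is actually running with:

$ ceph config show osd.4 osd_memory_target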


Mark

On 4/9/25 07:52, Jonas Schwab wrote:
Hello everyone,

I have recently been having a lot of problems with OSDs using much more
memory than they are supposed to (> 10 GB), leading to the node running
out of memory and killing processes. Does anyone have an idea why the
daemons seem to completely ignore the configured memory limits?

See e.g. the following:

$ ceph orch ps ceph2-03
NAME                    HOST      PORTS   STATUS          REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
mon.ceph2-03            ceph2-03          running (3h)    1s ago     2y   501M     2048M    19.2.1   f2efb0401a30  d876fc30f741
node-exporter.ceph2-03  ceph2-03  *:9100  running (3h)    1s ago     17M  46.5M    -        1.7.0    72c9c2088986  d32ec4d266ea
osd.4                   ceph2-03          running (26m)   1s ago     2y   10.2G    3310M    19.2.1   f2efb0401a30  b712a86dacb2
osd.11                  ceph2-03          running (5m)    1s ago     2y   3458M    3310M    19.2.1   f2efb0401a30  f3d7705325b4
osd.13                  ceph2-03          running (3h)    1s ago     6d   2059M    3310M    19.2.1   f2efb0401a30  980ee7e11252
osd.17                  ceph2-03          running (114s)  1s ago     2y   3431M    3310M    19.2.1   f2efb0401a30  be7319fda00b
osd.23                  ceph2-03          running (30m)   1s ago     2y   10.4G    3310M    19.2.1   f2efb0401a30  9cfb86c4b34a
osd.29                  ceph2-03          running (8m)    1s ago     2y   4923M    3310M    19.2.1   f2efb0401a30  d764930bb557
osd.35                  ceph2-03          running (14m)   1s ago     2y   7029M    3310M    19.2.1   f2efb0401a30  6a4113adca65
osd.59                  ceph2-03          running (2m)    1s ago     2y   2821M    3310M    19.2.1   f2efb0401a30  8871d6d4f50a
osd.61                  ceph2-03          running (49s)   1s ago     2y   1090M    3310M    19.2.1   f2efb0401a30  3f7a0ed17ac2
osd.67                  ceph2-03          running (7m)    1s ago     2y   4541M    3310M    19.2.1   f2efb0401a30  eea0a6bcefec
osd.75                  ceph2-03          running (3h)    1s ago     2y   1239M    3310M    19.2.1   f2efb0401a30  5a801902340d

Best regards,
Jonas

--
Jonas Schwab

Research Data Management, Cluster of Excellence ct.qmat
https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de
Email: jonas.sch...@uni-wuerzburg.de
Tel: +49 931 31-84460

--
Best Regards,
Mark Nelson
Head of Research and Development

Clyso GmbH
p: +49 89 21552391 12 | a: Minnesota, USA
w: https://clyso.com | e: mark.nel...@clyso.com

We are hiring: https://www.clyso.com/jobs/
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
