Hi Jonas,
Anthony gave some good advice on things to check. You can also dump
the mempool statistics for any OSDs you identify as being over their
memory target using: "ceph daemon osd.NNN dump_mempools"
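For example, to see which mempools are the biggest consumers on osd.4
(just an example OSD; this assumes jq is available, and the exact JSON
layout can vary a bit between releases). With containerized deployments,
"ceph tell" is often easier than "ceph daemon" since it works from
outside the container:

$ ceph tell osd.4 dump_mempools | \
    jq -r '.mempool.by_pool | to_entries | sort_by(-.value.bytes)[]
           | "\(.key)\t\(.value.bytes)"'

Given the pglog issue mentioned below, osd_pglog is worth a look first.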
The osd_memory_target code basically looks at the memory usage of the
process and then periodically grows or shrinks the aggregate memory for
caches based on how far off the process usage is from the target. It's
not perfect, but it generally keeps memory close to the target size. It
can't do anything, though, if there is a memory leak or some other
component driving the overall memory usage higher than the target. One
example: in erasure-coded pools, huge xattrs on objects can drive pglog
memory usage extremely high, and the osd memory autotuning may not be
able to compensate for it.
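You can watch the tuner at work via the priority cache perf counters,
e.g. (osd.4 as an example again; counter names may differ slightly
between releases, and "ceph daemon" has to run where the admin socket
lives, i.e. inside the container for cephadm deployments):

$ ceph daemon osd.4 perf dump | jq '.prioritycache'

target_bytes there is what the tuner is aiming for, and mapped_bytes is
what the process has actually mapped. If mapped_bytes stays far above
target_bytes even after the caches have shrunk to their minimums, the
excess is coming from something the tuner can't control.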
Having said this, I'd suggest looking at the actual targets and the
mempools to see whether you can figure out where the memory is going
and whether it's truly over the target. The targets themselves can be
autotuned higher up in the stack in some cases.
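Since this looks like a cephadm deployment (going by the ceph orch ps
output below), a few places to check the effective targets, using osd.4
as an example:

$ ceph config get osd osd_memory_target_autotune
$ ceph config get mgr mgr/cephadm/autotune_memory_target_ratio
$ ceph config show osd.4 osd_memory_target

The MEM LIM column in your output (3310M) should roughly line up with
the effective osd_memory_target cephadm computed for that host.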
Mark
On 4/9/25 07:52, Jonas Schwab wrote:
Hello everyone,
I have recently been having many problems with OSDs using much more
memory than they are supposed to (> 10 GB), leading to the node running
out of memory and killing processes. Does anyone have an idea why the
daemons seem to completely ignore the configured memory limits?
See e.g. the following:
$ ceph orch ps ceph2-03
NAME                    HOST      PORTS   STATUS          REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
mon.ceph2-03            ceph2-03          running (3h)    1s ago     2y   501M     2048M    19.2.1   f2efb0401a30  d876fc30f741
node-exporter.ceph2-03  ceph2-03  *:9100  running (3h)    1s ago     17M  46.5M    -        1.7.0    72c9c2088986  d32ec4d266ea
osd.4                   ceph2-03          running (26m)   1s ago     2y   10.2G    3310M    19.2.1   f2efb0401a30  b712a86dacb2
osd.11                  ceph2-03          running (5m)    1s ago     2y   3458M    3310M    19.2.1   f2efb0401a30  f3d7705325b4
osd.13                  ceph2-03          running (3h)    1s ago     6d   2059M    3310M    19.2.1   f2efb0401a30  980ee7e11252
osd.17                  ceph2-03          running (114s)  1s ago     2y   3431M    3310M    19.2.1   f2efb0401a30  be7319fda00b
osd.23                  ceph2-03          running (30m)   1s ago     2y   10.4G    3310M    19.2.1   f2efb0401a30  9cfb86c4b34a
osd.29                  ceph2-03          running (8m)    1s ago     2y   4923M    3310M    19.2.1   f2efb0401a30  d764930bb557
osd.35                  ceph2-03          running (14m)   1s ago     2y   7029M    3310M    19.2.1   f2efb0401a30  6a4113adca65
osd.59                  ceph2-03          running (2m)    1s ago     2y   2821M    3310M    19.2.1   f2efb0401a30  8871d6d4f50a
osd.61                  ceph2-03          running (49s)   1s ago     2y   1090M    3310M    19.2.1   f2efb0401a30  3f7a0ed17ac2
osd.67                  ceph2-03          running (7m)    1s ago     2y   4541M    3310M    19.2.1   f2efb0401a30  eea0a6bcefec
osd.75                  ceph2-03          running (3h)    1s ago     2y   1239M    3310M    19.2.1   f2efb0401a30  5a801902340d
Best regards,
Jonas
--
Jonas Schwab
Research Data Management, Cluster of Excellence ct.qmat
https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de
Email: jonas.sch...@uni-wuerzburg.de
Tel: +49 931 31-84460
--
Best Regards,
Mark Nelson
Head of Research and Development
Clyso GmbH
p: +49 89 21552391 12 | a: Minnesota, USA
w: https://clyso.com | e: mark.nel...@clyso.com
We are hiring: https://www.clyso.com/jobs/
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io