Then I suggest doing the usual troubleshooting [0], not necessarily in
this order (example invocations below):
- osd logs
- ceph tell osd.X heap stats
- ceph osd df tree (to look for unbalanced PG distribution)
- check tracker.ceph.com for existing issues
- How much RAM do the nodes have?
- Are the OOM kills happening across all OSDs, only a subset, or even
always the same ones?
- Is the cluster healthy? 'ceph -s' output could be useful.
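For the command-based checks, something along these lines (osd.4 is just
an example id, adjust as needed):

$ ceph tell osd.4 heap stats
$ ceph osd df tree
$ ceph -s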
Squid has the osd_memory_target_autotune feature enabled by default.
Can you check 'ceph config dump' and look for osd memory entries?
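For example, something like this should show the relevant entries (a
sketch; the exact set of entries depends on how the cluster was deployed):

$ ceph config dump | grep osd_memory
$ ceph config get osd osd_memory_target
$ ceph config get osd osd_memory_target_autotune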
[0] https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-osd/
Quoting Jonas Schwab <jonas.sch...@uni-wuerzburg.de>:
Yes, it's the ceph-osd processes filling up the RAM.
On 2025-04-09 15:13, Eugen Block wrote:
I noticed quite high reported memory stats for OSDs as well on a
recently upgraded customer cluster, now running 18.2.4. But checking
the top output etc. doesn't confirm those values; I honestly don't know
where they come from.
Can you confirm that those are actually OSD processes filling up the RAM?
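One way to cross-check directly on the affected host would be something
like this (a sketch, assuming a podman-based cephadm deployment; use the
docker equivalents otherwise):

$ ps -eo rss,args --sort=-rss | grep '[c]eph-osd' | head
$ podman stats --no-stream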
Quoting Jonas Schwab <jonas.sch...@uni-wuerzburg.de>:
Hello everyone,
I have recently been having a lot of problems with OSDs using much more
memory than they are supposed to (> 10 GB), leading to the nodes running
out of memory and killing processes. Does anyone have an idea why the
daemons seem to completely ignore the configured memory limits?
See e.g. the following:
$ ceph orch ps ceph2-03
NAME                    HOST      PORTS   STATUS          REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
mon.ceph2-03            ceph2-03          running (3h)    1s ago     2y   501M     2048M    19.2.1   f2efb0401a30  d876fc30f741
node-exporter.ceph2-03  ceph2-03  *:9100  running (3h)    1s ago     17M  46.5M    -        1.7.0    72c9c2088986  d32ec4d266ea
osd.4                   ceph2-03          running (26m)   1s ago     2y   10.2G    3310M    19.2.1   f2efb0401a30  b712a86dacb2
osd.11                  ceph2-03          running (5m)    1s ago     2y   3458M    3310M    19.2.1   f2efb0401a30  f3d7705325b4
osd.13                  ceph2-03          running (3h)    1s ago     6d   2059M    3310M    19.2.1   f2efb0401a30  980ee7e11252
osd.17                  ceph2-03          running (114s)  1s ago     2y   3431M    3310M    19.2.1   f2efb0401a30  be7319fda00b
osd.23                  ceph2-03          running (30m)   1s ago     2y   10.4G    3310M    19.2.1   f2efb0401a30  9cfb86c4b34a
osd.29                  ceph2-03          running (8m)    1s ago     2y   4923M    3310M    19.2.1   f2efb0401a30  d764930bb557
osd.35                  ceph2-03          running (14m)   1s ago     2y   7029M    3310M    19.2.1   f2efb0401a30  6a4113adca65
osd.59                  ceph2-03          running (2m)    1s ago     2y   2821M    3310M    19.2.1   f2efb0401a30  8871d6d4f50a
osd.61                  ceph2-03          running (49s)   1s ago     2y   1090M    3310M    19.2.1   f2efb0401a30  3f7a0ed17ac2
osd.67                  ceph2-03          running (7m)    1s ago     2y   4541M    3310M    19.2.1   f2efb0401a30  eea0a6bcefec
osd.75                  ceph2-03          running (3h)    1s ago     2y   1239M    3310M    19.2.1   f2efb0401a30  5a801902340d
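The memory target each of these OSDs is actually running with can be
queried like this (osd.4 just as an example; presumably this is what the
MEM LIM column above reflects):

$ ceph config show osd.4 osd_memory_target
$ ceph config get osd.4 osd_memory_target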
Best regards,
Jonas
--
Jonas Schwab
Research Data Management, Cluster of Excellence ct.qmat
https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de
Email: jonas.sch...@uni-wuerzburg.de
Tel: +49 931 31-84460
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io