> But checking the top output etc. doesn't confirm those values.
I suspect a startup peak that subsides once the daemons reach steady
state. I observed this with mons back in Luminous: a cluster had been
expanded considerably without restarting the mons, so when the time came
to restart them there wasn't enough memory. The Prometheus process
collector might capture such a peak, as would frequent sampling of the
OSDs' admin sockets for tcmalloc heap stats (a sampling sketch is at the
end of this message). The kernel logs should also contain a profile of
running processes captured at the time of each OOM kill (likewise
sketched below).

> I don't really know where they come from, tbh.
> Can you confirm that those are actually OSD processes filling up the RAM?
>
> Quoting Jonas Schwab <jonas.sch...@uni-wuerzburg.de>:

Please set your MUA to not wrap.

>> Hello everyone,
>>
>> I recently have many problems with OSDs using much more memory than
>> they are supposed to (> 10GB), leading to the node running out of
>> memory and killing processes. Does someone have ideas why the daemons
>> seem to completely ignore the set memory limits?

Remember that osd_memory_target is a TARGET, not a LIMIT. The upstream
docs suggest an aggregate 20% headroom; personally I like 100% headroom,
but that preference was informed by prior experiences that are likely
much less of a concern these days.

>> See e.g. the following:
>>
>> $ ceph orch ps ceph2-03
>> NAME                    HOST      PORTS   STATUS          REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
>> mon.ceph2-03            ceph2-03          running (3h)    1s ago     2y   501M     2048M    19.2.1   f2efb0401a30  d876fc30f741
>> node-exporter.ceph2-03  ceph2-03  *:9100  running (3h)    1s ago     17M  46.5M    -        1.7.0    72c9c2088986  d32ec4d266ea
>> osd.4                   ceph2-03          running (26m)   1s ago     2y   10.2G    3310M    19.2.1   f2efb0401a30  b712a86dacb2
>> osd.11                  ceph2-03          running (5m)    1s ago     2y   3458M    3310M    19.2.1   f2efb0401a30  f3d7705325b4
>> osd.13                  ceph2-03          running (3h)    1s ago     6d   2059M    3310M    19.2.1   f2efb0401a30  980ee7e11252
>> osd.17                  ceph2-03          running (114s)  1s ago     2y   3431M    3310M    19.2.1   f2efb0401a30  be7319fda00b
>> osd.23                  ceph2-03          running (30m)   1s ago     2y   10.4G    3310M    19.2.1   f2efb0401a30  9cfb86c4b34a
>> osd.29                  ceph2-03          running (8m)    1s ago     2y   4923M    3310M    19.2.1   f2efb0401a30  d764930bb557
>> osd.35                  ceph2-03          running (14m)   1s ago     2y   7029M    3310M    19.2.1   f2efb0401a30  6a4113adca65
>> osd.59                  ceph2-03          running (2m)    1s ago     2y   2821M    3310M    19.2.1   f2efb0401a30  8871d6d4f50a
>> osd.61                  ceph2-03          running (49s)   1s ago     2y   1090M    3310M    19.2.1   f2efb0401a30  3f7a0ed17ac2
>> osd.67                  ceph2-03          running (7m)    1s ago     2y   4541M    3310M    19.2.1   f2efb0401a30  eea0a6bcefec
>> osd.75                  ceph2-03          running (3h)    1s ago     2y   1239M    3310M    19.2.1   f2efb0401a30  5a801902340d

I had not noticed the MEM LIM column before. Digging through the source
I don't immediately see where it comes from, but I suspect that, in a
non-Rook environment at least, it reflects osd_memory_target.

Your nodes each have 11 OSDs, and at least some also run a mon or other
daemons? How much RAM do the nodes have? Please send the output of:

`ceph osd dump | grep pool`
`ceph status`
`ceph config dump | grep osd_memory_target`
`ceph config dump | grep osd_memory_target_autotune`

Do the nodes run anything that isn't Ceph? Do you have cron jobs,
playbook runs, or anything else that might cause an ephemeral yet
memory-hungry process to run at times?

If MEM LIM is indeed osd_memory_target, your nodes would seem to be
light on RAM. The default osd_memory_target is 4GB. I'm going to SWAG
that this node has 52GB of physmem? If so, that is IMHO way too low: I
would suggest at least 128GB for 11 OSDs (back-of-the-envelope math
below). Are any of these OSDs legacy Filestore?
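Where the 52GB guess comes from: a rough sketch, assuming cephadm's
memory autotuner produced the 3310M MEM LIM with its default ratio of
0.7 (that ratio being my assumption here):

    # With osd_memory_target_autotune enabled, cephadm sets roughly:
    #   osd_memory_target ~= ratio * physmem / num_daemons  (ratio defaults to 0.7)
    # Inverting that with the observed per-OSD MEM LIM of 3310 MiB and 11 OSDs:
    echo $(( 3310 * 11 * 10 / 7 ))   # => 52014 MiB, i.e. ~52 GB of physmem

Either way, 11 OSDs times the 4GB default target is already 44GB before
any headroom, let alone the mon, node-exporter, and the OS itself, which
is why I'd want something like 128GB in such a node.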
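For the admin-socket sampling mentioned at the top, a minimal sketch; it
assumes `ceph tell` can reach the OSDs and uses the OSD IDs from the
listing above:

    # Sample tcmalloc heap stats (and mempools) for every OSD on ceph2-03
    # once a minute; watch for growth between startup and steady state.
    while true; do
        date
        for id in 4 11 13 17 23 29 35 59 61 67 75; do
            ceph tell osd.$id heap stats
            ceph tell osd.$id dump_mempools
        done
        sleep 60
    done

If `dump_mempools` isn't reachable via tell on your release, running
`ceph daemon osd.<id> dump_mempools` against the admin socket (inside
the container under cephadm) gives the same data.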
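And for confirming what was actually eating RAM at kill time: the kernel
logs a table of candidate processes with their memory usage alongside
each OOM kill, so something like:

    # Each OOM event logs "<proc> invoked oom-killer", then a per-process
    # memory table, then "Out of memory: Killed process ...".
    journalctl -k | grep -A40 -iE 'invoked oom-killer'
    # or, without journald:
    dmesg -T | grep -iE 'oom|killed process'

That should settle whether it's really the OSD processes or some
passer-by.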
>> Best regards,
>> Jonas
>>
>> --
>> Jonas Schwab
>>
>> Research Data Management, Cluster of Excellence ct.qmat
>> https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de
>> Email: jonas.sch...@uni-wuerzburg.de
>> Tel: +49 931 31-84460

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io