> Swap can reduce your cluster's performance, can't it? OSD processes that
> swap data will incur extra and unwanted disk I/O.
Absolutely. Moreover, I've seen modern(ish) Linux systems anomalously using swap space even when physmem is available in excess of vm.min_free_kbytes.

> I've got 10 OSDs per host and my memory consumption of ceph is typically
> 70GiB per host... Each host has about 40GiB available memory which is
> sufficient (for my setup) except one time I ran out of memory deleting old
> snapshots. But 8GiB wouldn't have helped...

Exactly. Swap can be a useful emergency tool, but in 2025 it should not be routine. In 1985 a diskless Sun 2/50 with 3MB of physmem (yes, MB) needed swap, and that was SUPER fun over 10Mb/s Ethernet against a Fuji Eagle. In years beginning with a 2, DRAM prices are such that if disabling swap causes a problem, that's a sign that you really, really need more physmem.

Swap is 12% of your virtual memory right now. If you run hotter than 84% usage, then you really need more.

By default the osd_memory_target autotuner should be enabled; see what values it is setting. By default it divides 70% of physmem by however many OSDs are placed on a host:

# ceph config dump | grep osd_memory_target
osd  host:x  basic  osd_memory_target  12060218196
osd  host:y  basic  osd_memory_target  12062644163
osd  host:z  basic  osd_memory_target   6614520783

Yes, the above cluster has lots of physmem, which is very fortunate, because it's based on large HDDs and otherwise would have fallen over (if I told you the details, you wouldn't sleep at night). The money would have been better spent on QLC, but I digress.

If autotuning isn't on, the default osd_memory_target is 4GB. Remember that it's a target, not a limit. The docs advise 20% headroom of available physmem; having suffered a few things, I like to advise at least 50%, plus margin to run mons/mgrs/MDSes/etc.

> OSD containers consume a reasonable amount of RAM (~2.6GB - ~3.6GB):

Actually, that's another sign that you may be starved, unless these OSDs are rather idle.
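To make those two rules of thumb concrete, here is a small sketch. The 0.7 ratio is cephadm's default autotune fraction as described above; the host sizes and OSD counts are hypothetical examples, not taken from this cluster:

```python
GiB = 1024**3
AUTOTUNE_RATIO = 0.7  # cephadm's default: 70% of physmem split across OSDs

def osd_memory_target(physmem_bytes: int, num_osds: int) -> int:
    """Per-OSD target the autotuner would assign: ratio * physmem / OSD count."""
    return int(physmem_bytes * AUTOTUNE_RATIO / num_osds)

def physmem_needed(per_osd_target: int, num_osds: int, headroom: float = 0.5) -> float:
    """Physmem to provision: sum of OSD targets plus a headroom fraction.
    The docs suggest ~20%; 50% is the more cautious figure advised above.
    Colocated mons/mgrs/MDSes need their own margin on top of this."""
    return per_osd_target * num_osds * (1 + headroom)

# A hypothetical 64 GiB host with 10 OSDs:
print(round(osd_memory_target(64 * GiB, 10) / GiB, 2))  # 4.48 (GiB per OSD)
# Ten OSDs at the 4 GB default target, with 50% headroom:
print(physmem_needed(4 * GiB, 10) / GiB)                # 60.0 (GiB)
```

Note that the per-OSD target shrinks as you add OSDs to a host, so dense HDD boxes with modest physmem end up well below the 4GB default.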
Are you using cephadm, or something else, to manage the containers? Is it enforcing an artificial limit on them? With 64GB of physmem, cephadm's autotuning would assign an osd_memory_target of about 4.5GB. Memory allocation practice and accounting vary across kernel revisions, which may be a factor here.

What model of chassis are these? Adding even 4x8GB super-cheap DIMMs to each would do you a world of good, and more would of course be even better. Be sure not to mix SKUs within a bank, and populate slots according to your motherboard's documentation.

>> -----Original Message-----
>> From: Dmitrijs Demidovs <dmitrijs.demid...@carminered.eu>
>> Sent: Friday, 23 May 2025 10:16
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] Re: SWAP usage 100% on OSD hosts after
>> migration to Rocky Linux 9 (Ceph 16.2.15)
>>
>> Hi Anthony.
>>
>> Yes, we have swap enabled. Old Rocky 8 and new Rocky 9 OSD hosts are both
>> configured with 8G of swap.
>>
>> I will try to disable swap, but I guess we will get a lot of Out Of Memory
>> messages on OSD hosts.
>>
>> = old:
>> [root@ceph-osd11 ~]# free -h
>>                total        used        free      shared  buff/cache   available
>> Mem:            62Gi        30Gi       1.2Gi       2.1Gi        30Gi        29Gi
>> Swap:          8.0Gi       2.8Gi       5.2Gi
>>
>> = new:
>> [root@ceph-osd17 ~]# free -h
>>                total        used        free      shared  buff/cache   available
>> Mem:            62Gi        26Gi       1.0Gi       1.0Gi        36Gi        36Gi
>> Swap:          8.0Gi       8.0Gi       7.0Mi
>>
>> OSD containers consume a reasonable amount of RAM (~2.6GB - ~3.6GB):
>>
>> [root@ceph-osd17 ~]# docker stats --no-stream
>> CONTAINER ID   NAME                                               CPU %   MEM USAGE / LIMIT     MEM %   NET I/O   BLOCK I/O        PIDS
>> 5cc58e4a77b2   ceph-7e8bff5c-2761-11ec-9bb0-000c29ebc936-osd-52
>>                0.28%   3.576GiB / 62.28GiB   5.74%   0B / 0B   3.9TB / 975GB    62
>> 3a60fecf648d   ceph-7e8bff5c-2761-11ec-9bb0-000c29ebc936-osd-50
>>                0.28%   2.912GiB / 62.28GiB   4.68%   0B / 0B   100TB / 45.7TB   62
>> 9c20407e79eb   ceph-7e8bff5c-2761-11ec-9bb0-000c29ebc936-osd-49
>>                0.28%   2.905GiB / 62.28GiB   4.66%   0B / 0B   93TB / 35.8TB    62
>> 9deadafef9dd   ceph-7e8bff5c-2761-11ec-9bb0-000c29ebc936-osd-48
>>                0.56%   3.624GiB / 62.28GiB   5.82%   0B / 0B   102TB / 39.2TB   62
>> fcfe62a25fd9   ceph-7e8bff5c-2761-11ec-9bb0-000c29ebc936-osd-55
>>                0.40%   2.968GiB / 62.28GiB   4.77%   0B / 0B   83.2TB / 34.8TB  62
>> 38d2d96cc491   ceph-7e8bff5c-2761-11ec-9bb0-000c29ebc936-osd-51
>>                1.42%   2.666GiB / 62.28GiB   4.28%   0B / 0B   105TB / 38.1TB   62
>> e29c6bbc1ae7   ceph-7e8bff5c-2761-11ec-9bb0-000c29ebc936-osd-54
>>                2.01%   3.687GiB / 62.28GiB   5.92%   0B / 0B   106TB / 44.6TB   62
>> 40346a7a45ea   ceph-7e8bff5c-2761-11ec-9bb0-000c29ebc936-osd-53
>>                0.69%   2.748GiB / 62.28GiB   4.41%   0B / 0B   103TB / 41.4TB   62
>> 43c3e3a65531   ceph-7e8bff5c-2761-11ec-9bb0-000c29ebc936-crash-ceph-osd17
>>                0.00%   3.73MiB / 62.28GiB    0.01%   0B / 0B   567MB / 18MB     2
>> d9e436f9788c   ceph-7e8bff5c-2761-11ec-9bb0-000c29ebc936-node-exporter-ceph-osd17
>>                15.04%  30.25MiB / 62.28GiB   0.05%   0B / 0B   410MB / 14.6MB   61
>>
>> But they are also the biggest swap consumers:
>>
>> [root@ceph-osd17 ~]# for file in /proc/*/status; do awk
>> '/VmSwap|Name/{printf $2 " " $3}END{ print ""}' $file; done | sort -k 2 -n -r | more
>> ceph-osd        1553520 kB
>> ceph-osd        1447728 kB
>> ceph-osd        1218768 kB
>> ceph-osd        1117536 kB
>> ceph-osd        1026548 kB
>> ceph-osd         641632 kB
>> ceph-osd         495080 kB
>> ceph-osd         424392 kB
>> firewalld         26880 kB
>> dockerd           20352 kB
>> containerd        11136 kB
>> docker             6144 kB
>> docker             6144 kB
>> docker             5952 kB
>> docker             5952 kB
>> docker             5952 kB
>> docker             5952 kB
>> docker             5952 kB
>> docker             5760 kB
>> (sd-pam)           5184 kB
>> ceph-crash         4416 kB
>> python3            4224 kB
>> docker             4032 kB
>> systemd-udevd      3264 kB
>>
>> On 22.05.2025 18:34, Anthony D'Atri wrote:
>>>
>>>> Problem:
>>>>
>>>> After migration to Rocky 9 (and a new version of Docker) we see that our
>>>> OSD hosts consume 100% of SWAP space! It takes approximately one week
>>>> to fill SWAP from 0% to 100%.
>>>
>>> Why do you have swap configured at all? I suggest disabling swap in fstab
>>> and rebooting serially.
>>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
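As a footnote: the shell one-liner quoted above can also be written as a short Python sketch that parses /proc/PID/status the same way. The Name and VmSwap field names come from the kernel's status-file format; the sample text at the bottom is synthetic, not output from this cluster:

```python
import glob
import re

def parse_status(text: str):
    """Extract (Name, VmSwap in kB) from the text of a /proc/PID/status file.
    Kernel threads and unswapped processes have no VmSwap line; report 0."""
    name_m = re.search(r"^Name:\s+(\S+)", text, re.M)
    swap_m = re.search(r"^VmSwap:\s+(\d+)\s+kB", text, re.M)
    return (name_m.group(1) if name_m else "?",
            int(swap_m.group(1)) if swap_m else 0)

def top_swap_users():
    """Walk /proc and return processes sorted by descending swap usage."""
    rows = []
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as f:
                rows.append(parse_status(f.read()))
        except OSError:
            continue  # process exited while we were iterating
    return sorted(rows, key=lambda r: r[1], reverse=True)

sample = "Name:\tceph-osd\nVmSwap:\t1553520 kB\n"
print(parse_status(sample))  # ('ceph-osd', 1553520)
```

This mirrors the awk/sort pipeline: filter the Name and VmSwap lines, pair them per process, and sort numerically in reverse.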