> Swap can reduce your cluster's performance, can't it? OSD processes that
> swap data will incur extra and unwanted disk I/O.
Absolutely. Moreover, I've seen modern(ish) Linux systems anomalously using swap space even when physmem is available in excess of vm.min_free_kbytes.

> I've got 10 OSDs per host and my memory consumption of ceph is typically
> 70GiB per host... Each host has about 40GiB available memory which is
> sufficient (for my setup) except one time I ran out of memory deleting old
> snapshots. But 8GiB wouldn't have helped...

Exactly. Swap can be a useful emergency tool, but in 2025 it should not be routine. In 1985 a diskless Sun 2/50 with 3MB of physmem (yes, MB) needed swap, and that was SUPER fun over 10Mb/s Ethernet against a Fuji Eagle. In years beginning with a 2, DRAM prices are such that if disabling swap causes a problem, that's a sign that you really, really need more physmem.

Swap is 12% of your virtual memory right now. If you run hotter than 84% usage, then you really need more.

By default the osd_memory_target autotuner should be enabled; see what values it is setting. By default it divides 70% of physmem by however many OSDs are placed on a host:

# ceph config dump | grep osd_memory_target
osd  host:x  basic  osd_memory_target  12060218196
osd  host:y  basic  osd_memory_target  12062644163
osd  host:z  basic  osd_memory_target   6614520783

Yes, the above cluster has lots of physmem, which is very fortunate, because it's based on large HDDs and otherwise would have fallen over (if I told you the details, you wouldn't sleep at night). The money would have been better spent on QLC, but I digress.

If autotuning isn't on, the default osd_memory_target is 4GB. Remember that it's a target, not a limit. The docs advise 20% headroom of available physmem; having suffered a few things, I like to advise at least 50%, plus margin to run mons/mgrs/MDSes/etc.

> OSD containers consume a reasonable amount of RAM (~2.6GB - ~3.6GB):

Actually, that's another sign that you may be starved, unless these OSDs are rather idle.
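To make those two rules of thumb concrete, here is a small sketch. The 0.7 ratio is cephadm's default autotune fraction as described above; the host sizes and OSD counts are hypothetical examples, not taken from this cluster:

```python
GiB = 1024**3
AUTOTUNE_RATIO = 0.7  # cephadm's default: 70% of physmem split across OSDs

def osd_memory_target(physmem_bytes: int, num_osds: int) -> int:
    """Per-OSD target the autotuner would assign: ratio * physmem / OSD count."""
    return int(physmem_bytes * AUTOTUNE_RATIO / num_osds)

def physmem_needed(per_osd_target: int, num_osds: int, headroom: float = 0.5) -> float:
    """Physmem to provision: sum of OSD targets plus a headroom fraction.
    The docs suggest ~20%; 50% is the more cautious figure advised above.
    Colocated mons/mgrs/MDSes need their own margin on top of this."""
    return per_osd_target * num_osds * (1 + headroom)

# A hypothetical 64 GiB host with 10 OSDs:
print(round(osd_memory_target(64 * GiB, 10) / GiB, 2))  # 4.48 (GiB per OSD)
# Ten OSDs at the 4 GB default target, with 50% headroom:
print(physmem_needed(4 * GiB, 10) / GiB)                # 60.0 (GiB)
```

Note that the per-OSD target shrinks as you add OSDs to a host, so dense HDD boxes with modest physmem end up well below the 4GB default.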
Are you using cephadm, or something else, to manage the containers? Is it enforcing an artificial limit on them? With 64GB of physmem, cephadm's autotuning would assign an osd_memory_target of about 4.5GB. Memory allocation practice and accounting vary across kernel revisions, which may be a factor here.

What model of chassis are these? Adding even 4x8GB super-cheap DIMMs to each would do you a world of good, and more would of course be even better. Be sure not to mix SKUs within a bank, and populate slots according to your motherboard's documentation.

>> -----Original Message-----
>> From: Dmitrijs Demidovs <dmitrijs.demid...@carminered.eu>
>> Sent: Friday, 23 May 2025 10:16
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] Re: SWAP usage 100% on OSD hosts after
>> migration to Rocky Linux 9 (Ceph 16.2.15)
>>
>> Hi Anthony.
>>
>> Yes, we have swap enabled. Old Rocky 8 and new Rocky 9 OSD hosts are both
>> configured with 8G of swap.
>>
>> I will try to disable swap, but I guess we will get a lot of Out Of Memory
>> messages on OSD hosts.
>>
>> = old:
>> [root@ceph-osd11 ~]# free -h
>>                total        used        free      shared  buff/cache   available
>> Mem:            62Gi        30Gi       1.2Gi       2.1Gi        30Gi        29Gi
>> Swap:          8.0Gi       2.8Gi       5.2Gi
>>
>> = new:
>> [root@ceph-osd17 ~]# free -h
>>                total        used        free      shared  buff/cache   available
>> Mem:            62Gi        26Gi       1.0Gi       1.0Gi        36Gi        36Gi
>> Swap:          8.0Gi       8.0Gi       7.0Mi
>>
>> OSD containers consume a reasonable amount of RAM (~2.6GB - ~3.6GB):
>>
>> [root@ceph-osd17 ~]# docker stats --no-stream
>> CONTAINER ID   NAME                                               CPU %   MEM USAGE / LIMIT     MEM %   NET I/O   BLOCK I/O        PIDS
>> 5cc58e4a77b2   ceph-7e8bff5c-2761-11ec-9bb0-000c29ebc936-osd-52
>>                0.28%   3.576GiB / 62.28GiB   5.74%   0B / 0B   3.9TB / 975GB    62
>> 3a60fecf648d   ceph-7e8bff5c-2761-11ec-9bb0-000c29ebc936-osd-50
>>                0.28%   2.912GiB / 62.28GiB   4.68%   0B / 0B   100TB / 45.7TB   62
>> 9c20407e79eb   ceph-7e8bff5c-2761-11ec-9bb0-000c29ebc936-osd-49
>>                0.28%   2.905GiB / 62.28GiB   4.66%   0B / 0B   93TB / 35.8TB    62
>> 9deadafef9dd   ceph-7e8bff5c-2761-11ec-9bb0-000c29ebc936-osd-48
>>                0.56%   3.624GiB / 62.28GiB   5.82%   0B / 0B   102TB / 39.2TB   62
>> fcfe62a25fd9   ceph-7e8bff5c-2761-11ec-9bb0-000c29ebc936-osd-55
>>                0.40%   2.968GiB / 62.28GiB   4.77%   0B / 0B   83.2TB / 34.8TB  62
>> 38d2d96cc491   ceph-7e8bff5c-2761-11ec-9bb0-000c29ebc936-osd-51
>>                1.42%   2.666GiB / 62.28GiB   4.28%   0B / 0B   105TB / 38.1TB   62
>> e29c6bbc1ae7   ceph-7e8bff5c-2761-11ec-9bb0-000c29ebc936-osd-54
>>                2.01%   3.687GiB / 62.28GiB   5.92%   0B / 0B   106TB / 44.6TB   62
>> 40346a7a45ea   ceph-7e8bff5c-2761-11ec-9bb0-000c29ebc936-osd-53
>>                0.69%   2.748GiB / 62.28GiB   4.41%   0B / 0B   103TB / 41.4TB   62
>> 43c3e3a65531   ceph-7e8bff5c-2761-11ec-9bb0-000c29ebc936-crash-ceph-osd17
>>                0.00%   3.73MiB / 62.28GiB    0.01%   0B / 0B   567MB / 18MB     2
>> d9e436f9788c   ceph-7e8bff5c-2761-11ec-9bb0-000c29ebc936-node-exporter-ceph-osd17
>>                15.04%  30.25MiB / 62.28GiB   0.05%   0B / 0B   410MB / 14.6MB   61
>>
>> But they are also the biggest swap consumers:
>>
>> [root@ceph-osd17 ~]# for file in /proc/*/status; do awk
>> '/VmSwap|Name/{printf $2 " " $3}END{ print ""}' $file; done | sort -k 2 -n -r | more
>> ceph-osd        1553520 kB
>> ceph-osd        1447728 kB
>> ceph-osd        1218768 kB
>> ceph-osd        1117536 kB
>> ceph-osd        1026548 kB
>> ceph-osd         641632 kB
>> ceph-osd         495080 kB
>> ceph-osd         424392 kB
>> firewalld         26880 kB
>> dockerd           20352 kB
>> containerd        11136 kB
>> docker             6144 kB
>> docker             6144 kB
>> docker             5952 kB
>> docker             5952 kB
>> docker             5952 kB
>> docker             5952 kB
>> docker             5952 kB
>> docker             5760 kB
>> (sd-pam)           5184 kB
>> ceph-crash         4416 kB
>> python3            4224 kB
>> docker             4032 kB
>> systemd-udevd      3264 kB
>>
>> On 22.05.2025 18:34, Anthony D'Atri wrote:
>>>
>>>> Problem:
>>>>
>>>> After migration to Rocky 9 (and a new version of Docker) we see that our
>>>> OSD hosts consume 100% of SWAP space! It takes approximately one week
>>>> to fill SWAP from 0% to 100%.
>>>
>>> Why do you have swap configured at all? I suggest disabling swap in fstab
>>> and rebooting serially.
>>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
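As a footnote: the shell one-liner quoted above can also be written as a short Python sketch that parses /proc/PID/status the same way. The Name and VmSwap field names come from the kernel's status-file format; the sample text at the bottom is synthetic, not output from this cluster:

```python
import glob
import re

def parse_status(text: str):
    """Extract (Name, VmSwap in kB) from the text of a /proc/PID/status file.
    Kernel threads and unswapped processes have no VmSwap line; report 0."""
    name_m = re.search(r"^Name:\s+(\S+)", text, re.M)
    swap_m = re.search(r"^VmSwap:\s+(\d+)\s+kB", text, re.M)
    return (name_m.group(1) if name_m else "?",
            int(swap_m.group(1)) if swap_m else 0)

def top_swap_users():
    """Walk /proc and return processes sorted by descending swap usage."""
    rows = []
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as f:
                rows.append(parse_status(f.read()))
        except OSError:
            continue  # process exited while we were iterating
    return sorted(rows, key=lambda r: r[1], reverse=True)

sample = "Name:\tceph-osd\nVmSwap:\t1553520 kB\n"
print(parse_status(sample))  # ('ceph-osd', 1553520)
```

This mirrors the awk/sort pipeline: filter the Name and VmSwap lines, pair them per process, and sort numerically in reverse.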