Hi everyone,

Yesterday evening one of my rgw nodes died again: radosgw was killed by the 
kernel OOM killer.

[Thu Aug 12 22:10:04 2021] Out of memory: Killed process 1376 (radosgw) 
total-vm:70747176kB, anon-rss:63900544kB, file-rss:0kB, shmem-rss:0kB, UID:167 
pgtables:131008kB oom_score_adj:0
[Thu Aug 12 22:10:09 2021] oom_reaper: reaped process 1376 (radosgw), now 
anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

radosgw had consumed all 64 GB of system memory.
A few hours before this happened, a mempool dump showed a total usage of only 
about 2.1 GB of RAM, while radosgw was in fact already using 84.7% of the 64 GB.

        "total": {
            "items": 88757980,
            "bytes": 2147532284

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1376 ceph      20   0   58.8g  52.7g  17824 S  48.2  84.7  20158:04 radosgw


It seems that radosgw loses track of some memory, as if there were a memory leak.
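
To put a number on that gap, here is a minimal sketch using only the two figures 
quoted above. It assumes that the "g" suffix in top's RES column means GiB; the 
function names are just illustrative.

```python
# Compare the mempool total with the resident size reported by top.
# Both figures are copied from the output above.
GIB = 1024 ** 3


def mempool_share(rss_bytes: float, mempool_bytes: int) -> float:
    """Fraction of resident memory that the mempool accounting explains."""
    return mempool_bytes / rss_bytes


def unaccounted_bytes(rss_bytes: float, mempool_bytes: int) -> float:
    """Resident memory that does not show up in the mempool total."""
    return rss_bytes - mempool_bytes


mempool_total = 2147532284  # "bytes" field of the mempool dump
rss = 52.7 * GIB            # RES from top, assuming "g" means GiB

print(f"mempool share of RSS: {mempool_share(rss, mempool_total):.1%}")
print(f"unaccounted: {unaccounted_bytes(rss, mempool_total) / GIB:.1f} GiB")
```

Under that assumption the mempools account for only a few percent of what the 
process actually holds.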

Some additional information: I am running CentOS 8.4 with kernel 4.18 and, as 
already mentioned, Ceph 14.2.22. radosgw is the only notable service running on 
this machine.
Any suggestions on this? Are there any tuning settings that might help? How 
could I debug this further?
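
One low-effort way to correlate the growth with the daily job would be to log 
the resident size of the process at regular intervals. The following is only a 
sketch: it reads the standard VmRSS field from /proc/<pid>/status on Linux, and 
the helper names, interval and sample count are assumptions of mine, not 
anything Ceph provides.

```python
import re
import time
from pathlib import Path
from typing import Optional

# /proc/<pid>/status contains a line such as "VmRSS:\t55260160 kB".
VMRSS_RE = re.compile(r"^VmRSS:\s+(\d+)\s+kB$", re.MULTILINE)


def parse_vmrss_kb(status_text: str) -> Optional[int]:
    """Return the VmRSS value in kB, or None if the field is missing."""
    match = VMRSS_RE.search(status_text)
    return int(match.group(1)) if match else None


def sample_rss(pid: int, interval_s: float = 60.0, samples: int = 5) -> None:
    """Print a timestamped VmRSS reading every interval_s seconds."""
    status_path = Path(f"/proc/{pid}/status")
    for _ in range(samples):
        rss_kb = parse_vmrss_kb(status_path.read_text())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} pid={pid} VmRSS={rss_kb} kB", flush=True)
        time.sleep(interval_s)
```

For example, sample_rss(1376, interval_s=300, samples=288) would cover one day 
at five-minute resolution for the PID shown above.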

________________________________________
From: Martin Traxl <martin.tr...@1und1.de>
Sent: Tuesday, 10 August 2021 15:15
To: ceph-users@ceph.io
Subject: [ceph-users] Re: RGW memory consumption

I should mention we are using the S3 interface, so it is S3 traffic.

________________________________________
From: Martin Traxl <martin.tr...@1und1.de>
Sent: Tuesday, 10 August 2021 14:35
To: ceph-users@ceph.io
Subject: [ceph-users] RGW memory consumption

Hi everyone,

We are running a Ceph Nautilus 14.2.22 cluster with 6 OSD nodes (each with 
12x 8 TB HDD and 2x SSD for the rgw index pool), 3 mon nodes and 3 dedicated rgw 
nodes. The rgw nodes have 64 GB of RAM each. We have a job that runs every day 
at 4 am for about 1 hour. It is read-only, and the read load peaks at about 
970 MB/s (megabytes per second). This adds roughly 5% of RAM usage on the rgw 
nodes every day, and that memory is never freed, so we have to restart the rgws 
regularly. Maybe there is some kind of memory leak.

For example right now on one of my rgw-nodes:


$ free -m
              total        used        free      shared  buff/cache   available
Mem:          63686       44084        6363           4       13238       18887

top tells me radosgw is using 67.8% of the machine's memory.

A mempool dump on the radosgw shows me

        "total": {
            "items": 71162589,
            "bytes": 1712643376

which is only 1.7 GB. Almost all of it is used as "buffer_anon".
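
As a rough headroom estimate, the 67.8% reported by top can be combined with 
the ~5% daily growth mentioned earlier. This sketch assumes the growth stays 
linear, which is my simplification; in practice the OOM killer will likely step 
in before 100% is reached.

```python
def days_until_full(used_percent: float, growth_percent_per_day: float) -> float:
    """Days until memory usage reaches 100%, assuming linear daily growth."""
    if growth_percent_per_day <= 0:
        raise ValueError("growth must be positive")
    return max(0.0, (100.0 - used_percent) / growth_percent_per_day)


# 67.8% used now, growing by about 5 percentage points per day.
print(f"{days_until_full(67.8, 5.0):.1f} days")
```

With these figures that leaves a little under a week between restarts.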


Are there any settings that might help tune the memory consumption? Do you 
need further information about my setup?



Thank you,
Martin

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io