Samuel, 
  
Hard to tell for sure, since this bug affected several major versions of the 
kernel, at least RHEL's as far as I know. The only practical way to tell is to 
watch num_cgroups in /proc/cgroups:

 
 
$ cat /proc/cgroups | grep -e subsys -e blkio | column -t 
   #subsys_name  hierarchy  num_cgroups  enabled 
   blkio         4          1099         1  
Otherwise, you'd have to check the sources of the kernel you're using against 
the patch that fixed this bug. Unfortunately, I can't spot the upstream patch 
that fixed this issue since RH BZs related to this bug are private. Maybe 
someone here can spot it. 
   
 
Regards, 
Frédéric.  

  

-----Original Message-----

From: huxiaoyu <huxia...@horebdata.cn>
To: Frédéric <frederic.n...@univ-lorraine.fr>
Cc: ceph-users <ceph-users@ceph.io>
Sent: Friday, 12 January 2024 09:25 CET
Subject: Re: Re: [ceph-users] Ceph Nautilous 14.2.22 slow OSD memory leak?

 
Dear Frederic, 
  
Thanks a lot for the suggestions. We are using the vanilla Linux 4.19 LTS 
kernel. Do you think we may be suffering from the same bug? 
  
best regards, 
  
Samuel 
  
huxia...@horebdata.cn

From: Frédéric Nass
Date: 2024-01-12 09:19
To: huxiaoyu
CC: ceph-users
Subject: Re: [ceph-users] Ceph Nautilous 14.2.22 slow OSD memory leak?

Hello,

We've had a similar situation recently where OSDs would use way more memory 
than osd_memory_target and get OOM killed by the kernel. It was due to a kernel 
bug related to cgroups [1].

If num_cgroups below keeps increasing then you may hit this bug.
 
  
$ cat /proc/cgroups | grep -e subsys -e blkio | column -t 
   #subsys_name  hierarchy  num_cgroups  enabled 
   blkio         4          1099         1 
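
Since a single reading does not show whether the counter "keeps increasing", 
here is a minimal sketch that samples it over time. The function names 
(num_cgroups, watch_cgroups), the default sample count and the interval are 
illustrative assumptions, not part of any Ceph or kernel tool; the only thing 
taken from the check above is that num_cgroups is the third column of 
/proc/cgroups.

```shell
#!/bin/sh
# Hypothetical helpers: names and defaults are assumptions for illustration.

# Print the num_cgroups value (3rd column) for one controller.
# Usage: num_cgroups [controller] [file]
num_cgroups() {
    awk -v name="${1:-blkio}" '$1 == name { print $3 }' "${2:-/proc/cgroups}"
}

# Sample the counter several times and print the change between the
# first and the last reading.
# Usage: watch_cgroups [controller] [samples] [interval_seconds] [file]
watch_cgroups() {
    ctrl="${1:-blkio}"
    samples="${2:-5}"
    interval="${3:-60}"
    file="${4:-/proc/cgroups}"

    first=$(num_cgroups "$ctrl" "$file")
    if [ -z "$first" ]; then
        echo "controller '$ctrl' not found in $file" >&2
        return 1
    fi

    last=$first
    i=1
    while [ "$i" -lt "$samples" ]; do
        sleep "$interval"
        last=$(num_cgroups "$ctrl" "$file")
        echo "$(date '+%F %T') $ctrl num_cgroups=$last"
        i=$((i + 1))
    done
    echo "delta=$((last - first))"
}

# Example (not run automatically): 10 readings, one per minute.
# watch_cgroups blkio 10 60
```

A delta that stays positive across runs, on a node where nothing is expected 
to be creating new cgroups, would match the "keeps increasing" symptom 
described here; what counts as significant growth is left to your judgment.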
  
If you hit this bug, upgrading the kernel on the OSD nodes should get you 
through. If you can't access the Red Hat KB [1], let me know your nodes' 
current kernel version and I'll check for you. 

Regards,
Frédéric. 
 
 
  
[1] https://access.redhat.com/solutions/7014337     

From: huxiaoyu <huxia...@horebdata.cn>
To: ceph-users <ceph-users@ceph.io>
Sent: Wednesday, 10 January 2024 19:21 CET
Subject: [ceph-users] Ceph Nautilous 14.2.22 slow OSD memory leak?

Dear Ceph folks, 

I am responsible for two Ceph clusters running Nautilus 14.2.22, one with 
replication 3 and the other with EC 4+2. After around 400 days of running 
quietly and smoothly, the two clusters have recently developed similar 
problems: some of the OSDs consume ca. 18 GB while the memory target is set to 
2 GB. 

What could be going wrong in the background? Does this point to a slow OSD 
memory leak in 14.2.22 that I am not yet aware of? 

I would highly appreciate it if someone could provide any clues, ideas, or 
comments. 

best regards, 

Samuel 



huxia...@horebdata.cn 
_______________________________________________ 
ceph-users mailing list -- ceph-users@ceph.io 
To unsubscribe send an email to ceph-users-le...@ceph.io        