I'm not sure this is a cache issue; to me, this feels like a memory leak. The mds process is now at 129GB of resident memory (I haven't had a window to upgrade yet) against a configured 80GB cache limit.
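For anyone comparing their own numbers, this is roughly how I'm lining the configured cache limit up against what the kernel charges to the daemon. A minimal sketch from my setup (mds.mds0 is my daemon name; adjust for yours):

    # configured MDS cache limit in bytes, read via the admin socket
    ceph daemon mds.mds0 config get mds_cache_memory_limit

    # resident and virtual size of the ceph-mds process as the kernel reports it (kB)
    ps -C ceph-mds -o rss=,vsz=,cmd=

The gap between that RSS figure and the "bytes" value from cache status is what makes me think leak rather than cache accounting.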
[root@mds0 ceph-admin]# ceph daemon mds.mds0 cache status
{
    "pool": {
        "items": 166753076,
        "bytes": 71766944952
    }
}

I ran a 10-minute heap profile:

[root@mds0 ceph-admin]# ceph tell mds.mds0 heap start_profiler
2018-05-25 08:15:04.428519 7f3f657fa700  0 client.127046191 ms_handle_reset on 10.124.103.50:6800/2248223690
2018-05-25 08:15:04.447528 7f3f667fc700  0 client.127055541 ms_handle_reset on 10.124.103.50:6800/2248223690
mds.mds0 started profiler

[root@mds0 ceph-admin]# ceph tell mds.mds0 heap dump
2018-05-25 08:25:14.265450 7f1774ff9700  0 client.127057266 ms_handle_reset on 10.124.103.50:6800/2248223690
2018-05-25 08:25:14.356292 7f1775ffb700  0 client.127057269 ms_handle_reset on 10.124.103.50:6800/2248223690
mds.mds0 dumping heap profile now.
------------------------------------------------
MALLOC:   123658130320 (117929.6 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +   6969713096 ( 6646.8 MiB) Bytes in central cache freelist
MALLOC: +     26700832 (   25.5 MiB) Bytes in transfer cache freelist
MALLOC: +     54460040 (   51.9 MiB) Bytes in thread cache freelists
MALLOC: +    531034272 (  506.4 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: = 131240038560 (125160.3 MiB) Actual memory used (physical + swap)
MALLOC: +   7426875392 ( 7082.8 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: = 138666913952 (132243.1 MiB) Virtual address space used
MALLOC:
MALLOC:        7434952 Spans in use
MALLOC:             20 Thread heaps in use
MALLOC:           8192 Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

[root@mds0 ceph-admin]# ceph tell mds.mds0 heap stop_profiler
2018-05-25 08:25:26.394877 7fbe48ff9700  0 client.127047898 ms_handle_reset on 10.124.103.50:6800/2248223690
2018-05-25 08:25:26.736909 7fbe49ffb700  0 client.127035608 ms_handle_reset on 10.124.103.50:6800/2248223690
mds.mds0 stopped profiler

[root@mds0 ceph-admin]# pprof --pdf /bin/ceph-mds /var/log/ceph/mds.mds0.profile.000* > profile.pdf

On Thu, May 10, 2018 at 2:11 PM, Patrick Donnelly <pdonn...@redhat.com> wrote:
> On Thu, May 10, 2018 at 12:00 PM, Brady Deetz <bde...@gmail.com> wrote:
> > [ceph-admin@mds0 ~]$ ps aux | grep ceph-mds
> > ceph      1841  3.5 94.3 133703308 124425384 ?  Ssl  Apr04 1808:32 /usr/bin/ceph-mds -f --cluster ceph --id mds0 --setuser ceph --setgroup ceph
> >
> > [ceph-admin@mds0 ~]$ sudo ceph daemon mds.mds0 cache status
> > {
> >     "pool": {
> >         "items": 173261056,
> >         "bytes": 76504108600
> >     }
> > }
> >
> > So, 80GB is my configured limit for the cache and it appears the mds is
> > following that limit. But, the mds process is using over 100GB RAM in my
> > 128GB host. I thought I was playing it safe by configuring at 80. What
> > other things consume a lot of RAM for this process?
> >
> > Let me know if I need to create a new thread.
>
> The cache size measurement is imprecise pre-12.2.5 [1]. You should
> upgrade ASAP.
>
> [1] https://tracker.ceph.com/issues/22972
>
> --
> Patrick Donnelly
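A side note on the "Call ReleaseFreeMemory()" line in the dump above: as I understand it, the heap admin command can ask tcmalloc to hand freelist pages back to the OS, and pprof can also print a quick text summary instead of the PDF. A rough sketch, assuming the same daemon name and profile paths as above:

    # ask tcmalloc inside the MDS to release freelist memory back to the OS
    ceph tell mds.mds0 heap release

    # top allocation sites from the same profile dumps, without generating a PDF
    pprof --text /bin/ceph-mds /var/log/ceph/mds.mds0.profile.000*

Given that the dump shows ~118000 MiB actually in use by the application and only ~6600 MiB sitting in freelists, I don't expect a release to buy back much here, which again points at a leak (or the pre-12.2.5 accounting issue Patrick mentioned) rather than tcmalloc hoarding.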
[Attachment: profile.pdf (Adobe PDF document)]