I'm not sure this is a cache issue; to me it feels like a memory leak. The
MDS process is now at 129GB (I haven't had a window to upgrade yet) against
a configured 80GB cache.

[root@mds0 ceph-admin]# ceph daemon mds.mds0 cache status
{
    "pool": {
        "items": 166753076,
        "bytes": 71766944952
    }
}
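
For reference, the configured limit can be read back from the daemon's admin
socket. The command below is the standard config-get call; the value shown is
illustrative of my 80GB setting rather than pasted from the box:

[root@mds0 ceph-admin]# ceph daemon mds.mds0 config get mds_cache_memory_limit
{
    "mds_cache_memory_limit": "85899345920"
}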


I ran a 10-minute heap profile:

[root@mds0 ceph-admin]# ceph tell mds.mds0 heap start_profiler
2018-05-25 08:15:04.428519 7f3f657fa700  0 client.127046191 ms_handle_reset
on 10.124.103.50:6800/2248223690
2018-05-25 08:15:04.447528 7f3f667fc700  0 client.127055541 ms_handle_reset
on 10.124.103.50:6800/2248223690
mds.mds0 started profiler


[root@mds0 ceph-admin]# ceph tell mds.mds0 heap dump
2018-05-25 08:25:14.265450 7f1774ff9700  0 client.127057266 ms_handle_reset
on 10.124.103.50:6800/2248223690
2018-05-25 08:25:14.356292 7f1775ffb700  0 client.127057269 ms_handle_reset
on 10.124.103.50:6800/2248223690
mds.mds0 dumping heap profile now.
------------------------------------------------
MALLOC:   123658130320 (117929.6 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +   6969713096 ( 6646.8 MiB) Bytes in central cache freelist
MALLOC: +     26700832 (   25.5 MiB) Bytes in transfer cache freelist
MALLOC: +     54460040 (   51.9 MiB) Bytes in thread cache freelists
MALLOC: +    531034272 (  506.4 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: = 131240038560 (125160.3 MiB) Actual memory used (physical + swap)
MALLOC: +   7426875392 ( 7082.8 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: = 138666913952 (132243.1 MiB) Virtual address space used
MALLOC:
MALLOC:        7434952              Spans in use
MALLOC:             20              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via
madvise()).
Bytes released to the OS take up virtual address space but no physical
memory.
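
Reading that dump, nearly all of the ~118 GiB is "Bytes in use by application"
rather than sitting in tcmalloc freelists, so asking the allocator to hand
memory back should only reclaim the ~6.6 GiB in the central cache freelist.
For completeness, this is the generic tcmalloc-release command (shown here as
a reference, not something I ran as part of this capture):

[root@mds0 ceph-admin]# ceph tell mds.mds0 heap release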

[root@mds0 ceph-admin]# ceph tell mds.mds0 heap stop_profiler
2018-05-25 08:25:26.394877 7fbe48ff9700  0 client.127047898 ms_handle_reset
on 10.124.103.50:6800/2248223690
2018-05-25 08:25:26.736909 7fbe49ffb700  0 client.127035608 ms_handle_reset
on 10.124.103.50:6800/2248223690
mds.mds0 stopped profiler

[root@mds0 ceph-admin]# pprof --pdf /bin/ceph-mds
/var/log/ceph/mds.mds0.profile.000* > profile.pdf
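
If a flat breakdown is easier to share than the PDF, the same profile files
can also be summarized as text; this assumes the stock pprof from gperftools:

[root@mds0 ceph-admin]# pprof --text /bin/ceph-mds
/var/log/ceph/mds.mds0.profile.000* > profile.txt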



On Thu, May 10, 2018 at 2:11 PM, Patrick Donnelly <pdonn...@redhat.com>
wrote:

> On Thu, May 10, 2018 at 12:00 PM, Brady Deetz <bde...@gmail.com> wrote:
> > [ceph-admin@mds0 ~]$ ps aux | grep ceph-mds
> > ceph        1841  3.5 94.3 133703308 124425384 ? Ssl  Apr04 1808:32
> > /usr/bin/ceph-mds -f --cluster ceph --id mds0 --setuser ceph --setgroup
> ceph
> >
> >
> > [ceph-admin@mds0 ~]$ sudo ceph daemon mds.mds0 cache status
> > {
> >     "pool": {
> >         "items": 173261056,
> >         "bytes": 76504108600
> >     }
> > }
> >
> > So, 80GB is my configured limit for the cache and it appears the mds is
> > following that limit. But, the mds process is using over 100GB RAM in my
> > 128GB host. I thought I was playing it safe by configuring at 80. What
> other
> > things consume a lot of RAM for this process?
> >
> > Let me know if I need to create a new thread.
>
> The cache size measurement is imprecise pre-12.2.5 [1]. You should upgrade
> ASAP.
>
> [1] https://tracker.ceph.com/issues/22972
>
> --
> Patrick Donnelly
>

Attachment: profile.pdf

