Likely caused by http://tracker.ceph.com/issues/37399.
Regards
Yan, Zheng

On Sat, Jan 5, 2019 at 5:44 PM Matthias Aebi <ma...@dizmo.com> wrote:
>
> Hello everyone,
>
> We are running a small cluster on 5 machines with 48 OSDs / 5 MDSs / 5 MONs
> based on Luminous 12.2.10 and Debian Stretch 9.6. With a single-MDS
> configuration everything works fine, and looking at the active MDS's memory
> it uses, as configured, ~1 GByte of memory for cache:
>
> $ watch ceph tell mds.$(hostname) heap stats
>
> mds.e tcmalloc heap stats:------------------------------------------------
> MALLOC:     1172867096 ( 1118.5 MiB) Bytes in use by application
> MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
> MALLOC: +     39289912 (   37.5 MiB) Bytes in central cache freelist
> MALLOC: +     17245344 (   16.4 MiB) Bytes in transfer cache freelist
> MALLOC: +     34303760 (   32.7 MiB) Bytes in thread cache freelists
> MALLOC: +      5796032 (    5.5 MiB) Bytes in malloc metadata
> MALLOC:   ------------
> MALLOC: =   1269502144 ( 1210.7 MiB) Actual memory used (physical + swap)
> MALLOC: +     19775488 (   18.9 MiB) Bytes released to OS (aka unmapped)
> MALLOC:   ------------
> MALLOC: =   1289277632 ( 1229.6 MiB) Virtual address space used
> MALLOC:
> MALLOC:          70430 Spans in use
> MALLOC:             17 Thread heaps in use
> MALLOC:           8192 Tcmalloc page size
>
> -------------
> $ ceph versions
>
> {
>     "mon": {
>         "ceph version 12.2.10 (177915764b752804194937482a39e95e0ca3de94) luminous (stable)": 5
>     },
>     "mgr": {
>         "ceph version 12.2.10 (177915764b752804194937482a39e95e0ca3de94) luminous (stable)": 3
>     },
>     "osd": {
>         "ceph version 12.2.10 (177915764b752804194937482a39e95e0ca3de94) luminous (stable)": 48
>     },
>     "mds": {
>         "ceph version 12.2.10 (177915764b752804194937482a39e95e0ca3de94) luminous (stable)": 5
>     },
>     "overall": {
>         "ceph version 12.2.10 (177915764b752804194937482a39e95e0ca3de94) luminous (stable)": 61
>     }
> }
>
> -------------
> $ ceph -s
>
>   cluster:
>     id:     .... c9024
>     health: HEALTH_OK
>
>   services:
>     mon: 5 daemons, quorum a,b,c,d,e
>     mgr: libra(active), standbys: b, a
>     mds: cephfs-1/1/1 up {0=e=up:active}, 1 up:standby-replay, 3 up:standby
>     osd: 48 osds: 48 up, 48 in
>
>   data:
>     pools:   2 pools, 2052 pgs
>     objects: 44.44M objects, 52.3TiB
>     usage:   107TiB used, 108TiB / 216TiB avail
>     pgs:     2051 active+clean
>              1    active+clean+scrubbing+deep
>
>   io:
>     client: 85.3KiB/s rd, 3.17MiB/s wr, 45op/s rd, 26op/s wr
>
> -------------
>
> However, as soon as we use "ceph fs set cephfs max_mds 2" to add a second
> MDS to the picture, things get out of hand within seconds, although in a
> rather unexpected way: the standby MDS which is brought in works fine and
> shows normal memory consumption. However, the two machines which start
> replaying the journal in order to become standby servers immediately
> accumulate dozens of GBytes of memory, climbing to about 150 GBytes and
> almost immediately starting to use swap space, which drives the load up to
> about 80 within seconds and makes all other processes (mainly the OSDs)
> unreachable.
>
> As the machine becomes basically unreachable when this happens, it is only
> possible to get memory statistics as things start to go wrong. After that
> it is no longer possible to get a memory dump, as the OS as a whole gets
> blocked by swapping.
>
> $ watch ceph tell mds.$(hostname) heap stats
>
> mds.a tcmalloc heap stats:------------------------------------------------
> MALLOC:    36113137024 (34440.2 MiB) Bytes in use by application
> MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
> MALLOC: +      7723144 (    7.4 MiB) Bytes in central cache freelist
> MALLOC: +      2523264 (    2.4 MiB) Bytes in transfer cache freelist
> MALLOC: +      2460024 (    2.3 MiB) Bytes in thread cache freelists
> MALLOC: +     41185472 (   39.3 MiB) Bytes in malloc metadata
> MALLOC:   ------------
> MALLOC: =  36167028928 (34491.6 MiB) Actual memory used (physical + swap)
> MALLOC: +      1417216 (    1.4 MiB) Bytes released to OS (aka unmapped)
> MALLOC:   ------------
> MALLOC: =  36168446144 (34492.9 MiB) Virtual address space used
> MALLOC:
> MALLOC:          38476 Spans in use
> MALLOC:             13 Thread heaps in use
> MALLOC:           8192 Tcmalloc page size
>
> -------------
>
> Please also find attached the zipped log file of one of the two new standby
> MDSs while it is trying to replay the filesystem journal.
>
> As soon as the number of MDSs is set back to 1 (using "ceph fs set cephfs
> max_mds 1" and "ceph mds deactivate 1") things calm down and the cluster
> goes back to normal. Is this a known problem with Luminous, and what can be
> done to solve it so that the multi-MDS feature can be used?
>
> As all servers here run Debian, it is unfortunately not possible to upgrade
> to Mimic, as it seems that Mimic cannot / will not be made available for
> Debian Stretch due to the toolchain issue described elsewhere.
>
> Thank you for any help and pointers in the right direction!
>
> Best,
> Matthias
>
> ----------------------------------------------------------------------------------------------------
> dizmo - The Interface of Things
> http://www.dizmo.com, Phone +41 52 267 88 50, Twitter @dizmos
> dizmo inc, Universitätsstrasse 53, CH-8006 Zurich, Switzerland

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
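
For context on the numbers discussed above: in Luminous the ~1 GByte cache
target corresponds to the mds_cache_memory_limit option (1 GiB by default),
and the number of active MDS daemons is controlled by max_mds on the
filesystem. A rough way to cross-check both on a running cluster, assuming
admin-socket access on the MDS host (the daemon name and output values below
are illustrative, not taken from the thread), would be:

$ ceph daemon mds.$(hostname) config get mds_cache_memory_limit
{
    "mds_cache_memory_limit": "1073741824"
}

$ ceph fs get cephfs | grep max_mds
max_mds 1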