On Sat, Nov 9, 2013 at 7:53 AM, Mark Nelson <mark.nel...@inktank.com> wrote:
> One thing to try is run the mon and then attach to it with perf and see
> what it's doing.  If CPU usage is high and leveldb is doing tons of
> compaction work, that could indicate that this is the same or a similar
> problem to what we were seeing back around cuttlefish.

I am sorry, I don't quite understand what "attach to the mon with perf"
means; could you please elaborate on how to do it?
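For reference, "attach with perf" here means pointing the Linux perf tool
at the running monitor's PID. A minimal sketch, assuming perf is installed
and a single ceph-mon process is running on the box:

    # find the monitor's PID
    MON_PID=$(pidof ceph-mon)
    # sample on-CPU call stacks for ~30 seconds, then summarize
    perf record -g -p "$MON_PID" -- sleep 30
    perf report --stdio | head -n 50
    # or watch hot functions live instead
    perf top -p "$MON_PID"

If leveldb compaction dominates the profile, the cuttlefish-era mitigation
was to compact the monitor store; assuming your 0.67.x build supports these:

    # ceph.conf, [mon] section: compact the store every time the mon starts
    mon compact on start = true

    # or trigger an online compaction of mon.c's store
    ceph tell mon.c compact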
> Mark
>
> On 11/08/2013 04:53 PM, Gregory Farnum wrote:
>> Hrm, there's nothing too odd in those dumps. I asked around and it
>> sounds like the last time we saw this sort of strange memory use it
>> was a result of leveldb not being able to compact quickly enough. Joao
>> can probably help diagnose that faster than I can.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>> On Fri, Nov 8, 2013 at 5:00 AM, Yu Changyuan <rei...@gmail.com> wrote:
>>> I tried to dump the perf counters via the admin socket, but I don't
>>> know what these numbers actually mean, or whether they have anything
>>> to do with the different memory usage between the ARM and AMD
>>> processors, so I attach the dump logs (mon.a runs on an AMD processor,
>>> mon.c runs on an ARM processor).
>>>
>>> PS: after days of running (mon.b for 6 days, mon.c for 3 days), the
>>> memory consumption of both monitors running on the ARM boards has
>>> become stable at around 600 MB. Here are the heap stats:
>>>
>>> mon.b tcmalloc heap stats:
>>> ------------------------------------------------
>>> MALLOC:      594258992 (  566.7 MiB) Bytes in use by application
>>> MALLOC: +     19529728 (   18.6 MiB) Bytes in page heap freelist
>>> MALLOC: +      3885120 (    3.7 MiB) Bytes in central cache freelist
>>> MALLOC: +      6486528 (    6.2 MiB) Bytes in transfer cache freelist
>>> MALLOC: +     12202384 (   11.6 MiB) Bytes in thread cache freelists
>>> MALLOC: +      2889952 (    2.8 MiB) Bytes in malloc metadata
>>> MALLOC:   ------------
>>> MALLOC: =    639252704 (  609.6 MiB) Actual memory used (physical + swap)
>>> MALLOC: +       122880 (    0.1 MiB) Bytes released to OS (aka unmapped)
>>> MALLOC:   ------------
>>> MALLOC: =    639375584 (  609.8 MiB) Virtual address space used
>>> MALLOC:
>>> MALLOC:          10231              Spans in use
>>> MALLOC:             24              Thread heaps in use
>>> MALLOC:           8192              Tcmalloc page size
>>> ------------------------------------------------
>>> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
>>> Bytes released to the OS take up virtual address space but no physical memory.
>>>
>>> mon.c tcmalloc heap stats:
>>> ------------------------------------------------
>>> MALLOC:      593987584 (  566.5 MiB) Bytes in use by application
>>> MALLOC: +     23969792 (   22.9 MiB) Bytes in page heap freelist
>>> MALLOC: +      2172640 (    2.1 MiB) Bytes in central cache freelist
>>> MALLOC: +      5874688 (    5.6 MiB) Bytes in transfer cache freelist
>>> MALLOC: +      9268512 (    8.8 MiB) Bytes in thread cache freelists
>>> MALLOC: +      2889952 (    2.8 MiB) Bytes in malloc metadata
>>> MALLOC:   ------------
>>> MALLOC: =    638163168 (  608.6 MiB) Actual memory used (physical + swap)
>>> MALLOC: +       163840 (    0.2 MiB) Bytes released to OS (aka unmapped)
>>> MALLOC:   ------------
>>> MALLOC: =    638327008 (  608.8 MiB) Virtual address space used
>>> MALLOC:
>>> MALLOC:           9796              Spans in use
>>> MALLOC:             14              Thread heaps in use
>>> MALLOC:           8192              Tcmalloc page size
>>> ------------------------------------------------
>>> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
>>> Bytes released to the OS take up virtual address space but no physical memory.
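For anyone reproducing the counter dump above, the admin socket commands
look roughly like this (the .asok path is the usual default, so adjust it
to your setup; note that "heap release" only returns the freelist pages
shown above, not the ~566 MiB in use by the application):

    # dump all perf counters from mon.c's admin socket
    ceph --admin-daemon /var/run/ceph/ceph-mon.c.asok perf dump
    # print the schema describing what each counter means
    ceph --admin-daemon /var/run/ceph/ceph-mon.c.asok perf schema
    # ask tcmalloc to return freelist memory to the OS (ReleaseFreeMemory)
    ceph tell mon.c heap release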
>>>
>>> On Fri, Nov 8, 2013 at 12:03 PM, Gregory Farnum <g...@inktank.com> wrote:
>>>> I don't think this is anything we've observed before. Normally when a
>>>> Ceph node is using more memory than its peers it's a consequence of
>>>> something in that node getting backed up. You might try looking at the
>>>> perf counters via the admin socket and seeing if something about them
>>>> is different between your ARM and AMD processors.
>>>> -Greg
>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>
>>>> On Tue, Nov 5, 2013 at 7:21 AM, Yu Changyuan <rei...@gmail.com> wrote:
>>>>> Finally, my tiny Ceph cluster has 3 monitors. The newly added mon.b
>>>>> and mon.c both run on Cubieboard2s, which are cheap but still have
>>>>> enough CPU power (dual-core ARM A7, 1.2 GHz) and memory (1 GB).
>>>>>
>>>>> But compared to mon.a, which runs on an amd64 CPU, both mon.b and
>>>>> mon.c easily consume too much memory, so I want to know whether this
>>>>> is caused by a memory leak. Below is the output of 'ceph tell mon.a
>>>>> heap stats' and 'ceph tell mon.c heap stats' (mon.c started only 12
>>>>> hours ago, while mon.a has already been running for more than 10 days):
>>>>>
>>>>> mon.a tcmalloc heap stats:
>>>>> ------------------------------------------------
>>>>> MALLOC:        5480160 (    5.2 MiB) Bytes in use by application
>>>>> MALLOC: +     28065792 (   26.8 MiB) Bytes in page heap freelist
>>>>> MALLOC: +     15242312 (   14.5 MiB) Bytes in central cache freelist
>>>>> MALLOC: +     10116608 (    9.6 MiB) Bytes in transfer cache freelist
>>>>> MALLOC: +     10432216 (    9.9 MiB) Bytes in thread cache freelists
>>>>> MALLOC: +      1667224 (    1.6 MiB) Bytes in malloc metadata
>>>>> MALLOC:   ------------
>>>>> MALLOC: =     71004312 (   67.7 MiB) Actual memory used (physical + swap)
>>>>> MALLOC: +     57540608 (   54.9 MiB) Bytes released to OS (aka unmapped)
>>>>> MALLOC:   ------------
>>>>> MALLOC: =    128544920 (  122.6 MiB) Virtual address space used
>>>>> MALLOC:
>>>>> MALLOC:           4655              Spans in use
>>>>> MALLOC:             34              Thread heaps in use
>>>>> MALLOC:           8192              Tcmalloc page size
>>>>> ------------------------------------------------
>>>>> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
>>>>> Bytes released to the OS take up virtual address space but no physical memory.
>>>>>
>>>>> mon.c tcmalloc heap stats:
>>>>> ------------------------------------------------
>>>>> MALLOC:      175861640 (  167.7 MiB) Bytes in use by application
>>>>> MALLOC: +      2220032 (    2.1 MiB) Bytes in page heap freelist
>>>>> MALLOC: +      1007560 (    1.0 MiB) Bytes in central cache freelist
>>>>> MALLOC: +      2871296 (    2.7 MiB) Bytes in transfer cache freelist
>>>>> MALLOC: +      4686000 (    4.5 MiB) Bytes in thread cache freelists
>>>>> MALLOC: +      2758880 (    2.6 MiB) Bytes in malloc metadata
>>>>> MALLOC:   ------------
>>>>> MALLOC: =    189405408 (  180.6 MiB) Actual memory used (physical + swap)
>>>>> MALLOC: +            0 (    0.0 MiB) Bytes released to OS (aka unmapped)
>>>>> MALLOC:   ------------
>>>>> MALLOC: =    189405408 (  180.6 MiB) Virtual address space used
>>>>> MALLOC:
>>>>> MALLOC:           3445              Spans in use
>>>>> MALLOC:             14              Thread heaps in use
>>>>> MALLOC:           8192              Tcmalloc page size
>>>>> ------------------------------------------------
>>>>> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
>>>>> Bytes released to the OS take up virtual address space but no physical memory.
>>>>>
>>>>> The Ceph version is 0.67.4, compiled with tcmalloc enabled, using
>>>>> gcc (armv7a-hardfloat-linux-gnueabi-gcc) version 4.7.3. I also tried
>>>>> to dump the heap, but I cannot find anything useful in it. Below is a
>>>>> recent dump, output by the command
>>>>> "pprof --text /usr/bin/ceph-mon mon.c.profile.0021.heap". What extra
>>>>> step should I take to make the dump more meaningful?
>>>>>
>>>>> Using local file /usr/bin/ceph-mon.
>>>>> Using local file mon.c.profile.0021.heap.
>>>>> Total: 149.3 MB
>>>>>    146.2  97.9%  97.9%    146.2  97.9% 00000000b6a7ce7c
>>>>>      1.4   0.9%  98.9%      1.4   0.9% std::basic_string::_Rep::_S_create ??:0
>>>>>      1.4   0.9%  99.8%      1.4   0.9% 00000000002dd794
>>>>>      0.1   0.1%  99.9%      0.1   0.1% 00000000b6a81170
>>>>>      0.1   0.1%  99.9%      0.1   0.1% 00000000b6a80894
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a7e2ac
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a81410
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000367450
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000001d4474
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 000000000028847c
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a7e8d8
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 000000000020c80c
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 000000000028bd20
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a63248
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a83478
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a806f0
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000002eb8b8
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 000000000024efb4
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 000000000027e550
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a77104
>>>>>      0.0   0.0% 100.0%      0.0   0.0% _dl_mcount ??:0
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000003673ec
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a7a91c
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000295e44
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a7ee38
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000283948
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000002a53c4
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a7665c
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000002c4590
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a7e88c
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a8456c
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a76ed4
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a842f0
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a72bd0
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a73cf8
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a7100c
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a7dec4
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 000000000035e6e8
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a78f68
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a7de9c
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000220528
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 000000000035e7c0
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a6b2f8
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a80a04
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a62e7c
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a66f50
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a7e958
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a6cfb8
>>>>>      0.0   0.0% 100.0%      0.0   0.0% leveldb::DBImpl::MakeRoomForWrite (inline) ??:0
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 000000000020797c
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a69de0
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000001d0af0
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000001d0ebc
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000002a0cd4
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 000000000036909c
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 000000000040b02c
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000001d0b68
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000392fa0
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a64404
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a791b4
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000001d9824
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000213928
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000002a0cb8
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000002a4fcc
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a725ac
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a66308
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a79068
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000000013b2
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 000000000040b000
>>>>>      0.0   0.0% 100.0%      0.1   0.1% 00000000004d839b
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000f29887
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000f3eb6b
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000f6e1cb
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000f6edab
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000f873ab
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000f8a26b
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000f8b0cb
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000f92dab
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000f9c96b
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000fa24bf
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000fadd8b
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000fb06ab
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000fb0d0b
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000fb494b
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000fbad6b
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000fbb2cb
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000fbea6b
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000fed0eb
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000000fed69b
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 000000000129920b
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000014250eb
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 000000000166cfc5
>>>>>      0.0   0.0% 100.0%      0.1   0.1% 000000000166d711
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000003531d2b
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 000000000379adbb
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000004e888fb
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000004e894ab
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 0000000004e8951b
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000060146d3
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 000000000601482f
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000060fcd2b
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000060fd33b
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000060fdfbb
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 000000000a820749
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 000000000bfb1950
>>>>>      0.0   0.0% 100.0%      0.0   0.0% 00000000b6a43f23
>>>>>      0.0   0.0% 100.0%      0.0   0.0% __clone ??:0
>>>>>      0.0   0.0% 100.0%      0.0   0.0% leveldb::DBImpl::MakeRoomForWrite ??:0
>>>>>      0.0   0.0% 100.0%      0.2   0.1% std::num_put::do_put@806e4 ??:0
>>>>>      0.0   0.0% 100.0%      0.4   0.2% std::num_put::do_put@80b44 ??:0
>>>>>      0.0   0.0% 100.0%      0.1   0.1% std::num_put::do_put@80e00 ??:0
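On the question above about making the dump more meaningful: the
unresolved hex addresses usually mean pprof could not map addresses back
to symbols, either because ceph-mon or its shared libraries were stripped
or because the profile is being analyzed on a different machine than the
one that produced it. A sketch of what typically helps, assuming
unstripped builds (the library path is illustrative):

    # build with debug info (-g) and do not strip ceph-mon,
    # libtcmalloc, or libleveldb
    # if analyzing on another host, point pprof at copies of the
    # target's libraries
    pprof --text --lib_prefix=/path/to/arm/libs /usr/bin/ceph-mon mon.c.profile.0021.heap
    # once symbols resolve, --lines maps allocations to source lines
    pprof --text --lines /usr/bin/ceph-mon mon.c.profile.0021.heap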
>>>>>
>>>>> PS: there's a Cubietruck board
>>>>> (http://docs.cubieboard.org/products/start#cubietruck_cubieboard3)
>>>>> released recently, which features a dual-core ARM A7 CPU, 2 GB of RAM,
>>>>> a 1 Gbit ethernet port, and a SATA 2.0 port, for $89; it may be
>>>>> suitable as a cheap dedicated single-disk OSD server.
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Changyuan
>>>
>>> --
>>> Best regards,
>>> Changyuan

--
Best regards,
Changyuan
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com