The ceph-mon is already using a lot of memory, so I ran heap stats:
------------------------------------------------
MALLOC:       32391696 (   30.9 MiB) Bytes in use by application
MALLOC: +  27597135872 (26318.7 MiB) Bytes in page heap freelist
MALLOC: +     16598552 (   15.8 MiB) Bytes in central cache freelist
MALLOC: +     14693536 (   14.0 MiB) Bytes in transfer cache freelist
MALLOC: +     17441592 (   16.6 MiB) Bytes in thread cache freelists
MALLOC: +    116387992 (  111.0 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =  27794649240 (26507.0 MiB) Actual memory used (physical + swap)
MALLOC: +     26116096 (   24.9 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =  27820765336 (26531.9 MiB) Virtual address space used
MALLOC:
MALLOC:           5683              Spans in use
MALLOC:             21              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------

After that I ran heap release and it went back to normal:
------------------------------------------------
MALLOC:       22919616 (   21.9 MiB) Bytes in use by application
MALLOC: +      4792320 (    4.6 MiB) Bytes in page heap freelist
MALLOC: +     18743448 (   17.9 MiB) Bytes in central cache freelist
MALLOC: +     20645776 (   19.7 MiB) Bytes in transfer cache freelist
MALLOC: +     18456088 (   17.6 MiB) Bytes in thread cache freelists
MALLOC: +    116387992 (  111.0 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =    201945240 (  192.6 MiB) Actual memory used (physical + swap)
MALLOC: +  27618820096 (26339.4 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =  27820765336 (26531.9 MiB) Virtual address space used
MALLOC:
MALLOC:           5639              Spans in use
MALLOC:             29              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------

So it seems the monitor is neither returning unused memory to the OS nor
reusing already-allocated memory that it considers free...
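The arithmetic behind that conclusion can be checked mechanically. The Python sketch below is a hypothetical helper (not part of Ceph or tcmalloc) that parses the byte-count lines of a tcmalloc heap stats dump, re-adds the components to confirm they match "Actual memory used", and reports what fraction of that memory is sitting in the page heap freelist. It assumes the exact line format shown in the two dumps above.

```python
import re

# One byte-count line of a tcmalloc heap stats dump, e.g.
# "MALLOC: +  27597135872 (26318.7 MiB) Bytes in page heap freelist"
LINE_RE = re.compile(r"^MALLOC:\s*[+=]?\s*(\d+)\s+\(\s*[\d.]+ MiB\)\s+(.+?)\s*$")

APP = "Bytes in use by application"
PAGE_HEAP_FREE = "Bytes in page heap freelist"
COMPONENTS = [APP, PAGE_HEAP_FREE,
              "Bytes in central cache freelist",
              "Bytes in transfer cache freelist",
              "Bytes in thread cache freelists",
              "Bytes in malloc metadata"]
ACTUAL = "Actual memory used (physical + swap)"
UNMAPPED = "Bytes released to OS (aka unmapped)"
MIB = 2 ** 20


def parse_heap_stats(text):
    """Map each label to its byte count; separators and span counts are skipped."""
    stats = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            stats[m.group(2)] = int(m.group(1))
    return stats


def summarize(stats):
    """Re-derive tcmalloc's total and report how much is just cached free pages."""
    actual = sum(stats[k] for k in COMPONENTS)
    if actual != stats[ACTUAL]:
        raise ValueError("components do not add up to 'Actual memory used'")
    return {
        "actual_mib": actual / MIB,
        "app_mib": stats[APP] / MIB,
        "unmapped_mib": stats[UNMAPPED] / MIB,
        "page_heap_free_fraction": stats[PAGE_HEAP_FREE] / actual,
    }


# Byte-count lines copied from the two dumps (before and after heap release).
BEFORE = """
MALLOC:       32391696 (   30.9 MiB) Bytes in use by application
MALLOC: +  27597135872 (26318.7 MiB) Bytes in page heap freelist
MALLOC: +     16598552 (   15.8 MiB) Bytes in central cache freelist
MALLOC: +     14693536 (   14.0 MiB) Bytes in transfer cache freelist
MALLOC: +     17441592 (   16.6 MiB) Bytes in thread cache freelists
MALLOC: +    116387992 (  111.0 MiB) Bytes in malloc metadata
MALLOC: =  27794649240 (26507.0 MiB) Actual memory used (physical + swap)
MALLOC: +     26116096 (   24.9 MiB) Bytes released to OS (aka unmapped)
"""

AFTER = """
MALLOC:       22919616 (   21.9 MiB) Bytes in use by application
MALLOC: +      4792320 (    4.6 MiB) Bytes in page heap freelist
MALLOC: +     18743448 (   17.9 MiB) Bytes in central cache freelist
MALLOC: +     20645776 (   19.7 MiB) Bytes in transfer cache freelist
MALLOC: +     18456088 (   17.6 MiB) Bytes in thread cache freelists
MALLOC: +    116387992 (  111.0 MiB) Bytes in malloc metadata
MALLOC: =    201945240 (  192.6 MiB) Actual memory used (physical + swap)
MALLOC: +  27618820096 (26339.4 MiB) Bytes released to OS (aka unmapped)
"""

if __name__ == "__main__":
    for name, dump in (("before", BEFORE), ("after", AFTER)):
        s = summarize(parse_heap_stats(dump))
        print(f"{name}: {s['actual_mib']:.1f} MiB resident, "
              f"{s['app_mib']:.1f} MiB used by the application, "
              f"{s['page_heap_free_fraction']:.1%} in the page heap freelist")
```

On these two dumps it reports roughly 99% of the resident memory sitting in the page heap freelist before the release and about 2% after, while the application itself only ever uses a few tens of MiB.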


On Wed, Jul 22, 2015 at 4:29 PM, Luis Periquito <periqu...@gmail.com> wrote:

> This cluster is serving RBD storage for OpenStack, and today all I/O
> just stopped.
> Looking at the boxes, ceph-mon was using 17 GB of RAM, and this was on
> *all* the mons. Restarting the main one made it work again (I also
> restarted the other ones because they were using a lot of RAM).
> This has happened twice now (first was last Monday).
>
> As this is considered a prod cluster there is no logging enabled, and I
> can't reproduce it: our test/dev clusters have been working fine and show
> neither symptom, but they were upgraded from Firefly.
> What can we do to help debug the issue? Any ideas on how to identify the
> underlying issue?
>
> thanks,
>
> On Mon, Jul 20, 2015 at 1:59 PM, Luis Periquito <periqu...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I have a cluster with 28 nodes (all physical, 4 cores, 32 GB RAM); each
>> node has 4 OSDs, for a total of 112 OSDs. Each OSD has 106 PGs (counted
>> including replication). There are 3 MONs in this cluster.
>> I'm running on Ubuntu trusty with kernel 3.13.0-52-generic, with Hammer
>> (0.94.2).
>>
>> This cluster was installed with Hammer (0.94.1) and has only been
>> upgraded to the latest available version.
>>
>> Of the three mons, one is mostly idle, one is using ~170% CPU, and one is
>> using ~270% CPU. This changes as I restart the processes (usually the
>> idle one is the one with the lowest uptime).
>>
>> Running perf top against the ceph-mon PID on the non-idle boxes
>> yields something like this:
>>
>>   4.62%  libpthread-2.19.so    [.] pthread_mutex_unlock
>>   3.95%  libpthread-2.19.so    [.] pthread_mutex_lock
>>   3.91%  libsoftokn3.so        [.] 0x000000000001db26
>>   2.38%  [kernel]              [k] _raw_spin_lock
>>   2.09%  libtcmalloc.so.4.1.2  [.] operator new(unsigned long)
>>   1.79%  ceph-mon              [.] DispatchQueue::enqueue(Message*, int, unsigned long)
>>   1.62%  ceph-mon              [.] RefCountedObject::get()
>>   1.58%  libpthread-2.19.so    [.] pthread_mutex_trylock
>>   1.32%  libtcmalloc.so.4.1.2  [.] operator delete(void*)
>>   1.24%  libc-2.19.so          [.] 0x0000000000097fd0
>>   1.20%  ceph-mon              [.] ceph::buffer::ptr::release()
>>   1.18%  ceph-mon              [.] RefCountedObject::put()
>>   1.15%  libfreebl3.so         [.] 0x00000000000542a8
>>   1.05%  [kernel]              [k] update_cfs_shares
>>   1.00%  [kernel]              [k] tcp_sendmsg
>>
>> The cluster is mostly idle, and it's healthy. The store is 69MB big, and
>> the MONs are consuming around 700MB of RAM.
>>
>> Any ideas on this situation? Is it safe to ignore?
>>
>
>
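For scale, the cluster figures in the first message (28 nodes, 4 OSDs each, about 106 PG copies per OSD) multiply out as below. The replica count is not stated in the thread, so this small sketch treats it as a parameter; the sizes 2 and 3 are assumptions for illustration only, and the function name is hypothetical.

```python
def pg_totals(nodes, osds_per_node, pg_copies_per_osd, replica_size):
    """Return (OSD count, total PG copies, approximate distinct PGs)."""
    osds = nodes * osds_per_node
    copies = osds * pg_copies_per_osd
    # Each PG is stored replica_size times, so dividing the copy count by the
    # pool size approximates the number of distinct PGs (assumes one pool size).
    return osds, copies, copies / replica_size


if __name__ == "__main__":
    for size in (2, 3):  # hypothetical replica sizes; the thread does not say
        osds, copies, pgs = pg_totals(28, 4, 106, size)
        print(f"size={size}: {osds} OSDs, {copies} PG copies, ~{pgs:.0f} PGs")
```

Because 106 is already an average per OSD, the distinct-PG figure is only approximate.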
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
