What ceph sha1 is that? Does it include 6c3d015c6854a12cda40673848813d968ff6afae, which fixed the messenger spin? -Sam
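
(A quick way to confirm the exact sha1 on the node, assuming the stock Kraken packages are installed and the binaries are on PATH:

  ceph --version       # package/CLI version on the node, e.g. "ceph version 11.1.1 (<sha1>)"
  ceph-mgr --version   # version of the mgr binary itself

Both print the git sha1 in parentheses after the release number.)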
On Tue, Jan 10, 2017 at 9:00 AM, Stillwell, Bryan J <bryan.stillw...@charter.com> wrote:
> On 1/10/17, 5:35 AM, "John Spray" <jsp...@redhat.com> wrote:
>
>>On Mon, Jan 9, 2017 at 11:46 PM, Stillwell, Bryan J
>><bryan.stillw...@charter.com> wrote:
>>> Last week I decided to play around with Kraken (11.1.1-1xenial) on a
>>> single-node, two-OSD cluster, and after a while I noticed that the new
>>> ceph-mgr daemon is frequently using a lot of the CPU:
>>>
>>> 17519 ceph      20   0  850044 168104    208 S 102.7  4.3  1278:27
>>> ceph-mgr
>>>
>>> Restarting it with 'systemctl restart ceph-mgr*' seems to get its CPU
>>> usage down to < 1%, but after a while it climbs back up to > 100%.  Has
>>> anyone else seen this?
>>
>>Definitely worth investigating, could you set "debug mgr = 20" on the
>>daemon to see if it's obviously spinning in a particular place?
>
> I've injected that option into the ceph-mgr process, and now I'm just
> waiting for it to go out of control again.
>
> However, I've noticed quite a few messages like this in the logs already:
>
> 2017-01-10 09:56:07.441678 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
> 172.24.88.207:0/4168225878 conn(0x563c7e0bc000 :6800 s=STATE_OPEN pgs=2
> cs=1 l=0).fault initiating reconnect
> 2017-01-10 09:56:07.442044 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
> 172.24.88.207:0/4168225878 conn(0x563c7dfea800 :6800
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg
> accept connect_seq 0 vs existing csq=2 existing_state=STATE_CONNECTING
> 2017-01-10 09:56:07.442067 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
> 172.24.88.207:0/4168225878 conn(0x563c7dfea800 :6800
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg
> accept peer reset, then tried to connect to us, replacing
> 2017-01-10 09:56:07.443026 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
> 172.24.88.207:0/4168225878 conn(0x563c7e0bc000 :6800
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=2 cs=0 l=0).fault with nothing to
> send and in the half accept state just closed
>
> What's weird about that is that this is a single-node cluster with the
> ceph-mgr, ceph-mon, and ceph-osd processes all running on the same
> host, so none of the communication should be leaving the node.
>
> Bryan
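
For reference, one way to set "debug mgr = 20" on a node like this; the daemon id and config section are assumptions, so substitute whatever the cluster actually uses:

  # persistent, in /etc/ceph/ceph.conf (picked up on restart):
  [mgr]
      debug mgr = 20

  # restart the mgr so the new level takes effect:
  systemctl restart ceph-mgr@<id>

  # or at runtime via the admin socket, if the mgr exposes one on this build:
  ceph daemon mgr.<id> config set debug_mgr 20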