What ceph sha1 is that? Does it include 6c3d015c6854a12cda40673848813d968ff6afae, which fixed the messenger spin? -Sam
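
(A quick way to confirm the exact sha1 on the node, assuming the stock Kraken packages are installed and the binaries are on PATH:

  ceph --version       # package/CLI version on the node, e.g. "ceph version 11.1.1 (<sha1>)"
  ceph-mgr --version   # version of the mgr binary itself

Both print the git sha1 in parentheses after the release number.)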
On Tue, Jan 10, 2017 at 9:00 AM, Stillwell, Bryan J <bryan.stillw...@charter.com> wrote:
> On 1/10/17, 5:35 AM, "John Spray" <jsp...@redhat.com> wrote:
>
>>On Mon, Jan 9, 2017 at 11:46 PM, Stillwell, Bryan J
>><bryan.stillw...@charter.com> wrote:
>>> Last week I decided to play around with Kraken (11.1.1-1xenial) on a
>>> single-node, two-OSD cluster, and after a while I noticed that the new
>>> ceph-mgr daemon is frequently using a lot of the CPU:
>>>
>>> 17519 ceph      20   0  850044 168104    208 S 102.7  4.3  1278:27
>>> ceph-mgr
>>>
>>> Restarting it with 'systemctl restart ceph-mgr*' seems to get its CPU
>>> usage down to < 1%, but after a while it climbs back up to > 100%.  Has
>>> anyone else seen this?
>>
>>Definitely worth investigating, could you set "debug mgr = 20" on the
>>daemon to see if it's obviously spinning in a particular place?
>
> I've injected that option into the ceph-mgr process, and now I'm just
> waiting for it to go out of control again.
>
> However, I've noticed quite a few messages like this in the logs already:
>
> 2017-01-10 09:56:07.441678 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
> 172.24.88.207:0/4168225878 conn(0x563c7e0bc000 :6800 s=STATE_OPEN pgs=2
> cs=1 l=0).fault initiating reconnect
> 2017-01-10 09:56:07.442044 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
> 172.24.88.207:0/4168225878 conn(0x563c7dfea800 :6800
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg
> accept connect_seq 0 vs existing csq=2 existing_state=STATE_CONNECTING
> 2017-01-10 09:56:07.442067 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
> 172.24.88.207:0/4168225878 conn(0x563c7dfea800 :6800
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg
> accept peer reset, then tried to connect to us, replacing
> 2017-01-10 09:56:07.443026 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
> 172.24.88.207:0/4168225878 conn(0x563c7e0bc000 :6800
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=2 cs=0 l=0).fault with nothing to
> send and in the half accept state just closed
>
> What's weird about that is that this is a single-node cluster with the
> ceph-mgr, ceph-mon, and ceph-osd processes all running on the same
> host, so none of the communication should be leaving the node.
>
> Bryan
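
For reference, one way to set "debug mgr = 20" on a node like this; the daemon id and config section are assumptions, so substitute whatever the cluster actually uses:

  # persistent, in /etc/ceph/ceph.conf (picked up on restart):
  [mgr]
      debug mgr = 20

  # restart the mgr so the new level takes effect:
  systemctl restart ceph-mgr@<id>

  # or at runtime via the admin socket, if the mgr exposes one on this build:
  ceph daemon mgr.<id> config set debug_mgr 20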