Hi Bryan,

Did you ever learn more about this, or see it again? I'm facing 100% ceph-mon
CPU usage now, and I'm putting my observations here:
https://tracker.ceph.com/issues/42830
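In case it's useful to compare notes, this is roughly how I'm gathering data
on my side (the mon id and pid lookups below are just examples from my setup,
so adjust them for yours):

    # profile the busy mon process (pid lookup is an example)
    perf top -p $(pidof ceph-mon)

    # poke the mon admin socket for status and internal counters
    ceph daemon mon.$(hostname -s) mon_status
    ceph daemon mon.$(hostname -s) perf dump

    # temporarily raise mon debugging, then drop it back to the default afterwards
    ceph tell mon.* injectargs '--debug_mon 20/20'
    ceph tell mon.* injectargs '--debug_mon 1/5'

The rest of what I've collected so far is in the tracker ticket above.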
Cheers, Dan

On Mon, Dec 16, 2019 at 10:58 PM Bryan Stillwell <bstillw...@godaddy.com> wrote:
>
> Sasha,
>
> I was able to get past it by restarting the ceph-mon processes every time it
> got stuck, but that's not a very good solution for a production cluster.
>
> Right now I'm trying to narrow down what is causing the problem. Rebuilding
> the OSDs with BlueStore doesn't seem to be enough. I believe it could be
> related to us using the extra space on the journal device as an SSD-based
> OSD. During the conversion process I'm removing this SSD-based OSD (since
> with BlueStore the omap data is ending up on the SSD anyways), and I'm
> suspecting it might be causing this problem.
>
> Bryan
>
> On Dec 14, 2019, at 10:27 AM, Sasha Litvak <alexander.v.lit...@gmail.com> wrote:
>
> Bryan,
>
> Were you able to resolve this? If yes, can you please share with the list?
>
> On Fri, Dec 13, 2019 at 10:08 AM Bryan Stillwell <bstillw...@godaddy.com> wrote:
>>
>> Adding the dev list since it seems like a bug in 14.2.5.
>>
>> I was able to capture the output from perf top:
>>
>>   21.58%  libceph-common.so.0   [.] ceph::buffer::v14_2_0::list::append
>>   20.90%  libstdc++.so.6.0.19   [.] std::getline<char, std::char_traits<char>, std::allocator<char> >
>>   13.25%  libceph-common.so.0   [.] ceph::buffer::v14_2_0::list::append
>>   10.11%  libstdc++.so.6.0.19   [.] std::istream::sentry::sentry
>>    8.94%  libstdc++.so.6.0.19   [.] std::basic_ios<char, std::char_traits<char> >::clear
>>    3.24%  libceph-common.so.0   [.] ceph::buffer::v14_2_0::ptr::unused_tail_length
>>    1.69%  libceph-common.so.0   [.] std::getline<char, std::char_traits<char>, std::allocator<char> >@plt
>>    1.63%  libstdc++.so.6.0.19   [.] std::istream::sentry::sentry@plt
>>    1.21%  [kernel]              [k] __do_softirq
>>    0.77%  libpython2.7.so.1.0   [.] PyEval_EvalFrameEx
>>    0.55%  [kernel]              [k] _raw_spin_unlock_irqrestore
>>
>> I increased mon debugging to 20 and nothing stuck out to me.
>>
>> Bryan
>>
>> > On Dec 12, 2019, at 4:46 PM, Bryan Stillwell <bstillw...@godaddy.com> wrote:
>> >
>> > On our test cluster after upgrading to 14.2.5 I'm having problems with the
>> > mons pegging a CPU core while moving data around. I'm currently
>> > converting the OSDs from FileStore to BlueStore by marking the OSDs out in
>> > multiple nodes, destroying the OSDs, and then recreating them with
>> > ceph-volume lvm batch. This seems to get the ceph-mon process into a
>> > state where it pegs a CPU core on one of the mons:
>> >
>> > 1764450 ceph  20  0 4802412 2.1g 16980 S 100.0 28.1 4:54.72 ceph-mon
>> >
>> > Has anyone else run into this with 14.2.5 yet? I didn't see this problem
>> > while the cluster was running 14.2.4.
>> >
>> > Thanks,
>> > Bryan
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io