Hi Bryan,

Did you ever learn more about this, or see it again?
I'm facing 100% ceph-mon CPU usage now, and putting my observations
here: https://tracker.ceph.com/issues/42830
Cheers, Dan

On Mon, Dec 16, 2019 at 10:58 PM Bryan Stillwell <bstillw...@godaddy.com> wrote:
>
> Sasha,
>
> I was able to get past it by restarting the ceph-mon processes every time it 
> got stuck, but that's not a very good solution for a production cluster.
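>
> For reference, the restart is roughly the following (assuming systemd-managed
> mons, with the mon id being the short hostname):
>
>     systemctl restart ceph-mon@$(hostname -s)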
>
> Right now I'm trying to narrow down what is causing the problem.  Rebuilding 
> the OSDs with BlueStore by itself doesn't seem to be enough to trigger it.  I 
> believe it could be 
> related to us using the extra space on the journal device as an SSD-based 
> OSD.  During the conversion process I'm removing this SSD-based OSD (since 
> with BlueStore the omap data ends up on the SSD anyway), and I suspect that 
> removal might be causing this problem.
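>
> Removing that SSD-based OSD is roughly the following (the OSD id here is just 
> a placeholder):
>
>     ceph osd out 123
>     # wait for the PGs to drain off it
>     systemctl stop ceph-osd@123
>     ceph osd purge 123 --yes-i-really-mean-it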
>
> Bryan
>
> On Dec 14, 2019, at 10:27 AM, Sasha Litvak <alexander.v.lit...@gmail.com> 
> wrote:
>
> Bryan,
>
> Were you able to resolve this?  If yes, can you please share with the list?
>
> On Fri, Dec 13, 2019 at 10:08 AM Bryan Stillwell <bstillw...@godaddy.com> 
> wrote:
>>
>> Adding the dev list since it seems like a bug in 14.2.5.
>>
>> I was able to capture the output from perf top:
>>
>>   21.58%  libceph-common.so.0               [.] ceph::buffer::v14_2_0::list::append
>>   20.90%  libstdc++.so.6.0.19               [.] std::getline<char, std::char_traits<char>, std::allocator<char> >
>>   13.25%  libceph-common.so.0               [.] ceph::buffer::v14_2_0::list::append
>>   10.11%  libstdc++.so.6.0.19               [.] std::istream::sentry::sentry
>>    8.94%  libstdc++.so.6.0.19               [.] std::basic_ios<char, std::char_traits<char> >::clear
>>    3.24%  libceph-common.so.0               [.] ceph::buffer::v14_2_0::ptr::unused_tail_length
>>    1.69%  libceph-common.so.0               [.] std::getline<char, std::char_traits<char>, std::allocator<char> >@plt
>>    1.63%  libstdc++.so.6.0.19               [.] std::istream::sentry::sentry@plt
>>    1.21%  [kernel]                          [k] __do_softirq
>>    0.77%  libpython2.7.so.1.0               [.] PyEval_EvalFrameEx
>>    0.55%  [kernel]                          [k] _raw_spin_unlock_irqrestore
>>
>> I increased mon debugging to 20 and nothing stuck out to me.
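>>
>> The debug bump was roughly the following, via the admin socket on the busy 
>> mon (the mon id is just an example):
>>
>>     ceph daemon mon.$(hostname -s) config set debug_mon 20/20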
>>
>> Bryan
>>
>> > On Dec 12, 2019, at 4:46 PM, Bryan Stillwell <bstillw...@godaddy.com> 
>> > wrote:
>> >
>> > On our test cluster after upgrading to 14.2.5 I'm having problems with the 
>> > mons pegging a CPU core while moving data around.  I'm currently 
>> > converting the OSDs from FileStore to BlueStore by marking the OSDs out in 
>> > multiple nodes, destroying the OSDs, and then recreating them with 
>> > ceph-volume lvm batch.  This seems to get the ceph-mon process into a 
>> > state where it pegs a CPU core on one of the mons:
>> >
>> > 1764450 ceph      20   0 4802412   2.1g  16980 S 100.0 28.1   4:54.72 ceph-mon
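>> >
>> > Per OSD, the conversion is roughly the following (device paths and OSD ids 
>> > are just examples):
>> >
>> >     ceph osd out 45
>> >     systemctl stop ceph-osd@45
>> >     ceph osd destroy 45 --yes-i-really-mean-it
>> >     ceph-volume lvm zap --destroy /dev/sdb
>> >     ceph-volume lvm batch --bluestore /dev/sdb /dev/nvme0n1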
>> >
>> > Has anyone else run into this with 14.2.5 yet?  I didn't see this problem 
>> > while the cluster was running 14.2.4.
>> >
>> > Thanks,
>> > Bryan
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
