Bryan, thank you.  We are about to start testing the 14.2.2 -> 14.2.5
upgrade, so folks here are a bit cautious :-)  We don't need to convert, but
we may have to rebuild a few disks after the upgrade.

On Mon, Dec 16, 2019 at 3:57 PM Bryan Stillwell <bstillw...@godaddy.com>
wrote:

> Sasha,
>
> I was able to get past it by restarting the ceph-mon processes every time
> it got stuck, but that's not a very good solution for a production cluster.
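>
> For reference, "restarting" here just means something like the following on
> the affected mon host (assuming the mon id matches the short hostname):
>
>   systemctl restart ceph-mon@$(hostname -s)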
>
> Right now I'm trying to narrow down what is causing the problem.
> Rebuilding the OSDs with BlueStore doesn't seem to be enough on its own.  I
> believe it could be related to us using the extra space on the journal
> device as an SSD-based OSD.  During the conversion I'm removing this
> SSD-based OSD (since with BlueStore the omap data ends up on the SSD
> anyway), and I suspect that removal might be what's triggering the problem.
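>
> Roughly speaking (device names below are just placeholders), the new OSDs
> are being created with something like:
>
>   ceph-volume lvm batch --bluestore /dev/sdb /dev/sdc /dev/sdd --db-devices /dev/nvme0n1
>
> so the SSD now only carries the BlueStore DB instead of a separate OSD.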
>
> Bryan
>
> On Dec 14, 2019, at 10:27 AM, Sasha Litvak <alexander.v.lit...@gmail.com>
> wrote:
>
> Bryan,
>
> Were you able to resolve this?  If so, can you please share it with the list?
>
> On Fri, Dec 13, 2019 at 10:08 AM Bryan Stillwell <bstillw...@godaddy.com>
> wrote:
>
>> Adding the dev list since it seems like a bug in 14.2.5.
>>
>> I was able to capture the output from perf top:
>>
>>   21.58%  libceph-common.so.0    [.] ceph::buffer::v14_2_0::list::append
>>   20.90%  libstdc++.so.6.0.19    [.] std::getline<char, std::char_traits<char>, std::allocator<char> >
>>   13.25%  libceph-common.so.0    [.] ceph::buffer::v14_2_0::list::append
>>   10.11%  libstdc++.so.6.0.19    [.] std::istream::sentry::sentry
>>    8.94%  libstdc++.so.6.0.19    [.] std::basic_ios<char, std::char_traits<char> >::clear
>>    3.24%  libceph-common.so.0    [.] ceph::buffer::v14_2_0::ptr::unused_tail_length
>>    1.69%  libceph-common.so.0    [.] std::getline<char, std::char_traits<char>, std::allocator<char> >@plt
>>    1.63%  libstdc++.so.6.0.19    [.] std::istream::sentry::sentry@plt
>>    1.21%  [kernel]               [k] __do_softirq
>>    0.77%  libpython2.7.so.1.0    [.] PyEval_EvalFrameEx
>>    0.55%  [kernel]               [k] _raw_spin_unlock_irqrestore
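>>
>> (Captured with something like "perf top -p $(pidof ceph-mon)" on the busy
>> mon.)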
>>
>> I increased mon debugging to 20 and nothing stood out to me.
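>>
>> In case it matters, the debug level was bumped with roughly:
>>
>>   ceph config set mon debug_mon 20/20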
>>
>> Bryan
>>
>> > On Dec 12, 2019, at 4:46 PM, Bryan Stillwell <bstillw...@godaddy.com>
>> wrote:
>> >
>> > On our test cluster, after upgrading to 14.2.5, I'm having problems with
>> > the mons pegging a CPU core while moving data around.  I'm currently
>> > converting the OSDs from FileStore to BlueStore by marking the OSDs out
>> > on multiple nodes, destroying the OSDs, and then recreating them with
>> > ceph-volume lvm batch.  This seems to get the ceph-mon process into a
>> > state where it pegs a CPU core on one of the mons:
>> >
>> > 1764450 ceph      20   0 4802412   2.1g  16980 S 100.0 28.1   4:54.72 ceph-mon
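>> >
>> > For clarity, the per-OSD conversion sequence is roughly the following
>> > (IDs and devices are placeholders):
>> >
>> >   ceph osd out $ID
>> >   systemctl stop ceph-osd@$ID            # on the OSD host
>> >   ceph osd destroy $ID --yes-i-really-mean-it
>> >   ceph-volume lvm zap /dev/sdX --destroy
>> >   ceph-volume lvm batch --bluestore /dev/sdX ...
>> >
>> > and then letting the cluster backfill.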
>> >
>> > Has anyone else run into this with 14.2.5 yet?  I didn't see this
>> > problem while the cluster was running 14.2.4.
>> >
>> > Thanks,
>> > Bryan
>
>
>
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
