Just a follow-up 24 hours later: the mgrs seem far more stable, and have had
no issues or weirdness since disabling the balancer module.

Which isn't great, because the balancer plays an important role, but after
fighting data distribution for a few weeks and getting it 'good enough', I'm
taking the stability.
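
For anyone in the same spot, running the balancer by hand when distribution 
drifts again looks roughly like this (plan name is arbitrary, just a sketch of 
the balancer module CLI rather than anything I've scripted):

$ ceph balancer eval                   # score the current distribution
$ ceph balancer optimize drift-fix     # build an upmap plan
$ ceph balancer show drift-fix         # review the proposed changes
$ ceph balancer execute drift-fix      # apply them
$ ceph balancer rm drift-fix           # drop the plan afterwards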

Just wanted to follow up with another 2¢.

Reed

> On Aug 27, 2019, at 11:53 AM, Reed Dier <reed.d...@focusvq.com> wrote:
> 
> Just to further piggyback,
> 
> Probably the hardest the mgr seems to get pushed is when the balancer is
> engaged.
> When evaluating a pool or the cluster, it takes upwards of 30-120 seconds to
> score it, then another 30-120 seconds to execute the plan, and it never seems
> to engage automatically.
> 
>> $ time ceph balancer status
>> {
>>     "active": true,
>>     "plans": [],
>>     "mode": "upmap"
>> }
>> 
>> real    0m36.490s
>> user    0m0.259s
>> sys     0m0.044s
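> 
> (In case it helps anyone, the automatic run interval should be tunable with
> something like the below, though I haven't verified the exact key on this
> release:)
> 
>> $ ceph config set mgr mgr/balancer/sleep_interval 60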
> 
> 
> I'm going to disable mine as well, and see if I can stop waking up to 'No 
> Active MGR.'
> <PastedGraphic-2.png>
> 
> You can see when I lose mgrs, because RBD image stats drop to 0 until I catch
> it.
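> 
> (When it does drop, restarting or failing the daemon by hand gets it back,
> roughly like the below; this assumes the mgr id matches the short hostname,
> which depends on how your mgrs were deployed:)
> 
>> $ ceph -s | grep mgr                          # confirm there's no active mgr
>> $ systemctl restart ceph-mgr@$(hostname -s)   # restart the local mgr daemon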
> 
> Thanks,
> 
> Reed
> 
>> On Aug 27, 2019, at 11:24 AM, Jake Grimmett <j...@mrc-lmb.cam.ac.uk> wrote:
>> 
>> Hi Reed, Lenz, John
>> 
>> I've just tried disabling the balancer; so far ceph-mgr is keeping its
>> CPU mostly under 20%, even with both iostat and the dashboard back on.
>> 
>> # ceph balancer off
>> 
>> was
>> [root@ceph-s1 backup]# ceph balancer status
>> {
>>    "active": true,
>>    "plans": [],
>>    "mode": "upmap"
>> }
>> 
>> now
>> [root@ceph-s1 backup]# ceph balancer status
>> {
>>    "active": false,
>>    "plans": [],
>>    "mode": "upmap"
>> }
>> 
>> We are using 8:2 erasure coding across 324 12TB OSDs, plus 4 NVMe OSDs
>> for a replicated CephFS metadata pool.
>> 
>> let me know if the balancer is your problem too...
>> 
>> best,
>> 
>> Jake
>> 
>> On 8/27/19 3:57 PM, Jake Grimmett wrote:
>>> Yes, the problem still occurs with the dashboard disabled...
>>> 
>>> Possibly relevant: even when both the dashboard and iostat plugins are
>>> disabled, I occasionally see ceph-mgr rise to 100% CPU.
>>> 
>>> As suggested by John Hearns, the output of gstack against ceph-mgr when at
>>> 100% is here:
>>> 
>>> http://p.ip.fi/52sV
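>>> 
>>> (For reference, that was captured with something like the following; gstack
>>> is the gdb wrapper script, and this assumes a single ceph-mgr process on the
>>> node:)
>>> 
>>> # top -b -n 1 -p $(pidof ceph-mgr)          # confirm it's pegged near 100%
>>> # gstack $(pidof ceph-mgr) > mgr-stack.txt  # grab the stack trace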
>>> 
>>> many thanks
>>> 
>>> Jake
>>> 
>>> On 8/27/19 3:09 PM, Reed Dier wrote:
>>>> I'm currently seeing this with the dashboard disabled.
>>>> 
>>>> My instability decreases, but isn't wholly cured, when I disable
>>>> prometheus and rbd_support, which I use in tandem, since the only thing I'm
>>>> using the Prometheus exporter for is the per-RBD metrics.
>>>> 
>>>>> ceph mgr module ls
>>>>> {
>>>>>     "enabled_modules": [
>>>>>         "diskprediction_local",
>>>>>         "influx",
>>>>>         "iostat",
>>>>>         "prometheus",
>>>>>         "rbd_support",
>>>>>         "restful",
>>>>>         "telemetry"
>>>>>     ],
>>>> 
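>>>> (For reference, toggling those two modules looks roughly like this:)
>>>> 
>>>>> ceph mgr module disable prometheus
>>>>> ceph mgr module disable rbd_support
>>>>> ceph mgr module ls          # confirm they've left enabled_modules
>>>> 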
>>>> I'm on Ubuntu 18.04, so that doesn't support a possible OS correlation.
>>>> 
>>>> Thanks,
>>>> 
>>>> Reed
>>>> 
>>>>> On Aug 27, 2019, at 8:37 AM, Lenz Grimmer <lgrim...@suse.com> wrote:
>>>>> 
>>>>> Hi Jake,
>>>>> 
>>>>> On 8/27/19 3:22 PM, Jake Grimmett wrote:
>>>>> 
>>>>>> That exactly matches what I'm seeing:
>>>>>> 
>>>>>> when iostat is working OK, I see ~5% CPU use by ceph-mgr
>>>>>> and when iostat freezes, ceph-mgr CPU increases to 100%
>>>>> 
>>>>> Does this also occur if the dashboard module is disabled? Just wondering
>>>>> if this is isolatable to the iostat module. Thanks!
>>>>> 
>>>>> Lenz
>>>>> 
>>>>> -- 
>>>>> SUSE Software Solutions Germany GmbH - Maxfeldstr. 5 - 90409 Nuernberg
>>>>> GF: Felix Imendörffer, HRB 247165 (AG Nürnberg)
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
>> -- 
>> MRC Laboratory of Molecular Biology
>> Francis Crick Avenue,
>> Cambridge CB2 0QH, UK.
>> 
> 
