Just to further piggyback,

Probably the hardest the mgr gets pushed is when the balancer is engaged.
When evaluating a pool or the whole cluster, it takes upwards of 30-120
seconds to produce a score, then another 30-120 seconds to execute the plan,
and it never seems to engage automatically.

> $ time ceph balancer status
> {
>     "active": true,
>     "plans": [],
>     "mode": "upmap"
> }
> 
> real    0m36.490s
> user    0m0.259s
> sys     0m0.044s
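
For anyone wanting to compare numbers, the manual eval/execute cycle looks
roughly like this (the plan name "myplan" is just a placeholder):

> $ ceph balancer eval                 # score the current distribution
> $ ceph balancer optimize myplan      # build an upmap plan
> $ ceph balancer show myplan          # inspect the proposed changes
> $ ceph balancer execute myplan       # apply it
> $ ceph balancer rm myplan            # clean up the plan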


I'm going to disable mine as well, and see if I can stop waking up to 'No 
Active MGR.'


You can see when I lose mgrs, because RBD image stats go to 0 until I catch it.
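
For anyone following along, spotting and bouncing a wedged active mgr is just:

> $ ceph mgr stat          # shows the active mgr and any standbys
> $ ceph mgr fail <name>   # fail that mgr so a standby takes over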

Thanks,

Reed

> On Aug 27, 2019, at 11:24 AM, Jake Grimmett <j...@mrc-lmb.cam.ac.uk> wrote:
> 
> Hi Reed, Lenz, John
> 
> I've just tried disabling the balancer; so far ceph-mgr is keeping its
> CPU mostly under 20%, even with both the iostat and dashboard modules back on.
> 
> # ceph balancer off
> 
> was
> [root@ceph-s1 backup]# ceph balancer status
> {
>    "active": true,
>    "plans": [],
>    "mode": "upmap"
> }
> 
> now
> [root@ceph-s1 backup]# ceph balancer status
> {
>    "active": false,
>    "plans": [],
>    "mode": "upmap"
> }
> 
> We are using 8:2 erasure coding across 324 x 12 TB OSDs, plus 4 NVMe OSDs
> for a replicated CephFS metadata pool.
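> 
> (For reference, that layout is created with commands along these lines; the
> profile/pool names and PG counts here are illustrative, not our exact values:
> 
> # ceph osd erasure-code-profile set ec82 k=8 m=2 crush-failure-domain=host crush-device-class=hdd
> # ceph osd pool create ec_data 4096 4096 erasure ec82
> # ceph osd pool set ec_data allow_ec_overwrites true
> 
> with the metadata pool using a replicated rule restricted to the NVMe OSDs.)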
> 
> Let me know if the balancer is your problem too...
> 
> best,
> 
> Jake
> 
> On 8/27/19 3:57 PM, Jake Grimmett wrote:
>> Yes, the problem still occurs with the dashboard disabled...
>> 
>> Possibly relevant: even with both the dashboard and iostat plugins
>> disabled, I occasionally see ceph-mgr rise to 100% CPU.
>> 
>> As suggested by John Hearns, the output of gstack on ceph-mgr when at 100%
>> CPU is here:
>> 
>> http://p.ip.fi/52sV
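>> 
>> (captured with something along the lines of
>> 
>> # gstack $(pidof ceph-mgr) > /tmp/ceph-mgr-stack.txt
>> 
>> while the CPU was pinned)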
>> 
>> many thanks
>> 
>> Jake
>> 
>> On 8/27/19 3:09 PM, Reed Dier wrote:
>>> I'm currently seeing this with the dashboard disabled.
>>> 
>>> My instability decreases, but isn't wholly cured, by disabling
>>> prometheus and rbd_support. I use the two in tandem, since the only thing
>>> I'm using the prometheus exporter for is the per-RBD metrics (module list
>>> and toggle commands below).
>>> 
>>>> ceph mgr module ls
>>>> {
>>>>     "enabled_modules": [
>>>>         "diskprediction_local",
>>>>         "influx",
>>>>         "iostat",
>>>>         "prometheus",
>>>>         "rbd_support",
>>>>         "restful",
>>>>         "telemetry"
>>>>     ],
>>> 
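>>> Toggling them off and back on is just:
>>> 
>>>> ceph mgr module disable prometheus
>>>> ceph mgr module disable rbd_support
>>>> ceph mgr module enable prometheus
>>>> ceph mgr module enable rbd_support
>>> 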
>>> I'm on Ubuntu 18.04, so this doesn't seem to point to an OS-specific
>>> correlation.
>>> 
>>> Thanks,
>>> 
>>> Reed
>>> 
>>>> On Aug 27, 2019, at 8:37 AM, Lenz Grimmer <lgrim...@suse.com> wrote:
>>>> 
>>>> Hi Jake,
>>>> 
>>>> On 8/27/19 3:22 PM, Jake Grimmett wrote:
>>>> 
>>>>> That exactly matches what I'm seeing:
>>>>> 
>>>>> when iostat is working OK, I see ~5% CPU use by ceph-mgr
>>>>> and when iostat freezes, ceph-mgr CPU increases to 100%
>>>> 
>>>> Does this also occur if the dashboard module is disabled? Just wondering
>>>> if this is isolatable to the iostat module. Thanks!
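>>>> (e.g. "ceph mgr module disable dashboard", and separately
>>>> "ceph mgr module disable iostat", then watching ceph-mgr's CPU.)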
>>>> 
>>>> Lenz
>>>> 
>>>> -- 
>>>> SUSE Software Solutions Germany GmbH - Maxfeldstr. 5 - 90409 Nuernberg
>>>> GF: Felix Imendörffer, HRB 247165 (AG Nürnberg)
>>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> -- 
> MRC Laboratory of Molecular Biology
> Francis Crick Avenue,
> Cambridge CB2 0QH, UK.
> 
