Just to further piggyback: probably the hardest the mgr seems to get pushed is when the balancer is engaged. When evaluating a pool or the cluster, it takes upwards of 30-120 seconds to score it, then another 30-120 seconds to execute the plan, and it never seems to engage automatically.
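For context, this is roughly the sequence I'm timing; "new-plan" is just a placeholder name, and the 30-120 second stalls hit the eval and execute steps:

    $ time ceph balancer eval              # score the whole cluster
    $ time ceph balancer eval <pool>       # or score a single pool
    $ ceph balancer optimize new-plan      # build an upmap plan
    $ time ceph balancer execute new-plan  # apply it
    $ ceph balancer rm new-plan            # drop the plan when done

Even the plain status call below takes over half a minute.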
> $ time ceph balancer status
> {
>     "active": true,
>     "plans": [],
>     "mode": "upmap"
> }
>
> real    0m36.490s
> user    0m0.259s
> sys     0m0.044s

I'm going to disable mine as well, and see if I can stop waking up to 'No Active MGR.' You can see when I lose mgrs, because RBD image stats go to 0 until I catch it.

Thanks,

Reed

> On Aug 27, 2019, at 11:24 AM, Jake Grimmett <j...@mrc-lmb.cam.ac.uk> wrote:
>
> Hi Reed, Lenz, John
>
> I've just tried disabling the balancer; so far ceph-mgr is keeping its
> CPU mostly under 20%, even with both the iostat and dashboard back on.
>
> # ceph balancer off
>
> was
> [root@ceph-s1 backup]# ceph balancer status
> {
>     "active": true,
>     "plans": [],
>     "mode": "upmap"
> }
>
> now
> [root@ceph-s1 backup]# ceph balancer status
> {
>     "active": false,
>     "plans": [],
>     "mode": "upmap"
> }
>
> We are using 8:2 erasure coding across 324 12TB OSDs, plus 4 NVMe OSDs
> for a replicated CephFS metadata pool.
>
> Let me know if the balancer is your problem too...
>
> best,
>
> Jake
>
> On 8/27/19 3:57 PM, Jake Grimmett wrote:
>> Yes, the problem still occurs with the dashboard disabled...
>>
>> Possibly relevant: when both the dashboard and iostat plugins are
>> disabled, I occasionally see ceph-mgr rise to 100% CPU.
>>
>> As suggested by John Hearns, the output of gstack ceph-mgr when at 100%
>> is here:
>>
>> http://p.ip.fi/52sV
>>
>> many thanks
>>
>> Jake
>>
>> On 8/27/19 3:09 PM, Reed Dier wrote:
>>> I'm currently seeing this with the dashboard disabled.
>>>
>>> My instability decreases, but isn't wholly cured, by disabling
>>> prometheus and rbd_support, which I use in tandem, as the only thing I'm
>>> using the prom-exporter for is the per-rbd metrics.
>>>
>>>> ceph mgr module ls
>>>> {
>>>>     "enabled_modules": [
>>>>         "diskprediction_local",
>>>>         "influx",
>>>>         "iostat",
>>>>         "prometheus",
>>>>         "rbd_support",
>>>>         "restful",
>>>>         "telemetry"
>>>>     ],
>>>
>>> I'm on Ubuntu 18.04, so that doesn't corroborate a possible OS
>>> correlation.
>>>
>>> Thanks,
>>>
>>> Reed
>>>
>>>> On Aug 27, 2019, at 8:37 AM, Lenz Grimmer <lgrim...@suse.com> wrote:
>>>>
>>>> Hi Jake,
>>>>
>>>> On 8/27/19 3:22 PM, Jake Grimmett wrote:
>>>>
>>>>> That exactly matches what I'm seeing:
>>>>>
>>>>> when iostat is working OK, I see ~5% CPU use by ceph-mgr
>>>>> and when iostat freezes, ceph-mgr CPU increases to 100%
>>>>
>>>> Does this also occur if the dashboard module is disabled? Just wondering
>>>> if this is isolatable to the iostat module. Thanks!
>>>>
>>>> Lenz
>>>>
>>>> --
>>>> SUSE Software Solutions Germany GmbH - Maxfeldstr. 5 - 90409 Nuernberg
>>>> GF: Felix Imendörffer, HRB 247165 (AG Nürnberg)
>
> --
> MRC Laboratory of Molecular Biology
> Francis Crick Avenue,
> Cambridge CB2 0QH, UK.