Re: [ceph-users] iostat and dashboard freezing

2019-09-12, Konstantin Shalygin
On 9/13/19 4:51 AM, Reed Dier wrote:
> I would love to deprecate the multi-root, and may try to do just that in my next OSD add, just worried about data shuffling unnecessarily. Would this in theory help my distribution across disparate OSD topologies?
Maybe. Actually, I don't know where is bala...

Re: [ceph-users] iostat and dashboard freezing

2019-09-12, Reed Dier
> 1. Multi-root. You should deprecate your 'ssd' root and move your OSDs of this root to the 'default' root.
I would love to deprecate the multi-root, and may try to do just that in my next OSD add, just worried about data shuffling unnecessarily. Would this in theory help my distribution across...
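Before deprecating the 'ssd' root, it can help to see how many OSDs currently sit under each CRUSH root. A minimal sketch, assuming the JSON from `ceph osd tree -f json` has a top-level "nodes" list in which buckets carry "id", "name", "type" and "children" (child ids) and OSDs have "type": "osd"; the function name `osds_per_root` is hypothetical.

```python
import json
from collections import deque


def osds_per_root(tree_json: str) -> dict:
    """Count OSDs reachable under each CRUSH root in `ceph osd tree -f json` output."""
    nodes = {n["id"]: n for n in json.loads(tree_json)["nodes"]}
    counts = {}
    for node in nodes.values():
        if node.get("type") != "root":
            continue
        seen = set()
        queue = deque(node.get("children", []))
        total = 0
        # Breadth-first walk from the root down to the OSD leaves.
        while queue:
            child = nodes.get(queue.popleft())
            if child is None or child["id"] in seen:
                continue
            seen.add(child["id"])
            if child.get("type") == "osd":
                total += 1
            else:
                queue.extend(child.get("children", []))
        counts[node["name"]] = total
    return counts
```

Feeding it the stdout of `ceph osd tree -f json` gives a quick per-root OSD count; a root that is to be merged into 'default' shows how many OSDs would move.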

Re: [ceph-users] iostat and dashboard freezing

2019-09-09, Konstantin Shalygin
On 9/2/19 5:47 PM, Jake Grimmett wrote: Hi Konstantin, To confirm, disabling the balancer allows the mgr to work properly. I tried re-enabling the balancer; it briefly worked, then locked up the mgr again. Here it's working OK...
[root@ceph-s1 ~]# time ceph balancer optimize new
real 0m1.6...
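The `time ceph balancer optimize new` check can be scripted so the latency is recorded repeatedly, for example before and after re-enabling the balancer. A minimal sketch, assuming the `ceph` CLI is on PATH when used for real; `time_command` is a hypothetical helper, and the default timeout is an arbitrary choice so that a hung mgr cannot block the script forever.

```python
import subprocess
import time
from typing import Optional, Sequence


def time_command(cmd: Sequence[str], timeout: float = 300.0) -> Optional[float]:
    """Return wall-clock seconds taken by `cmd`, or None if it exceeded `timeout`.

    Like the shell's `time`, this measures elapsed (real) time. Output is
    discarded, and a non-zero exit status raises CalledProcessError.
    """
    start = time.monotonic()
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                       check=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising on timeout.
        return None
    return time.monotonic() - start
```

For example, `time_command(["ceph", "balancer", "optimize", "new"])` mirrors the command quoted in the message; a `None` result means the mgr did not answer within the timeout.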

Re: [ceph-users] iostat and dashboard freezing

2019-09-09, Konstantin Shalygin
On 8/29/19 9:56 PM, Reed Dier wrote:
> "config/mgr/mgr/balancer/active",
> "config/mgr/mgr/balancer/max_misplaced",
> "config/mgr/mgr/balancer/mode",
> "config/mgr/mgr/balancer/pool_ids",
These are useless keys; you may remove them. https://pastebin.com/bXPs28h1
Issues that you have:
1. Multi-root...
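Cleaning up the leftover keys listed above can be scripted by filtering the key names printed by `ceph config-key ls` (assumed here to be a JSON array of strings) for the `config/mgr/mgr/balancer/` prefix, and printing, not running, the matching `ceph config-key rm` commands so they can be reviewed first. A sketch under those assumptions; `stale_key_rm_commands` is a hypothetical name.

```python
import json
import shlex
from typing import List

# Prefix shared by the leftover balancer keys named in the message above.
BALANCER_PREFIX = "config/mgr/mgr/balancer/"


def stale_key_rm_commands(config_key_ls_json: str, prefix: str = BALANCER_PREFIX) -> List[str]:
    """Build (but do not run) `ceph config-key rm` commands for keys under `prefix`."""
    keys = json.loads(config_key_ls_json)
    return [f"ceph config-key rm {shlex.quote(k)}" for k in sorted(keys) if k.startswith(prefix)]
```

Printing the list and pasting the lines by hand keeps a human in the loop, which seems prudent for anything that deletes cluster configuration.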

Re: [ceph-users] iostat and dashboard freezing

2019-08-29, Reed Dier
See responses below.
> On Aug 28, 2019, at 11:13 PM, Konstantin Shalygin wrote:
>> Just a follow-up 24h later, and the mgrs seem to be far more stable, and have had no issues or weirdness after disabling the balancer module.
>>
>> Which isn't great, because the balancer plays an important r...

Re: [ceph-users] iostat and dashboard freezing

2019-08-28, Konstantin Shalygin
> Just a follow-up 24h later, and the mgrs seem to be far more stable, and have had no issues or weirdness after disabling the balancer module. Which isn't great, because the balancer plays an important role, but after fighting distribution for a few weeks and getting it 'good enough' I'm taking...

Re: [ceph-users] iostat and dashboard freezing

2019-08-28, Reed Dier
Just a follow-up 24h later, and the mgrs seem to be far more stable, and have had no issues or weirdness after disabling the balancer module. Which isn't great, because the balancer plays an important role, but after fighting distribution for a few weeks and getting it 'good enough' I'm taking...
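One rough way to judge whether distribution is "good enough" without running the balancer is to summarize per-OSD utilization. A sketch, assuming `ceph osd df -f json` has a "nodes" list whose entries carry "utilization" (percent) and "device_class"; it reports simple spread statistics only and is not the balancer's own eval score. `utilization_spread` is a hypothetical name, and filtering by device class is suggested because mixing SSD and HDD OSDs in one summary blurs the picture.

```python
import json
import statistics
from typing import Dict, Optional


def utilization_spread(osd_df_json: str, device_class: Optional[str] = None) -> Dict[str, float]:
    """Summarize per-OSD %utilization from `ceph osd df -f json` output.

    Pass `device_class` (e.g. "hdd") to restrict the summary to one class.
    """
    nodes = json.loads(osd_df_json)["nodes"]
    utils = [n["utilization"] for n in nodes
             if device_class is None or n.get("device_class") == device_class]
    if not utils:
        raise ValueError("no OSDs matched")
    return {
        "min": min(utils),
        "max": max(utils),
        "max_minus_min": max(utils) - min(utils),
        # Population standard deviation: every OSD in the class is included.
        "pstdev": statistics.pstdev(utils),
    }
```

Tracking `max_minus_min` over time is a simple way to tell whether manual tuning is still making progress.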

Re: [ceph-users] iostat and dashboard freezing

2019-08-27, Reed Dier
Just to further piggyback: the hardest the mgr seems to get pushed is probably when the balancer is engaged. When trying to eval a pool or cluster, it takes upwards of 30-120 seconds to score it, and then another 30-120 seconds to execute the plan, and it never seems to engage automa...

Re: [ceph-users] iostat and dashboard freezing

2019-08-27, Jake Grimmett
Hi Reed, Lenz, John,
I've just tried disabling the balancer; so far ceph-mgr is keeping its CPU mostly under 20%, even with both iostat and the dashboard back on.
# ceph balancer off
was
[root@ceph-s1 backup]# ceph balancer status
{ "active": true, "plans": [], "mode": "upmap" }
now...
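The before/after check above can be automated by parsing the JSON that `ceph balancer status` printed (`active`, `plans`, `mode`). A minimal sketch; `balancer_summary` is a hypothetical helper that only parses text, so obtaining that text (for example via `subprocess.run`) is left to the caller.

```python
import json


def balancer_summary(status_json: str) -> str:
    """Return a one-line summary of `ceph balancer status` JSON output."""
    status = json.loads(status_json)
    state = "active" if status.get("active") else "inactive"
    return f"balancer {state}, mode={status.get('mode')}, plans={len(status.get('plans', []))}"
```

Logging this line next to mgr CPU readings makes it easier to correlate load with whether the balancer is running.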

Re: [ceph-users] iostat and dashboard freezing

2019-08-27, Jake Grimmett
Yes, the problem still occurs with the dashboard disabled...
Possibly relevant: when both the dashboard and iostat plugins are disabled, I occasionally see ceph-mgr rise to 100% CPU. As suggested by John Hearns, the output of gstack on ceph-mgr when at 100% is here: http://p.ip.fi/52sV
Many thanks...
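To catch the 100% CPU episodes with timestamps rather than by watching top, ceph-mgr's CPU share can be sampled from /proc. A Linux-only sketch, assuming the usual /proc/<pid>/stat layout (utime and stime are fields 14 and 15, in clock ticks); `find_pids` and `cpu_percent` are hypothetical helpers, and the process is matched by its /proc/<pid>/comm name.

```python
import os
import time
from typing import List


def find_pids(name: str) -> List[int]:
    """Return PIDs whose /proc/<pid>/comm equals `name` (Linux only)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:  # the process exited while we were scanning
            continue
    return pids


def _cpu_ticks(pid: int) -> int:
    """utime + stime of `pid`, in clock ticks."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # comm (field 2) may contain spaces, so split only after its closing ")".
    fields = stat[stat.rindex(")") + 1:].split()
    # fields[0] is field 3 (state), so fields 14/15 are at indices 11/12.
    return int(fields[11]) + int(fields[12])


def cpu_percent(pid: int, interval: float = 1.0) -> float:
    """CPU use of `pid` over `interval` seconds, as % of one core (like top)."""
    ticks_per_sec = os.sysconf("SC_CLK_TCK")
    t0, c0 = time.monotonic(), _cpu_ticks(pid)
    time.sleep(interval)
    t1, c1 = time.monotonic(), _cpu_ticks(pid)
    return 100.0 * (c1 - c0) / ticks_per_sec / (t1 - t0)
```

Like top, a multi-threaded process can exceed 100%. Running `cpu_percent(pid)` in a loop for each PID from `find_pids("ceph-mgr")` on the active mgr host shows when a spike starts, and the same PID is what `gstack` takes as its argument.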

Re: [ceph-users] iostat and dashboard freezing

2019-08-27, Reed Dier
I'm currently seeing this with the dashboard disabled. My instability decreases, but isn't wholly cured, by disabling prometheus and rbd_support, which I use in tandem, as the only thing I'm using the prom-exporter for is the per-RBD metrics.
> ceph mgr module ls
> {
> "enabled_modules": [ ...
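When bisecting which mgr module is behind the instability, a quick check of which suspects are enabled can be taken from `ceph mgr module ls`. The sketch assumes that output is JSON with an "enabled_modules" list, as in the paste above; `SUSPECTS` (the modules named in this thread) and `enabled_suspects` are hypothetical names.

```python
import json
from typing import Iterable, List

# Modules discussed in this thread as possible culprits.
SUSPECTS = ("dashboard", "iostat", "prometheus", "rbd_support")


def enabled_suspects(module_ls_json: str, suspects: Iterable[str] = SUSPECTS) -> List[str]:
    """Return the suspect modules that appear under "enabled_modules"."""
    enabled = set(json.loads(module_ls_json).get("enabled_modules", []))
    return [m for m in suspects if m in enabled]
```

Each name returned is a candidate for `ceph mgr module disable <module>`, one at a time, while watching mgr CPU.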

Re: [ceph-users] iostat and dashboard freezing

2019-08-27, John Hearns
Try running gstack on the ceph-mgr process when it is frozen? This could be a name resolution problem, as you suspect. Maybe gstack will show where the process is 'stuck', and this might be a call to your name resolution service.
On Tue, 27 Aug 2019 at 14:25, Jake Grimmett wrote:
> Whoops, I'm r...
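The name-resolution theory can also be checked independently of gstack by timing lookups of the cluster's hostnames from the mgr host. A minimal sketch using the standard library's `socket.getaddrinfo`; `time_lookup` is a hypothetical helper and the hostnames to test are up to the reader. A lookup that takes seconds rather than milliseconds would support the theory.

```python
import socket
import time
from typing import Optional, Tuple


def time_lookup(host: str) -> Tuple[float, Optional[str]]:
    """Resolve `host`, returning (elapsed seconds, error message or None)."""
    start = time.monotonic()
    try:
        socket.getaddrinfo(host, None)
        err = None
    except socket.gaierror as exc:
        err = str(exc)
    return time.monotonic() - start, err
```

Looping over the OSD and mon hostnames and printing the slowest lookups gives a quick picture of whether the resolver is a bottleneck.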

Re: [ceph-users] iostat and dashboard freezing

2019-08-27, Lenz Grimmer
Hi Jake,
On 8/27/19 3:22 PM, Jake Grimmett wrote:
> That exactly matches what I'm seeing:
>
> when iostat is working OK, I see ~5% CPU use by ceph-mgr
> and when iostat freezes, ceph-mgr CPU increases to 100%
Does this also occur if the dashboard module is disabled? Just wondering if this is is...

Re: [ceph-users] iostat and dashboard freezing

2019-08-27, Jake Grimmett
Whoops, I'm running Scientific Linux 7.6; going to upgrade to 7.7 soon... thanks, Jake
On 8/27/19 2:22 PM, Jake Grimmett wrote:
> Hi Reed,
>
> That exactly matches what I'm seeing:
>
> when iostat is working OK, I see ~5% CPU use by ceph-mgr
> and when iostat freezes, ceph-mgr CPU increases t...

Re: [ceph-users] iostat and dashboard freezing

2019-08-27, Jake Grimmett
Hi Reed,
That exactly matches what I'm seeing: when iostat is working OK, I see ~5% CPU use by ceph-mgr, and when iostat freezes, ceph-mgr CPU increases to 100%. Regarding OS, I'm using Scientific Linux 7.7, kernel 3.10.0-957.21.3.el7.x86_64. I'm not sure if the mgr initiates scrubbing, but if so,...

Re: [ceph-users] iostat and dashboard freezing

2019-08-27, Reed Dier
Curious what distro you're running on, as I've been having similar issues with instability in the mgr as well; curious if there are any similar threads to pull at. While the iostat command is running, is the active mgr using 100% CPU in top? Reed
> On Aug 27, 2019, at 6:41 AM, Jake Grimmett wrote:
>
> De...

[ceph-users] iostat and dashboard freezing

2019-08-27, Jake Grimmett
Dear All,
We have a new Nautilus (14.2.2) cluster, with 328 OSDs spread over 40 nodes. Unfortunately, "ceph iostat" spends most of its time frozen, with occasional periods of working normally for less than a minute; then it freezes again for a couple of minutes, then comes back to life, and so on...
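To record when freezes like these start and how long they last, rather than watching `ceph iostat` by eye, a probe can repeatedly run a quick mgr-backed command with a timeout and log which attempts hang. A sketch under stated assumptions: the probe command, interval and timeout are placeholders (something like `["ceph", "balancer", "status"]` is answered by a mgr module, but any quick command would do), and `probe_loop` is a hypothetical helper.

```python
import subprocess
import time
from typing import List, Sequence, Tuple


def probe_loop(cmd: Sequence[str], attempts: int, interval: float,
               timeout: float) -> List[Tuple[float, bool]]:
    """Run `cmd` `attempts` times; return (unix timestamp, responded) per attempt.

    An attempt counts as frozen if it does not finish within `timeout` seconds.
    The exit status is ignored: only responsiveness matters here.
    """
    results = []
    for i in range(attempts):
        stamp = time.time()
        try:
            subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                           timeout=timeout)
            responded = True
        except subprocess.TimeoutExpired:
            responded = False
        results.append((stamp, responded))
        if i < attempts - 1:
            time.sleep(interval)
    return results
```

Runs of `False` in the result mark the frozen windows, which can then be lined up against mgr logs or CPU samples.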