[ceph-users] Re: MON slow ops and growing MON store

2022-01-10 Thread Daniel Poelzleithner
Hi, > Like last time, after I restarted all five MONs, the store size > decreased and everything went back to normal. I also had to restart MGRs > and MDSs afterwards. This starts looking like a bug to me. In our case, we had a real database corruption in the rocksdb that caused version counter

[ceph-users] Re: MON slow ops and growing MON store

2021-03-18 Thread Janek Bevendorff
We just had the same problem again after a power outage that took out 62% of our cluster and three out of five MONs. Once everything was back up, the MONs started lagging and piling up slow ops while the MON store was growing to double-digit gigabytes. It was so bad that I couldn't even list the

[ceph-users] Re: MON slow ops and growing MON store

2021-02-26 Thread Janek Bevendorff
Since the full cluster restart and disabling logging to syslog, it's not a problem any more (for now). Unfortunately, just disabling clog_to_monitors didn't have the desired effect when I tried it yesterday. But I also believe that it is somehow related. I could not find any specific reason for

[ceph-users] Re: MON slow ops and growing MON store

2021-02-26 Thread Mykola Golub
On Thu, Feb 25, 2021 at 08:58:01PM +0100, Janek Bevendorff wrote: > On the first MON, the command doesn’t even return, but I was able to > get a dump from the one I restarted most recently. The oldest ops > look like this: > > { > "description": "log(1000 entries from seq 17876

[ceph-users] Re: MON slow ops and growing MON store

2021-02-25 Thread Janek Bevendorff
> On 25. Feb 2021, at 22:17, Dan van der Ster wrote: > > Also did you solve your log spam issue here? > https://tracker.ceph.com/issues/49161 > Surely these things are related? No. But I noticed that DBG log spam only happens when log_to_syslog is enabled. systemd is smart enough to avoid fi

[ceph-users] Re: MON slow ops and growing MON store

2021-02-25 Thread Dan van der Ster
Also did you solve your log spam issue here? https://tracker.ceph.com/issues/49161 Surely these things are related? You might need to share more full logs from cluster, mon, osd, mds, mgr so that we can help get to the bottom of this. -- dan On Thu, Feb 25, 2021 at 10:04 PM Janek Bevendorff wro

[ceph-users] Re: MON slow ops and growing MON store

2021-02-25 Thread Janek Bevendorff
Thanks, I’ll try that tomorrow. > On 25. Feb 2021, at 21:59, Dan van der Ster wrote: > > Maybe the debugging steps in that insights tracker can be helpful > anyway: https://tracker.ceph.com/issues/39955 > > -- dan > > On Thu, Feb 25, 2021 at 9:27 PM Janek Bevendorff > wrote: >> >> Thanks fo

[ceph-users] Re: MON slow ops and growing MON store

2021-02-25 Thread Dan van der Ster
Maybe the debugging steps in that insights tracker can be helpful anyway: https://tracker.ceph.com/issues/39955 -- dan On Thu, Feb 25, 2021 at 9:27 PM Janek Bevendorff wrote: > > Thanks for the tip, but I do not have degraded PGs and the module is already > disabled. > > > On 25. Feb 2021, at 2

[ceph-users] Re: MON slow ops and growing MON store

2021-02-25 Thread Janek Bevendorff
Thanks for the tip, but I do not have degraded PGs and the module is already disabled. > On 25. Feb 2021, at 21:17, Seena Fallah wrote: > > I had the same problem in my cluster and it was because of the insights mgr > module that was storing lots of data to the RocksDB because my cluster was > d

[ceph-users] Re: MON slow ops and growing MON store

2021-02-25 Thread Seena Fallah
I had the same problem in my cluster and it was because of the insights mgr module that was storing lots of data to the RocksDB because my cluster was degraded. If you have degraded PGs, try disabling the insights module. On Thu, Feb 25, 2021 at 11:40 PM Dan van der Ster wrote: > > "source": "osd.104...

[ceph-users] Re: MON slow ops and growing MON store

2021-02-25 Thread Janek Bevendorff
Nothing special is going on on that OSD as far as I can tell, and the OSD number of each op is different. The config isn’t entirely default, but we have been using it successfully for quite a while. It basically just redirects everything to journald so that we don’t have log creep. I reverted it nonet

[ceph-users] Re: MON slow ops and growing MON store

2021-02-25 Thread Dan van der Ster
> "source": "osd.104... What's happening on that osd? Is it something new which corresponds to when your mon started growing? Are other OSDs also flooding the mons with logs? I'm mobile so can't check... Are those logging configs the defaults? If not revert to default... BTW do your mons ha

[ceph-users] Re: MON slow ops and growing MON store

2021-02-25 Thread Janek Bevendorff
Thanks, Dan. On the first MON, the command doesn’t even return, but I was able to get a dump from the one I restarted most recently. The oldest ops look like this: { "description": "log(1000 entries from seq 17876238 at 2021-02-25T15:13:20.306487+0100)", "initiat

[ceph-users] Re: MON slow ops and growing MON store

2021-02-25 Thread Dan van der Ster
ceph daemon mon.`hostname -s` ops That should show you the accumulating ops. .. dan On Thu, Feb 25, 2021, 8:23 PM Janek Bevendorff < janek.bevendo...@uni-weimar.de> wrote: > Hi, > > All of a sudden, we are experiencing very concerning MON behaviour. We > have five MONs and all of them have tho
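
To see at a glance what kind of ops are piling up, the JSON printed by the command above can be summarized with a few lines of Python. This is only a rough sketch: it assumes the dump is a JSON object with an "ops" list whose entries carry a "description" string such as the "log(1000 entries from seq ...)" quoted in this thread. The function name summarize_ops, the sample data, and the "other_op" entry are made up for illustration and are not taken from a real cluster.

```python
import json
from collections import Counter


def summarize_ops(dump: dict) -> Counter:
    """Count ops by the part of their description before the first '('.

    Assumes (not verified against every Ceph release) that the dump has
    an "ops" list and that each op has a "description" string.
    """
    counts = Counter()
    for op in dump.get("ops", []):
        description = op.get("description", "")
        # "log(1000 entries from seq ...)" -> "log"
        kind = description.split("(", 1)[0].strip() or "unknown"
        counts[kind] += 1
    return counts


# Hypothetical sample shaped like the snippet quoted in this thread.
sample = json.loads("""
{
  "ops": [
    {"description": "log(1000 entries from seq 17876238 at 2021-02-25T15:13:20.306487+0100)"},
    {"description": "log(1000 entries from seq 17877238 at 2021-02-25T15:13:21.000000+0100)"},
    {"description": "other_op(placeholder)"}
  ]
}
""")

if __name__ == "__main__":
    for kind, count in summarize_ops(sample).most_common():
        print(f"{kind}: {count}")
```

On a real MON, one would redirect the output of the ops command to a file and pass the result of json.load() on that file to summarize_ops instead of using the sample. A large share of "log" entries would point at the cluster-log flooding discussed in this thread.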