[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-07 Thread Gregory Farnum
On Wed, Nov 6, 2019 at 1:29 PM Sage Weil wrote: > > My current working theory is that the mgr is getting hung up when it tries > to scrape the device metrics from the mon. The 'tell' mechanism used to > send mon-targeted commands is pretty kludgey/broken in nautilus and > earlier. It's been rewritten for octopus …

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-07 Thread Oliver Freyermuth
Dear Sage, On 07.11.19 at 14:33, Sage Weil wrote: On Thu, 7 Nov 2019, Thomas Schneider wrote: Hi, I have installed package ceph-mgr_14.2.4-1-gd592e56-1bionic_amd64.deb manually: root@ld5505:/home# dpkg --force-depends -i ceph-mgr_14.2.4-1-gd592e56-1bionic_amd64.deb (Reading database ... 107461 files and directories currently installed.) …

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-07 Thread Thomas Schneider
Hi, looks like I sent my previous email too soon. The error 2019-11-07 15:53:06.077 7f7ea8afe700  0 auth: could not find secret_id=3887 2019-11-07 15:53:06.077 7f7ea8afe700  0 cephx: verify_authorizer could not get service secret for service mgr secret_id=3887 is back in the MGR log. ;-( On 07.11. …

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-07 Thread Thomas Schneider
Hi, I have installed all ceph packages from Sage's repo, i.e. ceph, ceph-common, ceph-mds, ceph-mgr-dashboard, ceph-mon, ceph-osd, libcephfs2, librados2, libradosstriper1, librbd1, librgw2, python-ceph-argparse, python-cephfs, python-rados, python-rbd and python-rgw, after adding his repo and executing apt upgrade …

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-07 Thread Sage Weil
On Thu, 7 Nov 2019, Thomas Schneider wrote: > Hi, > > I have installed package > ceph-mgr_14.2.4-1-gd592e56-1bionic_amd64.deb > manually: > root@ld5505:/home# dpkg --force-depends -i > ceph-mgr_14.2.4-1-gd592e56-1bionic_amd64.deb > (Reading database ... 107461 files and directories currently installed.) …

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-07 Thread Thomas Schneider
Hi, I have installed package ceph-mgr_14.2.4-1-gd592e56-1bionic_amd64.deb manually: root@ld5505:/home# dpkg --force-depends -i ceph-mgr_14.2.4-1-gd592e56-1bionic_amd64.deb (Reading database ... 107461 files and directories currently installed.) Preparing to unpack ceph-mgr_14.2.4-1-gd592e56-1bionic_amd64.deb …

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-07 Thread Oliver Freyermuth
Dear Thomas, the most correct thing to do is probably to add the full repo (the original link was still empty for me, but https://shaman.ceph.com/repos/ceph/wip-no-scrape-mons-nautilus/ seems to work). The commit itself suggests the ceph-mgr package should be sufficient. I'm still pondering …

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-06 Thread Thomas Schneider
Hi, can you please advise which package(s) should be installed? Thanks. On 06.11.2019 at 22:28, Sage Weil wrote: > My current working theory is that the mgr is getting hung up when it tries > to scrape the device metrics from the mon. The 'tell' mechanism used to > send mon-targeted commands …

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-06 Thread Sage Weil
My current working theory is that the mgr is getting hung up when it tries to scrape the device metrics from the mon. The 'tell' mechanism used to send mon-targeted commands is pretty kludgey/broken in nautilus and earlier. It's been rewritten for octopus, but isn't worth backporting …
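The workaround that emerges later in this thread follows directly from this theory: if the mgr hangs while scraping device metrics from the mon, stop the scraping. A minimal sketch, assuming a systemd-managed Nautilus deployment (the unit name is an assumption and depends on your hostname and packaging):

```shell
# Disable device health metric scraping so the mgr no longer hangs on the
# mon 'tell' path (command quoted verbatim later in this thread):
ceph device monitoring off

# Then restart the stuck mgr daemon; the unit name below is an assumption:
systemctl restart ceph-mgr@$(hostname -s).service
```

As Oliver reports further down, a standby mgr may also simply take over once the active one goes silent, but the hung daemon still needs a restart to become usable again.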

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-06 Thread Thomas Schneider
Well, even after restarting the MGR service the relevant log is flooded with these error messages: 2019-11-06 17:46:22.363 7f81ffdcc700  0 auth: could not find secret_id=3865 2019-11-06 17:46:22.363 7f81ffdcc700  0 cephx: verify_authorizer could not get service secret for service mgr secret_id=3865 …

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-06 Thread Thomas Schneider
Hi, does anybody else get these error messages in the MGR log? 2019-11-06 15:41:44.765 7f10db740700  0 auth: could not find secret_id=3863 2019-11-06 15:41:44.765 7f10db740700  0 cephx: verify_authorizer could not get service secret for service mgr secret_id=3863 THX. On 06.11.2019 at 10:43, Oliver Freyermuth wrote: …
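cephx "could not find secret_id" / "verify_authorizer could not get service secret" errors like the ones above are commonly caused by clock skew between daemons, since the rotating cephx service keys are time-limited. A quick triage sketch (this is general cephx troubleshooting, not something stated in the truncated message; `chronyc` assumes chrony is your time daemon):

```shell
# Check the local clock offset (assumes chrony; ntpd/timesyncd setups differ):
chronyc tracking

# Ask the monitors for their view of time sync across the quorum:
ceph time-sync-status

# Clock skew also surfaces as a HEALTH_WARN in cluster status:
ceph status
```

If the clocks are in sync, a restart of the affected mgr (so it fetches fresh rotating keys) is the usual next step, which matches what Thomas tries in the surrounding messages.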

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-06 Thread thoralf schulze
hi oliver, On 11/6/19 10:43 AM, Oliver Freyermuth wrote: […] > Did somebody see something similar after running for a week or more with > Nautilus on old and slow hardware? yes, same here: significantly more mgr failovers / compaction jobs with nautilus than with mimic … most likely due to pgs …

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-06 Thread Oliver Freyermuth
Hi all, interestingly, now that the third mon has been missing for almost a week (those planned interventions always take longer than expected...), we get mgr failovers (but without crashes). In the mgr log, I find: 2019-11-06 07:50:05.409 7fce8a0dc700 0 client.0 ms_handle_reset on v2:10.160. …

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-04 Thread Janek Bevendorff
On 02.11.19 18:35, Oliver Freyermuth wrote: Dear Janek, in my case, the mgr daemon itself remains "running", it just stops reporting to the mon. It even still serves the dashboard, but with outdated information. This is not so different. The MGRs in my case are running, but stop responding.

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-02 Thread Thomas
Hi, in the error log of my active MGR I find these errors after some time: 2019-11-02 19:07:30.629 7f448f1cb700  0 auth: could not find secret_id=3769 2019-11-02 19:07:30.629 7f448f1cb700  0 cephx: verify_authorizer could not get service secret for service mgr secret_id=3769 …

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-02 Thread Thomas
Hi, I experience major issues with MGR, and by chance my drives are on non-JBOD controllers, too (like Oliver's drives). Regards, Thomas. On 02.11.2019 at 17:38, Oliver Freyermuth wrote: Dear Sage, at least for the simple case: ceph device get-health-metrics osd.11 => mgr crashes (but in that case, it crashes fully, i.e. the process is gone) …

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-02 Thread Oliver Freyermuth
Dear Janek, in my case, the mgr daemon itself remains "running", it just stops reporting to the mon. It even still serves the dashboard, but with outdated information. I grepped through the logs and could not find any clock skew messages. So it seems to be a different issue (albeit both issues …

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-02 Thread Janek Bevendorff
These issues sound a bit like a bug I reported a few days ago: https://tracker.ceph.com/issues/39264 Related: https://tracker.ceph.com/issues/39264 On 02/11/2019 17:34, Oliver Freyermuth wrote: …

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-02 Thread Oliver Freyermuth
Dear Sage, good news - it happened again, with debug logs! There's nothing obvious to my eye, it's uploaded as: 0b2d0c09-46f3-4126-aa27-e2d2e8572741 It seems the failure was roughly in parallel to me wanting to access the dashboard. It must have happened within the last ~5-10 minutes of the log.

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-02 Thread Oliver Freyermuth
Dear Sage, at least for the simple case: ceph device get-health-metrics osd.11 => mgr crashes (but in that case, it crashes fully, i.e. the process is gone) I have now uploaded a verbose log as: ceph-post-file: e3bd60ad-cbce-4308-8b07-7ebe7998572e One potential cause of this (and maybe the other …
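The two commands referenced in this message, sketched as one reproduce-and-report sequence. Both commands appear verbatim in the thread; the log path is an assumption and depends on your hostname and log configuration:

```shell
# Trigger the reported crash case (osd.11 is the OSD named in the thread):
ceph device get-health-metrics osd.11

# Upload the verbose mgr log for the developers; ceph-post-file prints an
# upload id like the ones quoted in this thread:
ceph-post-file /var/log/ceph/ceph-mgr.$(hostname -s).log
```

The upload id that `ceph-post-file` prints (e.g. the e3bd60ad-… id above) is what gets shared on the list so developers can locate the log.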

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-02 Thread Oliver Freyermuth
Dear Reed, yes, the balancer is also on for me, but the instabilities vanished as soon as I turned off device health metrics. Cheers, Oliver. On 02.11.19 at 17:31, Reed Dier wrote: > Do you also have the balancer module on? > > I experienced extremely bad stability issues where the MGRs would silently die with the balancer module on …

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-02 Thread Reed Dier
Do you also have the balancer module on? I experienced extremely bad stability issues where the MGRs would silently die with the balancer module on. And by on, I mean `active: true` by way of `ceph balancer on`. Once I disabled the automatic balancer, it seemed to become much more stable. I can …
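The check-and-disable sequence Reed describes can be sketched as (both commands are standard balancer-module CLI; whether disabling helps in your cluster is, per this thread, anecdotal):

```shell
# Show whether the automatic balancer is active ("active": true/false):
ceph balancer status

# Turn automatic balancing off; manually generated plans can still be run:
ceph balancer off
```

Note that in Oliver's case (previous message) the instability tracked the device health metrics, not the balancer, so it is worth toggling one variable at a time.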

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-02 Thread Oliver Freyermuth
Hi Thomas, indeed, I also had the dashboard open at these times, but right now, after disabling device health metrics, I cannot retrigger it even when playing wildly on the dashboard. So I'll now reenable health metrics and try to retrigger the issue with cranked-up debug levels, as Sage suggested …
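A sketch of the "reenable with cranked-up debug levels" step. The specific debug subsystems and levels below are assumptions (the exact values Sage suggested are cut off in the archived message); `debug_mgr 20` and `debug_ms 1` are a common verbose combination for mgr issues:

```shell
# Re-enable device health metric scraping to retrigger the hang:
ceph device monitoring on

# Raise mgr debug verbosity via the central config (Nautilus and later):
ceph config set mgr debug_mgr 20
ceph config set mgr debug_ms 1

# ...reproduce the hang, collect the log, then restore the defaults:
ceph config set mgr debug_mgr 1
ceph config set mgr debug_ms 0
```

The resulting verbose log is what gets uploaded with `ceph-post-file`, as in the earlier messages in this thread.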

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-02 Thread Thomas
Hi Oliver, I experienced a situation where the MGRs "went crazy", meaning the MGR was active but not working. In the logs of the standby MGR nodes I found an error (after restarting the service) that pointed to the Ceph Dashboard. Since disabling the dashboard, my MGRs have been stable again. Regards, Thomas. On 02.1 …

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-01 Thread Sage Weil
On Sat, 2 Nov 2019, Oliver Freyermuth wrote: > Dear Cephers, > > interestingly, after: > ceph device monitoring off > the mgrs seem to be stable now - the active one still went silent a few > minutes later, > but the standby took over and was stable, and restarting the broken one, it's > now stable …

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-01 Thread Oliver Freyermuth
Dear Cephers, interestingly, after: ceph device monitoring off the mgrs seem to be stable now - the active one still went silent a few minutes later, but the standby took over and was stable, and after restarting the broken one, it has now been stable for an hour, too, so probably a restart of the mgr is …