[ceph-users] Re: MGRs failing once per day and generally slow response times

Janek Bevendorff Mon, 16 Mar 2020 01:36:27 -0700

Over the weekend, all five MGRs failed, which means we have no more Prometheus monitoring data. We are obviously monitoring the MGR status as well, so we can detect the failure, but it's still a pretty serious issue. Any ideas as to why this might happen?


On 13/03/2020 16:56, Janek Bevendorff wrote:

Indeed. I just had another MGR go bye-bye. I don't think host clock skew is the problem.
On 13/03/2020 15:29, Anthony D'Atri wrote:
Chrony does converge faster, but I doubt this will solve your problem if you don’t have quality peers, or if it’s not really a time problem.
On Mar 13, 2020, at 6:44 AM, Janek Bevendorff <janek.bevendo...@uni-weimar.de> wrote:
I replaced ntpd with chronyd and will let you know if it changes anything. Thanks.
On 13/03/2020 06:25, Konstantin Shalygin wrote:
On 3/13/20 12:57 AM, Janek Bevendorff wrote:
NTPd is running; all the nodes have the same time to the second. I don't think that is the problem.
As always in such cases, try switching from ntpd to chronyd, the default EL7 daemon.
k
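"The same time to the second" may be coarser than what Ceph checks: the monitors' clock-drift warning threshold (`mon_clock_drift_allowed`) defaults to 0.05 s. The sketch below, under stated assumptions, computes the worst-case skew between hosts from per-host offsets against a common reference. The offsets and host names are hypothetical sample values (in practice they might come from `chronyc tracking` or `ntpq -p`), and the 0.05 s threshold is assumed to be the unchanged default.

```python
# Assumed threshold: Ceph's default mon_clock_drift_allowed, in seconds.
MON_CLOCK_DRIFT_ALLOWED = 0.05


def max_pairwise_skew(offsets):
    """Largest difference between any two hosts' offsets from a common reference.

    offsets maps a (hypothetical) host name to its clock offset in seconds.
    The largest pairwise difference is simply max - min.
    """
    values = list(offsets.values())
    if len(values) < 2:
        return 0.0  # a single host cannot be skewed against itself
    return max(values) - min(values)


def skew_exceeds(offsets, threshold=MON_CLOCK_DRIFT_ALLOWED):
    """True if the worst-case skew between any two hosts exceeds the threshold."""
    return max_pairwise_skew(offsets) > threshold


# Hypothetical sample data: every host is well within one second of the reference,
# yet the spread between host2 and host3 is larger than 0.05 s.
offsets = {"host1": 0.010, "host2": -0.030, "host3": 0.045}
print(round(max_pairwise_skew(offsets), 3))
print(skew_exceeds(offsets))
```

Comparing offsets against a shared reference only approximates the monitors' own mutual check, but it shows why agreement "to the second" does not by itself rule out clock-skew warnings.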
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

--
Bauhaus-Universität Weimar
Bauhausstr. 9a, Room 308
99423 Weimar, Germany

Phone: +49 (0)3643 - 58 3577