UPDATE: I have determined that the mon sync heartbeat timeout is what is
triggering, since increasing it also increases the duration of the sync
attempts. Could those heartbeats be quorum-related? That'd explain why they
aren't being sent. Also, is it safe to temporarily increase this timeout to, say,
an hour or two
Hi,
I had already figured that out later, thanks though. So back to .61.2 it
was. I was then trying to see whether debug logging would tell me why
the mons won't rejoin the cluster. Their logs look like this:
(Interesting part at the bottom... I think)
2014-03-02 14:25:34.960372 7f7c13a6e700 10
Hi,
You can't form quorum with your monitors on cuttlefish if you're mixing <
0.61.5 with any 0.61.5+ ( https://ceph.com/docs/master/release-notes/ ) =>
section about 0.61.5.
I'd advise installing pre-0.61.5, forming quorum, and then upgrading to 0.61.9
(if need be), and then the latest dumpling on top.
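To make the version rule above concrete, here is a minimal sketch of a check for whether a set of cuttlefish monitors straddles the 0.61.5 boundary. The function names and the version strings in the example are hypothetical; this is plain Python, not a Ceph tool, and it assumes versions are written as simple dotted numbers such as "0.61.2".

```python
# Hypothetical helper: not part of Ceph, just an illustration of the rule
# "do not mix monitors < 0.61.5 with monitors >= 0.61.5 on cuttlefish".

CUTOFF = (0, 61, 5)  # boundary named in the 0.61.5 release notes


def parse_version(text):
    """Turn a dotted version string like '0.61.2' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def mixes_across_cutoff(versions):
    """Return True if some monitors are below the cutoff and some are not."""
    parsed = [parse_version(v) for v in versions]
    has_old = any(p < CUTOFF for p in parsed)
    has_new = any(p >= CUTOFF for p in parsed)
    return has_old and has_new


if __name__ == "__main__":
    # Example monitor versions (made up for illustration).
    print(mixes_across_cutoff(["0.61.2", "0.61.2", "0.61.9"]))
    print(mixes_across_cutoff(["0.61.2", "0.61.4"]))
```

If the check reports a mix, the advice above applies: bring all monitors to the same side of 0.61.5 first, form quorum, and only then upgrade.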
Hi,
thanks for the reply. I updated one of the new mons. And after a
reasonably long init phase (inconsistent state), I am now seeing these:
2014-02-28 01:05:12.344648 7fe9d05cb700 0 cephx: verify_reply coudln't
decrypt with error: error decoding block for decryption
2014-02-28 01:05:12.345599 7f
On Thu, Feb 27, 2014 at 4:25 PM, Marc wrote:
> Hi,
>
> I was handed a Ceph cluster that had just lost quorum due to 2/3 mons
> (b,c) running out of disk space (using up 15GB each). We were trying to
> rescue this cluster without service downtime. As such we freed up some
> space to keep mon b runn
Hi,
I was handed a Ceph cluster that had just lost quorum due to 2/3 mons
(b,c) running out of disk space (using up 15GB each). We were trying to
rescue this cluster without service downtime. As such we freed up some
space to keep mon b running a while longer, which succeeded, quorum
restored (a,b