UPDATE. I have determined that the mon sync heartbeat timeout is what is
triggering, since increasing it also increases the duration of the sync
attempts. Could those heartbeats be quorum-related? That'd explain why
they aren't being sent. Also, is it safe to temporarily increase this
timeout to, say, an hour or two?
Hi,
I had already figured that out in the meantime, thanks though. So back to
0.61.2 it was. I then tried to see whether debug logging would tell me
why the mons won't rejoin the cluster. Their logs look like this
(interesting part at the bottom... I think):
2014-03-02 14:25:34.960372 7f7c13a6e700 10
Hi,
You can't form quorum with your monitors on cuttlefish if you're mixing
versions < 0.61.5 with any 0.61.5+ (see the section about 0.61.5 in
https://ceph.com/docs/master/release-notes/).
I'd advise installing pre-0.61.5, forming quorum and then upgrading to
0.61.9 (if need be) - and then latest dumpling on top.
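
To make the version rule above concrete, here is a minimal Python sketch
that flags a set of cuttlefish monitors straddling the 0.61.5 boundary.
The function names and the example monitor/version mapping are
hypothetical and only for illustration; it does not query a real cluster
and assumes plain dotted version strings such as "0.61.2".

```python
def parse_version(text):
    """Turn a dotted string such as '0.61.2' into (0, 61, 2)."""
    return tuple(int(part) for part in text.split("."))


# Boundary taken from the advice above: < 0.61.5 cannot mix with 0.61.5+.
THRESHOLD = parse_version("0.61.5")


def mixes_across_threshold(mon_versions):
    """Return True if some monitors are older than 0.61.5 while others
    are 0.61.5 or newer (hypothetical helper, not part of Ceph)."""
    parsed = [parse_version(v) for v in mon_versions.values()]
    older = any(v < THRESHOLD for v in parsed)
    newer = any(v >= THRESHOLD for v in parsed)
    return older and newer


if __name__ == "__main__":
    # Assumed example: monitor names and versions are made up.
    mons = {"a": "0.61.2", "b": "0.61.2", "c": "0.61.9"}
    if mixes_across_threshold(mons):
        print("mixed versions around 0.61.5: quorum will not form")
```

Versions are compared as integer tuples rather than strings, so that
"0.61.10" would sort after "0.61.5" instead of before it.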
Hi,
thanks for the reply. I updated one of the new mons, and after a
reasonably long init phase (inconsistent state), I am now seeing these:
2014-02-28 01:05:12.344648 7fe9d05cb700 0 cephx: verify_reply coudln't
decrypt with error: error decoding block for decryption
2014-02-28 01:05:12.345599 7f
On Thu, Feb 27, 2014 at 4:25 PM, Marc wrote:
> Hi,
>
> I was handed a Ceph cluster that had just lost quorum due to 2/3 mons
> (b,c) running out of disk space (using up 15GB each). We were trying to
> rescue this cluster without service downtime. As such we freed up some
> space to keep mon b runn