Re: [ceph-users] Trying to rescue a lost quorum

2014-03-04 Thread Marc
UPDATE: I have determined that mon sync heartbeat timeout is what is triggering, since increasing it also increases the duration of the sync attempts. Could those heartbeats be quorum-related? That'd explain why they aren't being sent. Also, is it safe to temporarily increase this timeout to, say, an hour or two

Re: [ceph-users] Trying to rescue a lost quorum

2014-03-02 Thread Marc
Hi, I had already figured that out later, thanks though. So back to .61.2 it was. I was then trying to see whether debug logging would tell me why the mons won't rejoin the cluster. Their logs look like this (the interesting part is at the bottom, I think): 2014-03-02 14:25:34.960372 7f7c13a6e700 10

Re: [ceph-users] Trying to rescue a lost quorum

2014-03-01 Thread Martin B Nielsen
Hi, You can't form quorum with your monitors on cuttlefish if you're mixing < 0.61.5 with any 0.61.5+ ( https://ceph.com/docs/master/release-notes/ ) => section about 0.61.5. I'd advise installing pre-0.61.5, forming quorum and then upgrading to 0.61.9 (if need be) - and then latest dumpling on top.
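
To illustrate the version rule above, here is a minimal sketch that checks whether a set of monitors mixes cuttlefish releases on both sides of 0.61.5. The function names, the monitor names and the example versions are hypothetical and only for illustration; it does not query a cluster, you would fill in the versions by hand.

```python
# Hypothetical helper: flags a monitor set that mixes cuttlefish (0.61.x)
# releases older than 0.61.5 with releases at or above 0.61.5.

BOUNDARY = (0, 61, 5)


def parse_version(text):
    """Turn a string like '0.61.2' into (0, 61, 2) for numeric comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def mixes_cuttlefish_boundary(mon_versions):
    """Return True if some 0.61.x mons are < 0.61.5 and others are >= 0.61.5.

    mon_versions maps a monitor name to its version string.
    Versions outside the 0.61 series are ignored by this check.
    """
    cuttlefish = [
        v for v in (parse_version(s) for s in mon_versions.values())
        if v[:2] == (0, 61)
    ]
    has_old = any(v < BOUNDARY for v in cuttlefish)
    has_new = any(v >= BOUNDARY for v in cuttlefish)
    return has_old and has_new


# Example with made-up versions: mon c is past the boundary, a and b are not.
mons = {"a": "0.61.2", "b": "0.61.2", "c": "0.61.9"}
print(mixes_cuttlefish_boundary(mons))
```

Versions are compared as integer tuples rather than strings so that, for example, 0.61.10 sorts after 0.61.9.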

Re: [ceph-users] Trying to rescue a lost quorum

2014-02-27 Thread Marc
Hi, thanks for the reply. I updated one of the new mons. After a reasonably long init phase (inconsistent state), I am now seeing these: 2014-02-28 01:05:12.344648 7fe9d05cb700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption 2014-02-28 01:05:12.345599 7f

Re: [ceph-users] Trying to rescue a lost quorum

2014-02-27 Thread Gregory Farnum
On Thu, Feb 27, 2014 at 4:25 PM, Marc wrote:
> Hi,
>
> I was handed a Ceph cluster that had just lost quorum due to 2/3 mons
> (b,c) running out of disk space (using up 15GB each). We were trying to
> rescue this cluster without service downtime. As such we freed up some
> space to keep mon b runn