Thank you for the help so far! I went with option 1 and that did solve that problem. However, quorum has not been restored. Here's the information I can get:

mon.a and mon.b are in state Electing and have been for more than two hours now. mon.c does reply to "help" via the admin socket, but it does not respond to mon_status or sync_status (even though "help" lists them, so they should be available).
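For reference, this is how I'm querying the admin sockets (I'm assuming the default socket path under /var/run/ceph/ here; mon.c only ever answers the "help" call):

    ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok mon_status
    ceph --admin-daemon /var/run/ceph/ceph-mon.c.asok mon_status
    ceph --admin-daemon /var/run/ceph/ceph-mon.c.asok sync_status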
The logs of mon.c show a loop that contains "peer paxos version 15329444 vs my version 0 (too far ahead)" ("full" log at the end of this mail). I thought maybe mon.c could use a monmap update, but since there's no quorum, a monmap is hard to come by. I tried stopping a running monitor and then ran this:

    ceph-mon -i a --extract-monmap /tmp/monmap
    too many arguments: [--extract-monmap,/tmp/monmap]
    usage: ceph-mon -i monid [--mon-data=pathtodata] [flags]
    *snip*

So I guess that command was not available in that version. Is there a way to get a+b out of being stuck in election mode without having to upgrade them first? Similarly, can I somehow obtain a monmap for mon.c that has the same epoch as the other nodes (or does that not matter?)? I thought about creating a "new" but identical monmap with monmaptool (giving it the fsid and all that), but that gives me an e0 monmap.
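Just so we're talking about the same thing, this is roughly the monmaptool invocation I mean (mon names and addresses taken from the e16 monmap in the log below; the fsid is a placeholder for our real one, and I'm assuming --fsid is accepted by the monmaptool shipped with .61):

    monmaptool --create --clobber --fsid <our-fsid> \
        --add a X.Y.Z.201:6789 --add b X.Y.Z.202:6789 \
        --add c X.Y.Z.203:6789 --add g X.Y.Z.207:6789 \
        /tmp/monmap-new
    monmaptool --print /tmp/monmap-new

The --print output then reports epoch 0, which is what worries me.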
Lastly, I thought about copying the store.db over from another mon, as you also mentioned, but I have been unable to find information on this procedure in this mailing list's archives, i.e. which files I'd need to tinker with.
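To make sure I'm not missing something obvious, this is the rough procedure I had in mind, going from mon.a to mon.c (untested, and the keyring question is exactly the part I can't find documented):

    # with mon.a and mon.c both stopped (however your mons are stopped on your distro):
    mv /var/lib/ceph/mon/ceph-c/store.db /var/lib/ceph/mon/ceph-c/store.db.old
    rsync -a root@X.Y.Z.201:/var/lib/ceph/mon/ceph-a/store.db /var/lib/ceph/mon/ceph-c/
    # leave /var/lib/ceph/mon/ceph-c/keyring as it is -- or does it need to match mon.a's?
    # then start mon.a and mon.c again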
Logs of mon.c:

2014-04-29 21:15:48.808835 b4045b40 10 mon.c@2(probing) e16 monmap is e16: 4 mons at {a=X.Y.Z.201:6789/0,b=X.Y.Z.202:6789/0,c=X.Y.Z.203:6789/0,g=X.Y.Z.207:6789/0}
2014-04-29 21:15:48.808852 b4045b40 10 mon.c@2(probing) e16 peer name is b
2014-04-29 21:15:48.808856 b4045b40 10 mon.c@2(probing) e16 mon.b is outside the quorum
2014-04-29 21:15:48.808860 b4045b40 10 mon.c@2(probing) e16 peer paxos version 15329444 vs my version 0 (too far ahead)
2014-04-29 21:15:48.808867 b4045b40 10 mon.c@2(probing) e16 cancel_probe_timeout 0x9867440
2014-04-29 21:15:48.808874 b4045b40 10 mon.c@2(probing) e16 sync_start entity( mon.1 X.Y.Z.202:6789/0 )
2014-04-29 21:30:48.908736 b2b3eb40 0 -- X.Y.Z.203:6789/0 >> X.Y.Z.202:6789/0 pipe(0x9d10000 sd=130 :48366 s=2 pgs=111647 cs=1 l=0).fault with nothing to send, going to standby
2014-04-29 21:36:18.969865 b4045b40 10 mon.c@2(synchronizing sync( requester state start )).monmap v16 get_monmap ver 0
2014-04-29 21:41:11.823272 b4045b40 10 mon.c@2(synchronizing sync( requester state start )) e16 sync_store_init backup current monmap
2014-04-29 21:41:11.875212 b4846b40 11 mon.c@2(synchronizing sync( requester state start )) e16 tick
2014-04-29 21:41:11.875629 b2437b40 10 mon.c@2(synchronizing sync( requester state start )) e16 ms_get_authorizer for mon
2014-04-29 21:46:37.729355 b4846b40 10 mon.c@2(synchronizing sync( requester state start )).data_health(0) service_tick
2014-04-29 21:46:37.729413 b4846b40 0 mon.c@2(synchronizing sync( requester state start )).data_health(0) update_stats avail 6% total 17169816 used 15157420 avail 1133548
2014-04-29 21:46:37.729460 b4846b40 0 log [WRN] : reached concerning levels of available space on data store (6% free)
2014-04-29 21:46:37.729542 b4846b40 10 mon.c@2(synchronizing sync( requester state start )) e16 sync_start_reply_timeout
2014-04-29 21:46:37.729553 b4846b40 10 mon.c@2(synchronizing sync( requester state start )) e16 sync_requester_abort mon.1 X.Y.Z.202:6789/0 mon.1 X.Y.Z.202:6789/0 clearing potentially inconsistent store
2014-04-29 22:01:37.828974 b2b3eb40 0 -- X.Y.Z.203:6789/0 >> X.Y.Z.202:6789/0 pipe(0x9d10000 sd=130 :51116 s=2 pgs=116913 cs=3 l=0).fault with nothing to send, going to standby
2014-04-29 22:01:51.856866 b4846b40 1 mon.c@2(synchronizing sync( requester state start )) e16 sync_requester_abort no longer a sync requester

Please note: even though the logs show a warning about available space on the mon's data store, that should not be an issue, since I have (temporarily!) set the mon full ratio to .98. Also, the store.db is about 22 GB in size. Is that big enough to maybe trigger "all sorts of funky timeouts", as was mentioned on this mailing list regarding a store.db of ~200 GB?

On 29/04/2014 19:05, Gregory Farnum wrote:
> On Tue, Apr 29, 2014 at 9:48 AM, Marc <m...@shoowin.de> wrote:
>> 'ls' on the respective stores in /var/lib/ceph/mon/ceph.X/store.db
>> returns a list of files (i.e. still present), fsck seems fine. I did
>> notice that one of the nodes has different contents in
>> /var/lib/ceph/mon/ceph-b/keyring, i.e. its key is different from the
>> other 2 nodes'. That shouldn't be the case, should it? Would scp'ing
>> over one of the other nodes' keyring files while mon.b is stopped be the
>> right course of action then?
> The fact that it's changed is... concerning. If that's the only thing
> that's changed, then copying over a keyring from one of the others
> should do it, but it might also be a symptom of a more serious issue.
> Depending on how paranoid you want to be:
> 1) Just copy over the keyring and start it up.
> 2) After that, do a mon scrub if it exists in your version of Ceph (I
> don't remember when it was introduced).
> 3) Prior to that, compare the information you can pull out of each
> monitor's admin socket while it's trying to form a quorum; make sure
> everything basically matches.
> 4) Prior to changing the keys, you could extract several maps of
> various types and compare them to make sure they match.
> 5) Or you could just copy one of the working stores to the monitor
> with a different key. (There might be some files you need to twiddle
> when doing this; check for past emails about recovering from lost
> monitors.)
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>> Also your red herring explanation... how do I put this... It seems like
>> an important thing to know, so thanks for that. I'm not sure how one
>> would go about putting this tidbit in a spot where people would find it
>> when needed... maybe somewhere in the debugging section of the wiki?
>>
>> On 29/04/2014 18:25, Gregory Farnum wrote:
>>> Monitor keys don't change; I think something else must be going on. Did you
>>> remove any of their stores? Are the local filesystems actually correct
>>> (fsck)?
>>>
>>> The ceph-create-keys process is a red herring and will stop as soon as
>>> the monitors do get into a quorum.
>>> -Greg
>>>
>>> On Tuesday, April 29, 2014, Marc <m...@shoowin.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> still working on a troubled Ceph cluster running .61.2-1raring,
>>>> consisting of (currently) 4 monitors a, b, c, g, with g being a newly
>>>> added monitor that failed/fails to sync up, so consider that one down.
>>>> Now mon.a and mon.b died because, for some (currently unknown) reason,
>>>> Linux created a core dump on the root partition (/core) that filled the
>>>> partition up to 0 bytes left, and consequently the mons died.
>>>> Now I tried restarting them, but they seem deadlocked in the following
>>>> situation:
>>>>
>>>> The corresponding ceph-mon.X logs show various errors about cephx, like
>>>>
>>>> "cephx: verify_authorizer could not decrypt ticket info: error: NSS AES
>>>> final round failed: -8190"
>>>>
>>>> "cephx: verify_reply couldn't decrypt with error: error decoding block
>>>> for decryption"
>>>>
>>>> I can see that the /usr/sbin/ceph-create-keys process is stuck (based on
>>>> the fact that it's still running 20 minutes later). Manually running it
>>>> says:
>>>>
>>>> INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
>>>>
>>>> So, the monitors don't start up (stuck probing) because they can't
>>>> communicate because they need new keys, and the keys cannot be generated
>>>> because there's no quorum. Is there a way to fix this?
>>>>
>>>> Kind regards,
>>>> Marc
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com