Thank you for the help so far! I went with option 1 and that did solve that problem. However, quorum has not been restored. Here's the information I can get:

mon.a and mon.b are in state Electing and have been for more than two hours now. mon.c does reply to "help" via the admin socket, but it does not respond to mon_status or sync_status (even though "help" lists them, so they should be available).
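For reference, this is how I'm querying the admin sockets (I'm assuming the default socket path under /var/run/ceph/ here; mon.c only ever answers the "help" call):

    ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok mon_status
    ceph --admin-daemon /var/run/ceph/ceph-mon.c.asok mon_status
    ceph --admin-daemon /var/run/ceph/ceph-mon.c.asok sync_status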
The logs of mon.c show a loop that contains "peer paxos version 15329444 vs my version 0 (too far ahead)" ("full" log at the end of this mail). I thought maybe mon.c could use a monmap update, but since there's no quorum, a monmap is hard to come by. I tried stopping a running monitor and then ran this:

    ceph-mon -i a --extract-monmap /tmp/monmap
    too many arguments: [--extract-monmap,/tmp/monmap]
    usage: ceph-mon -i monid [--mon-data=pathtodata] [flags]
    *snip*

So I guess that command was not available in that version. Is there a way to get a+b out of being stuck in election mode without having to upgrade them first? Similarly, can I somehow obtain a monmap for mon.c that has the same epoch as the other nodes (or does that not matter?)? I thought about creating a "new" but identical monmap with monmaptool (giving it the fsid and all that), but that gives me an e0 monmap.
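Just so we're talking about the same thing, this is roughly the monmaptool invocation I mean (mon names and addresses taken from the e16 monmap in the log below; the fsid is a placeholder for our real one, and I'm assuming --fsid is accepted by the monmaptool shipped with .61):

    monmaptool --create --clobber --fsid <our-fsid> \
        --add a X.Y.Z.201:6789 --add b X.Y.Z.202:6789 \
        --add c X.Y.Z.203:6789 --add g X.Y.Z.207:6789 \
        /tmp/monmap-new
    monmaptool --print /tmp/monmap-new

The --print output then reports epoch 0, which is what worries me.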
Lastly, I thought about copying the store.db over from another mon, as you also mentioned, but I have been unable to find information on this procedure in this mailing list's archives, i.e. which files I'd need to tinker with.
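To make sure I'm not missing something obvious, this is the rough procedure I had in mind, going from mon.a to mon.c (untested, and the keyring question is exactly the part I can't find documented):

    # with mon.a and mon.c both stopped (however your mons are stopped on your distro):
    mv /var/lib/ceph/mon/ceph-c/store.db /var/lib/ceph/mon/ceph-c/store.db.old
    rsync -a root@X.Y.Z.201:/var/lib/ceph/mon/ceph-a/store.db /var/lib/ceph/mon/ceph-c/
    # leave /var/lib/ceph/mon/ceph-c/keyring as it is -- or does it need to match mon.a's?
    # then start mon.a and mon.c again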
Logs of mon.c:

2014-04-29 21:15:48.808835 b4045b40 10 mon.c@2(probing) e16 monmap is e16: 4 mons at {a=X.Y.Z.201:6789/0,b=X.Y.Z.202:6789/0,c=X.Y.Z.203:6789/0,g=X.Y.Z.207:6789/0}
2014-04-29 21:15:48.808852 b4045b40 10 mon.c@2(probing) e16 peer name is b
2014-04-29 21:15:48.808856 b4045b40 10 mon.c@2(probing) e16 mon.b is outside the quorum
2014-04-29 21:15:48.808860 b4045b40 10 mon.c@2(probing) e16 peer paxos version 15329444 vs my version 0 (too far ahead)
2014-04-29 21:15:48.808867 b4045b40 10 mon.c@2(probing) e16 cancel_probe_timeout 0x9867440
2014-04-29 21:15:48.808874 b4045b40 10 mon.c@2(probing) e16 sync_start entity( mon.1 X.Y.Z.202:6789/0 )
2014-04-29 21:30:48.908736 b2b3eb40 0 -- X.Y.Z.203:6789/0 >> X.Y.Z.202:6789/0 pipe(0x9d10000 sd=130 :48366 s=2 pgs=111647 cs=1 l=0).fault with nothing to send, going to standby
2014-04-29 21:36:18.969865 b4045b40 10 mon.c@2(synchronizing sync( requester state start )).monmap v16 get_monmap ver 0
2014-04-29 21:41:11.823272 b4045b40 10 mon.c@2(synchronizing sync( requester state start )) e16 sync_store_init backup current monmap
2014-04-29 21:41:11.875212 b4846b40 11 mon.c@2(synchronizing sync( requester state start )) e16 tick
2014-04-29 21:41:11.875629 b2437b40 10 mon.c@2(synchronizing sync( requester state start )) e16 ms_get_authorizer for mon
2014-04-29 21:46:37.729355 b4846b40 10 mon.c@2(synchronizing sync( requester state start )).data_health(0) service_tick
2014-04-29 21:46:37.729413 b4846b40 0 mon.c@2(synchronizing sync( requester state start )).data_health(0) update_stats avail 6% total 17169816 used 15157420 avail 1133548
2014-04-29 21:46:37.729460 b4846b40 0 log [WRN] : reached concerning levels of available space on data store (6% free)
2014-04-29 21:46:37.729542 b4846b40 10 mon.c@2(synchronizing sync( requester state start )) e16 sync_start_reply_timeout
2014-04-29 21:46:37.729553 b4846b40 10 mon.c@2(synchronizing sync( requester state start )) e16 sync_requester_abort mon.1 X.Y.Z.202:6789/0 mon.1 X.Y.Z.202:6789/0 clearing potentially inconsistent store
2014-04-29 22:01:37.828974 b2b3eb40 0 -- X.Y.Z.203:6789/0 >> X.Y.Z.202:6789/0 pipe(0x9d10000 sd=130 :51116 s=2 pgs=116913 cs=3 l=0).fault with nothing to send, going to standby
2014-04-29 22:01:51.856866 b4846b40 1 mon.c@2(synchronizing sync( requester state start )) e16 sync_requester_abort no longer a sync requester

Please note: even though the logs show a warning about available space on the mon's data store, that should not be an issue, since I have (temporarily!) set the mon full ratio to .98. Also, the store.db is about 22 GB in size. Is that big enough to maybe trigger "all sorts of funky timeouts", as was mentioned on this mailing list regarding a store.db of ~200 GB?

On 29/04/2014 19:05, Gregory Farnum wrote:
> On Tue, Apr 29, 2014 at 9:48 AM, Marc <m...@shoowin.de> wrote:
>> 'ls' on the respective stores in /var/lib/ceph/mon/ceph.X/store.db
>> returns a list of files (i.e. still present), fsck seems fine. I did
>> notice that one of the nodes has different contents in
>> /var/lib/ceph/mon/ceph-b/keyring, i.e. its key is different from the
>> other 2 nodes'. That shouldn't be the case, should it? Would scp'ing
>> over one of the other nodes' keyring files while mon.b is stopped be the
>> right course of action then?
> The fact that it's changed is... concerning. If that's the only thing
> that's changed, then copying over a keyring from one of the others
> should do it, but it might also be a symptom of a more serious issue.
> Depending on how paranoid you want to be:
> 1) Just copy over the keyring and start it up.
> 2) After that, do a mon scrub if it exists in your version of Ceph (I
> don't remember when it was introduced).
> 3) Prior to that, compare the information you can pull out of each
> monitor's admin socket while it's trying to form a quorum; make sure
> everything basically matches.
> 4) Prior to changing the keys, you could extract several maps of
> various types and compare them to make sure they match.
> 5) Or you could just copy one of the working stores to the monitor
> with a different key. (There might be some files you need to twiddle
> when doing this; check for past emails about recovering from lost
> monitors.)
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>> Also your red herring explanation... how do I put this... It seems like
>> an important thing to know, so thanks for that. I'm not sure how one
>> would go about putting this tidbit in a spot where people would find it
>> when needed... maybe somewhere in the debugging section of the wiki?
>>
>> On 29/04/2014 18:25, Gregory Farnum wrote:
>>> Monitor keys don't change; I think something else must be going on. Did you
>>> remove any of their stores? Are the local filesystems actually correct
>>> (fsck)?
>>>
>>> The ceph-create-keys process is a red herring and will stop as soon as
>>> the monitors do get into a quorum.
>>> -Greg
>>>
>>> On Tuesday, April 29, 2014, Marc <m...@shoowin.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> still working on a troubled Ceph cluster running .61.2-1raring,
>>>> consisting of (currently) 4 monitors a, b, c, g, with g being a newly
>>>> added monitor that failed/fails to sync up, so consider that one down.
>>>> Now mon.a and mon.b died because, for some (currently unknown) reason,
>>>> Linux created a core dump on the root partition (/core) that filled the
>>>> partition up to 0 bytes left, and consequently the mons died.
>>>> Now I tried restarting them, but they seem deadlocked in the following
>>>> situation:
>>>>
>>>> The corresponding ceph-mon.X logs show various errors about cephx, like
>>>>
>>>> "cephx: verify_authorizer could not decrypt ticket info: error: NSS AES
>>>> final round failed: -8190"
>>>>
>>>> "cephx: verify_reply couldn't decrypt with error: error decoding block
>>>> for decryption"
>>>>
>>>> I can see that the /usr/sbin/ceph-create-keys process is stuck (based on
>>>> the fact that it's still running 20 minutes later). Manually running it
>>>> says:
>>>>
>>>> INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
>>>>
>>>> So, the monitors don't start up (stuck probing) because they can't
>>>> communicate because they need new keys, and the keys cannot be generated
>>>> because there's no quorum. Is there a way to fix this?
>>>>
>>>> Kind regards,
>>>> Marc
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com