Hi guys, thank you very much for your feedback. I'm new to Ceph, so please be patient with my newbie-ness.
I'm dealing with the same issue, although I'm not using ceph-deploy. For learning purposes I manually installed a small test cluster of three nodes: one hosting the single mon and two hosting OSDs. I had managed to get this working and everything seemed healthy. I then simulated a catastrophic event by pulling the plug on all three nodes. Since then I haven't been able to get things working again: no quorum is reached on the single-mon setup, and a ceph-create-keys process hangs.

This is my ceph.conf: http://pastebin.com/qyqeu5E4

This is what the process list pertaining to Ceph looks like on the mon node after a reboot; note that ceph-create-keys hangs:

root@ceph0:/var/log/ceph# ps aux | grep ceph
root       988  0.2  0.2  34204  7368 ?      S    15:36   0:00 /usr/bin/python /usr/sbin/ceph-create-keys -i cehp0
root      1449  0.0  0.1  94844  3972 ?      Ss   15:38   0:00 sshd: ceph [priv]
ceph      1470  0.0  0.0  94844  1740 ?      S    15:38   0:00 sshd: ceph@pts/0
ceph      1471  0.3  0.1  22308  3384 pts/0  Ss   15:38   0:00 -bash
root      1670  0.0  0.0   9452   904 pts/0  R+   15:38   0:00 grep --color=auto ceph

So as you can see, no mon process is started; I presume this is somehow a result of the ceph-create-keys process hanging. /var/log/ceph-mon.cehp0.log shows the following in this state, after a reboot:

2014-01-09 15:49:44.433943 7f9e45eb97c0  0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 972
2014-01-09 15:49:44.535436 7f9e45eb97c0 -1 failed to create new leveldb store

If I start the mon manually with:

start ceph-mon id=ceph0

it starts fine, and "ceph --admin-daemon=/var/run/ceph/ceph-mon.ceph0.asok mon_status" outputs:

{ "name": "ceph0",
  "rank": 0,
  "state": "leader",
  "election_epoch": 1,
  "quorum": [
        0],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 1,
      "fsid": "e0696edf-ac8d-4095-beaf-6a2592964060",
      "modified": "2014-01-08 02:00:23.264895",
      "created": "2014-01-08 02:00:23.264895",
      "mons": [
            { "rank": 0,
              "name": "ceph0",
              "addr": "192.168.10.200:6789\/0"}]}}

The mon process seems OK, but ceph-create-keys keeps hanging and there is no quorum. If I kill the ceph-create-keys process and run "/usr/bin/python /usr/sbin/ceph-create-keys -i cehp0" manually, I get:

admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
INFO:ceph-create-keys:ceph-mon admin socket not ready yet.

every second or so. This is what happens when I terminate the manually started ceph-create-keys process:

^CTraceback (most recent call last):
  File "/usr/sbin/ceph-create-keys", line 227, in <module>
    main()
  File "/usr/sbin/ceph-create-keys", line 213, in main
    wait_for_quorum(cluster=args.cluster, mon_id=args.id)
  File "/usr/sbin/ceph-create-keys", line 34, in wait_for_quorum
    time.sleep(1)
KeyboardInterrupt

I will finish this long post by pasting what happens when I try to restart all services on the cluster, just so you know that the mon problem is only the first one I'm battling with here :) http://pastebin.com/mPGhiYu5

Please note that after this global restart, the hanging ceph-create-keys process is back.
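In case it helps anyone spot something, here is roughly what I plan to check next for the boot-time "failed to create new leveldb store" error. I'm assuming the default mon data path for my cluster and mon id (/var/lib/ceph/mon/ceph-ceph0), so please correct me if that's the wrong place to look:

# Did the mon data directory and its leveldb store survive the power loss,
# and are they still readable by the user the mon runs as?
ls -ld /var/lib/ceph/mon/ceph-ceph0
ls -l /var/lib/ceph/mon/ceph-ceph0/store.db | head

# Is the filesystem holding it mounted, writable and not full?
df -h /var/lib/ceph
mount | grep /var/lib

Since the manual "start ceph-mon id=ceph0" works while the boot-time start fails, I'm guessing (possibly wrongly) that something differs between the two environments at that point.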
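And for the hanging ceph-create-keys: as far as I can tell from the traceback, it just loops waiting for the mon admin socket, so I also want to double-check that the id it is started with matches the socket the mon actually creates. I notice the process list above shows "-i cehp0" while the socket I query is ceph-mon.ceph0.asok; that may just be a typo in my paste, but it seems worth verifying on the box (the paths below are the defaults on my setup):

# Which id is the hung ceph-create-keys polling for?
ps aux | grep '[c]eph-create-keys'

# Which admin sockets does the running mon actually expose?
ls -l /var/run/ceph/

# Re-run key creation by hand with the id that matches the socket; with the
# mon started and in quorum I would expect this to finish instead of looping.
/usr/sbin/ceph-create-keys -i ceph0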
Best,
Moe

On 01/09/2014 09:51 AM, Travis Rhoden wrote:
> On Thu, Jan 9, 2014 at 9:48 AM, Alfredo Deza <alfredo.d...@inktank.com> wrote:
>> On Thu, Jan 9, 2014 at 9:45 AM, Travis Rhoden <trho...@gmail.com> wrote:
>>> Hi Mordur,
>>>
>>> I'm definitely straining my memory on this one, but happy to help if I can.
>>>
>>> I'm pretty sure I did not figure it out -- you can see I didn't get
>>> any feedback from the list. What I did do, however, was uninstall
>>> everything and try the same setup with mkcephfs, which worked fine at
>>> the time. This was 8 months ago, though, and I have since used
>>> ceph-deploy many times with great success. I am not sure if I have
>>> ever tried a similar setup, though, with just one node and one
>>> monitor. Fortuitously, I may be trying that very setup today or
>>> tomorrow. If I still have issues, I will be sure to post them here.
>>>
>>> Are you using both the latest ceph-deploy and the latest Ceph packages
>>> (Emperor or newer dev packages)? There have been lots of changes in
>>> the monitor area, including in the upstart scripts, that made many
>>> things more robust in this area. I did have a cluster a few months
>>> ago that had a flaky monitor that refused to join quorum after
>>> install, and I had to just blow it away and re-install/deploy it and
>>> then it was fine, which I thought was odd.
>>>
>>> Sorry, that's probably not much help.
>>>
>>>  - Travis
>>>
>>> On Thu, Jan 9, 2014 at 12:40 AM, Mordur Ingolfsson <r...@1984.is> wrote:
>>>> Hi Travis,
>>>>
>>>> Did you figure this out? I'm dealing with exactly the same thing over here.
>> Can you share what exactly you are having problems with? ceph-deploy's
>> log output has been much improved and it is super useful to have that
>> when dealing with possible issues.
> I do not, it was long long ago... And in case it was ambiguous, let
> me explicitly say I was not recommending the use of mkcephfs at all
> (is that even still possible?). ceph-deploy is certainly the tool to
> use.
>
>>>> Best,
>>>> Moe
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com