Hi guys, thank you very much for your feedback. I'm new to Ceph, so please be patient with my newbie-ness.
I'm dealing with the same issue, although I'm not using ceph-deploy. For learning purposes I manually installed a small test cluster of three nodes: one hosting the single mon and two hosting OSDs. I had managed to get this working and everything seemed healthy. I then simulated a catastrophic event by pulling the plug on all three nodes. Since then I haven't been able to get things working again: no quorum is reached on the single-mon setup, and a ceph-create-keys process hangs.

This is my ceph.conf: http://pastebin.com/qyqeu5E4

This is what the process list pertaining to Ceph looks like on the mon node after a reboot; note that ceph-create-keys hangs:

root@ceph0:/var/log/ceph# ps aux | grep ceph
root       988  0.2  0.2  34204  7368 ?      S    15:36   0:00 /usr/bin/python /usr/sbin/ceph-create-keys -i cehp0
root      1449  0.0  0.1  94844  3972 ?      Ss   15:38   0:00 sshd: ceph [priv]
ceph      1470  0.0  0.0  94844  1740 ?      S    15:38   0:00 sshd: ceph@pts/0
ceph      1471  0.3  0.1  22308  3384 pts/0  Ss   15:38   0:00 -bash
root      1670  0.0  0.0   9452   904 pts/0  R+   15:38   0:00 grep --color=auto ceph

So as you can see, no mon process is started; I presume this is somehow a result of the ceph-create-keys process hanging. /var/log/ceph-mon.cehp0.log shows the following in this state, after a reboot:

2014-01-09 15:49:44.433943 7f9e45eb97c0  0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 972
2014-01-09 15:49:44.535436 7f9e45eb97c0 -1 failed to create new leveldb store

If I start the mon manually with:

start ceph-mon id=ceph0

it starts fine, and "ceph --admin-daemon=/var/run/ceph/ceph-mon.ceph0.asok mon_status" outputs:

{ "name": "ceph0",
  "rank": 0,
  "state": "leader",
  "election_epoch": 1,
  "quorum": [
        0],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 1,
      "fsid": "e0696edf-ac8d-4095-beaf-6a2592964060",
      "modified": "2014-01-08 02:00:23.264895",
      "created": "2014-01-08 02:00:23.264895",
      "mons": [
            { "rank": 0,
              "name": "ceph0",
              "addr": "192.168.10.200:6789\/0"}]}}

The mon process seems OK, but ceph-create-keys keeps hanging and there is no quorum. If I kill the ceph-create-keys process and run "/usr/bin/python /usr/sbin/ceph-create-keys -i cehp0" manually, I get:

admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
INFO:ceph-create-keys:ceph-mon admin socket not ready yet.

every second or so. This is what happens when I terminate the manually started ceph-create-keys process:

^CTraceback (most recent call last):
  File "/usr/sbin/ceph-create-keys", line 227, in <module>
    main()
  File "/usr/sbin/ceph-create-keys", line 213, in main
    wait_for_quorum(cluster=args.cluster, mon_id=args.id)
  File "/usr/sbin/ceph-create-keys", line 34, in wait_for_quorum
    time.sleep(1)
KeyboardInterrupt

I will finish this long post by pasting what happens when I try to restart all services on the cluster, just so you know that the mon problem is only the first one I'm battling with here :) http://pastebin.com/mPGhiYu5

Please note that after this global restart, the hanging ceph-create-keys process is back.
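In case it helps anyone spot something, here is roughly what I plan to check next for the boot-time "failed to create new leveldb store" error. I'm assuming the default mon data path for my cluster and mon id (/var/lib/ceph/mon/ceph-ceph0), so please correct me if that's the wrong place to look:

# Did the mon data directory and its leveldb store survive the power loss,
# and are they still readable by the user the mon runs as?
ls -ld /var/lib/ceph/mon/ceph-ceph0
ls -l /var/lib/ceph/mon/ceph-ceph0/store.db | head

# Is the filesystem holding it mounted, writable and not full?
df -h /var/lib/ceph
mount | grep /var/lib

Since the manual "start ceph-mon id=ceph0" works while the boot-time start fails, I'm guessing (possibly wrongly) that something differs between the two environments at that point.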
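And for the hanging ceph-create-keys: as far as I can tell from the traceback, it just loops waiting for the mon admin socket, so I also want to double-check that the id it is started with matches the socket the mon actually creates. I notice the process list above shows "-i cehp0" while the socket I query is ceph-mon.ceph0.asok; that may just be a typo in my paste, but it seems worth verifying on the box (the paths below are the defaults on my setup):

# Which id is the hung ceph-create-keys polling for?
ps aux | grep '[c]eph-create-keys'

# Which admin sockets does the running mon actually expose?
ls -l /var/run/ceph/

# Re-run key creation by hand with the id that matches the socket; with the
# mon started and in quorum I would expect this to finish instead of looping.
/usr/sbin/ceph-create-keys -i ceph0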
Best,
Moe

On 01/09/2014 09:51 AM, Travis Rhoden wrote:
> On Thu, Jan 9, 2014 at 9:48 AM, Alfredo Deza <alfredo.d...@inktank.com> wrote:
>> On Thu, Jan 9, 2014 at 9:45 AM, Travis Rhoden <trho...@gmail.com> wrote:
>>> Hi Mordur,
>>>
>>> I'm definitely straining my memory on this one, but happy to help if I can.
>>>
>>> I'm pretty sure I did not figure it out -- you can see I didn't get
>>> any feedback from the list. What I did do, however, was uninstall
>>> everything and try the same setup with mkcephfs, which worked fine at
>>> the time. This was 8 months ago, though, and I have since used
>>> ceph-deploy many times with great success. I am not sure if I have
>>> ever tried a similar setup, though, with just one node and one
>>> monitor. Fortuitously, I may be trying that very setup today or
>>> tomorrow. If I still have issues, I will be sure to post them here.
>>>
>>> Are you using both the latest ceph-deploy and the latest Ceph packages
>>> (Emperor or newer dev packages)? There have been lots of changes in
>>> the monitor area, including in the upstart scripts, that made many
>>> things more robust in this area. I did have a cluster a few months
>>> ago that had a flaky monitor that refused to join quorum after
>>> install, and I had to just blow it away and re-install/deploy it and
>>> then it was fine, which I thought was odd.
>>>
>>> Sorry, that's probably not much help.
>>>
>>>  - Travis
>>>
>>> On Thu, Jan 9, 2014 at 12:40 AM, Mordur Ingolfsson <r...@1984.is> wrote:
>>>> Hi Travis,
>>>>
>>>> Did you figure this out? I'm dealing with exactly the same thing over here.
>> Can you share what exactly you are having problems with? ceph-deploy's
>> log output has been much improved and it is super useful to have that
>> when dealing with possible issues.
> I do not, it was long long ago... And in case it was ambiguous, let
> me explicitly say I was not recommending the use of mkcephfs at all
> (is that even still possible?). ceph-deploy is certainly the tool to
> use.
>
>>>> Best,
>>>> Moe
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com