[ceph-users] About single monitor recovery
I run a tiny Ceph cluster with only one monitor. After a reboot of the system, the monitor refuses to start. I tried to start ceph-mon manually with the command 'ceph-mon -f -i a'; below are the first few lines of the output:

starting mon.a rank 0 at 192.168.1.10:6789/0 mon_data /var/lib/ceph/mon/ceph-a fsid 554bee60-9602-4017-a6e1-ceb6907a218c
mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos()' thread 7f9e3b0db780 time 2013-08-03 20:27:29.208156
mon/AuthMonitor.cc: 147: FAILED assert(ret == 0)

The full log is at: https://gist.github.com/yuchangyuan/0a0a56a14fa4649ec2c8

So, is there any way to make the monitor work again?

I have a backup of /var/lib/ceph/mon/ceph-a from 2013-08-01, and I can successfully start the monitor with those files, but rados and the other commands do not work, because the OSDs keep saying the monitor is the wrong node (which is correct; it is effectively the node from 2 days ago).

--
Best regards,
Changyuan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
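For anyone hitting a similar assert, a sketch of re-running the failing monitor with more verbose logging (the mon id 'a' matches the cluster above; the debug levels are just a common choice, not from the original mail):

```shell
# Run the failing monitor in the foreground with verbose mon/paxos
# debugging sent to stderr, to capture more context around the assert.
ceph-mon -f -i a --log-to-stderr 1 --debug-mon 20 --debug-paxos 20
```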
Re: [ceph-users] About single monitor recovery
On Sat, 3 Aug 2013, Yu Changyuan wrote:
> I run a tiny Ceph cluster with only one monitor. After a reboot of the
> system, the monitor refuses to start. I tried to start ceph-mon manually
> with the command 'ceph-mon -f -i a'; below are the first few lines of
> the output:
>
> starting mon.a rank 0 at 192.168.1.10:6789/0 mon_data
> /var/lib/ceph/mon/ceph-a fsid 554bee60-9602-4017-a6e1-ceb6907a218c
> mon/AuthMonitor.cc: In function 'virtual void
> AuthMonitor::update_from_paxos()' thread 7f9e3b0db780 time 2013-08-03
> 20:27:29.208156
> mon/AuthMonitor.cc: 147: FAILED assert(ret == 0)
>
> The full log is at: https://gist.github.com/yuchangyuan/0a0a56a14fa4649ec2c8

This is 0.61.1. Can you try again with 0.61.7 to rule out anything there?

> So, is there any way to make the monitor work again?
>
> I have a backup of /var/lib/ceph/mon/ceph-a from 2013-08-01, and I can
> successfully start the monitor with those files, but rados and the other
> commands do not work, because the OSDs keep saying the monitor is the
> wrong node (which is correct; it is effectively the node from 2 days ago).

In general that is not going to work well, as the cluster does not like to
warp back in time. If it does not start with .7 (I suspect it won't), can
you send us a tarball of the mon data directory so we can see what is awry?

sage
Re: [ceph-users] trouble authenticating after bootstrapping monitors
On Fri, 2 Aug 2013, Kevin Weiler wrote:
> I'm having some trouble bootstrapping my monitors using this page as a
> guide:
>
> http://ceph.com/docs/next/dev/mon-bootstrap/
>
> I can't seem to authenticate to my monitors with client.admin after I've
> created them and started them:

You also need

[root@camelot ~]# cat /etc/ceph/ceph.keyring
[mon.]
    key = AQD6yftRkKY3NxAA5VNbtUM23C3uPqUUXYSHeQ==
    caps mon = allow *
[client.admin]
    key = AQANyvtRYDHCCxAAwgcgdMJ9ue64m6+enYONOw==
    caps mon = allow *

so that the mon knows these users are allowed to do everything.

sage

> [root@camelot ~]# monmaptool --create --add camelot 10.198.1.3:6789 monmap
> monmaptool: monmap file monmap
> monmaptool: generated fsid 87a5f355-f7be-43aa-b26c-b6ad23f371bb
> monmaptool: writing epoch 0 to monmap (1 monitors)
>
> [root@camelot ~]# ceph-mon --mkfs -i camelot --monmap monmap --keyring /etc/ceph/ceph.keyring
> ceph-mon: created monfs at /srv/mon.camelot for mon.camelot
>
> [root@camelot ~]# service ceph start
> === mon.camelot ===
> Starting Ceph mon.camelot on camelot...
> === mds.camelot ===
> Starting Ceph mds.camelot on camelot...
> starting mds.camelot at :/0
>
> [root@camelot ~]# ceph auth get mon.
> access denied
>
> If someone could tell me what I'm doing wrong it would be greatly
> appreciated. Thanks!
>
> --
> Kevin Weiler
> IT
> IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | http://imc-chicago.com/
> Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: kevin.wei...@imc-chicago.com
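For completeness, a keyring with the two entries Sage shows can be generated with ceph-authtool; this is a sketch (the tool generates fresh keys rather than the exact keys above, and the caps mirror Sage's example):

```shell
# Create /etc/ceph/ceph.keyring with a mon. key and a client.admin key,
# each with 'allow *' on the monitor, so the mon accepts both users.
ceph-authtool --create-keyring /etc/ceph/ceph.keyring \
    --gen-key -n mon. --cap mon 'allow *'
ceph-authtool /etc/ceph/ceph.keyring \
    --gen-key -n client.admin --cap mon 'allow *'
```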
Re: [ceph-users] About single monitor recovery
On Sun, Aug 4, 2013 at 12:13 AM, Sage Weil wrote:
> On Sat, 3 Aug 2013, Yu Changyuan wrote:
> > I run a tiny Ceph cluster with only one monitor. After a reboot of the
> > system, the monitor refuses to start. I tried to start ceph-mon
> > manually with the command 'ceph-mon -f -i a'; below are the first few
> > lines of the output:
> >
> > starting mon.a rank 0 at 192.168.1.10:6789/0 mon_data
> > /var/lib/ceph/mon/ceph-a fsid 554bee60-9602-4017-a6e1-ceb6907a218c
> > mon/AuthMonitor.cc: In function 'virtual void
> > AuthMonitor::update_from_paxos()' thread 7f9e3b0db780 time 2013-08-03
> > 20:27:29.208156
> > mon/AuthMonitor.cc: 147: FAILED assert(ret == 0)
> >
> > The full log is at: https://gist.github.com/yuchangyuan/0a0a56a14fa4649ec2c8
>
> This is 0.61.1. Can you try again with 0.61.7 to rule out anything there?

I just tried 0.61.7, still out of luck. Here is the log:
https://gist.github.com/yuchangyuan/34743c0abf1bfd8ef243

> > So, is there any way to make the monitor work again?
> >
> > I have a backup of /var/lib/ceph/mon/ceph-a from 2013-08-01, and I can
> > successfully start the monitor with those files, but rados and the
> > other commands do not work, because the OSDs keep saying the monitor
> > is the wrong node (which is correct; it is effectively the node from
> > 2 days ago).
>
> In general that is not going to work well, as the cluster does not like
> to warp back in time. If it does not start with .7 (I suspect it won't),
> can you send us a tarball of the mon data directory so we can see what
> is awry?

OK, I will send the tarball of /var/lib/ceph/mon/ceph-a to you directly.

> sage

--
Best regards,
Changyuan
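One way to package the mon data directory for the developers, assuming the monitor is stopped first (the archive name here is arbitrary):

```shell
# Stop the monitor, then archive its data directory for sending.
service ceph stop mon.a
tar czf mon-ceph-a.tar.gz -C /var/lib/ceph/mon ceph-a
```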
Re: [ceph-users] About single monitor recovery
It looks like the auth state wasn't trimmed properly. It also sort of
looks like you aren't using authentication on this cluster... is that
true? (The keyring file was empty.)

This looks like a trim issue, but I don't remember what all we fixed
since .1... that was a while ago! We certainly haven't seen anything like
this recently.

I pushed a branch, wip-mon-skip-auth-cuttlefish, that skips the missing
incrementals and will get your mon up, but you may lose some auth keys.
If auth is on, you'll need to add them back again. If not, it may just
work with this.

You can grab the packages from

http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-mon-skip-auth-cuttlefish

or whatever the right dir is for your distro when they appear in about 15
minutes. Let me know if that resolves it.

sage
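On a Debian-family system, the gitbuilder branch packages could typically be pulled in by pointing apt at the URL Sage gave; a sketch, assuming the gitbuilder repository layout of the time (precise/amd64 here):

```shell
# Add the branch repository and upgrade ceph from it (repo layout is an
# assumption based on the gitbuilder convention, not from the mail).
echo "deb http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-mon-skip-auth-cuttlefish precise main" \
    | sudo tee /etc/apt/sources.list.d/ceph-wip.list
sudo apt-get update && sudo apt-get install ceph
```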
Re: [ceph-users] About single monitor recovery
On Sun, Aug 4, 2013 at 12:16 PM, Sage Weil wrote:
> It looks like the auth state wasn't trimmed properly. It also sort of
> looks like you aren't using authentication on this cluster... is that
> true? (The keyring file was empty.)

Yes, you're right, I disabled auth. It's just a personal cluster, so the
simpler the better.

> This looks like a trim issue, but I don't remember what all we fixed
> since .1... that was a while ago! We certainly haven't seen anything
> like this recently.
>
> I pushed a branch, wip-mon-skip-auth-cuttlefish, that skips the missing
> incrementals and will get your mon up, but you may lose some auth keys.
> If auth is on, you'll need to add them back again. If not, it may just
> work with this.
>
> You can grab the packages from
>
> http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-mon-skip-auth-cuttlefish
>
> or whatever the right dir is for your distro when they appear in about
> 15 minutes. Let me know if that resolves it.

Thank you for your work, I will try it as soon as possible.

PS: My distro is Gentoo, so maybe I should build from source directly.

> sage

--
Best regards,
Changyuan
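For building that branch from source on a distro without gitbuilder packages, a rough sketch for the 0.61-era autotools tree (branch name from Sage's mail; only the monitor binary is needed to test the fix):

```shell
# Fetch and build the fix branch; binaries land in src/ in this tree.
git clone https://github.com/ceph/ceph.git
cd ceph
git checkout wip-mon-skip-auth-cuttlefish
./autogen.sh && ./configure && make -j4
# Try the freshly built monitor against the existing data directory:
./src/ceph-mon -f -i a
```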