[ceph-users] About single monitor recovery

2013-08-03 Thread Yu Changyuan
I run a tiny Ceph cluster with only one monitor. After rebooting the system,
the monitor refuses to start.
I tried to start ceph-mon manually with the command 'ceph-mon -f -i a'; below
are the first few lines of the output:

starting mon.a rank 0 at 192.168.1.10:6789/0 mon_data /var/lib/ceph/mon/ceph-a fsid 554bee60-9602-4017-a6e1-ceb6907a218c
mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos()' thread 7f9e3b0db780 time 2013-08-03 20:27:29.208156
mon/AuthMonitor.cc: 147: FAILED assert(ret == 0)

The full log is at: https://gist.github.com/yuchangyuan/0a0a56a14fa4649ec2c8

So, is there any way to get the monitor working again?

I have a backup of /var/lib/ceph/mon/ceph-a from 2013-08-01 and can
successfully start the monitor with those files, but rados and other
commands do not work because the OSDs keep saying the monitor is the
wrong node (which is true: it is actually the node as of 2 days ago).

--
Best regards,
Changyuan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] About single monitor recovery

2013-08-03 Thread Sage Weil
On Sat, 3 Aug 2013, Yu Changyuan wrote:
> I run a tiny Ceph cluster with only one monitor. After rebooting the system,
> the monitor refuses to start.
> I tried to start ceph-mon manually with the command 'ceph-mon -f -i a'; below
> are the first few lines of the output:
> 
> starting mon.a rank 0 at 192.168.1.10:6789/0 mon_data /var/lib/ceph/mon/ceph-a fsid 554bee60-9602-4017-a6e1-ceb6907a218c
> mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos()' thread 7f9e3b0db780 time 2013-08-03 20:27:29.208156
> mon/AuthMonitor.cc: 147: FAILED assert(ret == 0)
> 
> The full log is at: https://gist.github.com/yuchangyuan/0a0a56a14fa4649ec2c8

This is 0.61.1.  Can you try again with 0.61.7 to rule out anything there?

> So, is there any way to get the monitor working again?
> 
> I have a backup of /var/lib/ceph/mon/ceph-a from 2013-08-01 and can
> successfully start the monitor with those files, but rados and other
> commands do not work because the OSDs keep saying the monitor is the
> wrong node (which is true: it is actually the node as of 2 days ago).

In general that is not going to work well as the cluster does not like to 
warp back in time.  If it does not start with .7 (I suspect it won't), can 
you send us a tarball of the mon data directory so we can see what is 
awry?
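
For anyone following along, one way to package a monitor data directory for
sharing is sketched below in Python. This is a minimal sketch and not an
official Ceph procedure: the helper name archive_mon_data and the output file
name are made up for illustration, and it assumes the monitor is stopped so
the store does not change while it is being read.

```python
import tarfile
from pathlib import Path


def archive_mon_data(mon_data_dir, output_path):
    """Pack a directory into a gzip-compressed tarball.

    Hypothetical helper for illustration only. Returns the list of member
    names so the caller can check what ended up in the archive.
    """
    src = Path(mon_data_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"not a directory: {src}")
    with tarfile.open(output_path, "w:gz") as tar:
        # arcname keeps only the last path component (e.g. "ceph-a"),
        # so the archive does not embed the absolute path.
        tar.add(str(src), arcname=src.name)
    with tarfile.open(output_path, "r:gz") as tar:
        return tar.getnames()
```

With the path from this thread, the call would be
archive_mon_data("/var/lib/ceph/mon/ceph-a", "mon-ceph-a.tar.gz"); the
output name is an arbitrary choice.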

sage


Re: [ceph-users] trouble authenticating after bootstrapping monitors

2013-08-03 Thread Sage Weil
On Fri, 2 Aug 2013, Kevin Weiler wrote:
> I'm having some trouble bootstrapping my monitors using this page as a
> guide:
> 
> http://ceph.com/docs/next/dev/mon-bootstrap/
> 
> I can't seem to authenticate to my monitors with client.admin after I've
> created them and started them:
> 

You also need a "caps mon = allow *" line under each key:

> [root@camelot ~]# cat /etc/ceph/ceph.keyring
> [mon.]
>   key = AQD6yftRkKY3NxAA5VNbtUM23C3uPqUUXYSHeQ==
    caps mon = allow *
>
> [client.admin]
>   key = AQANyvtRYDHCCxAAwgcgdMJ9ue64m6+enYONOw==
    caps mon = allow *

so that the mon knows these users are allowed to do everything.
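
As a quick sanity check before running ceph-mon --mkfs, a keyring file can
be scanned for entries that lack a "caps mon" line. The Python below is a
rough sketch, not a Ceph tool: parse_keyring and missing_mon_caps are
hypothetical helpers, the parser only understands the simple
"[entity]" / "name = value" layout shown above, and the keys are
placeholders rather than real secrets.

```python
def parse_keyring(text):
    """Parse keyring-style text into {entity: {option: value}}."""
    entries = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            entries[current] = {}
        elif "=" in line and current is not None:
            # Split on the first '=' only: base64 keys end in '=' padding.
            name, value = line.split("=", 1)
            entries[current][name.strip()] = value.strip()
    return entries


def missing_mon_caps(text):
    """Return the entities that have no 'caps mon' option."""
    return [name for name, opts in parse_keyring(text).items()
            if "caps mon" not in opts]


# Placeholder keyrings (assumed layout, fake keys).
WITHOUT_CAPS = """\
[mon.]
  key = PLACEHOLDERKEY1==
[client.admin]
  key = PLACEHOLDERKEY2==
"""

WITH_CAPS = """\
[mon.]
  key = PLACEHOLDERKEY1==
  caps mon = allow *
[client.admin]
  key = PLACEHOLDERKEY2==
  caps mon = allow *
"""

print(missing_mon_caps(WITHOUT_CAPS))
print(missing_mon_caps(WITH_CAPS))
```

The first call lists both entries as missing monitor caps; the second
returns an empty list.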

sage

> 
> [root@camelot ~]# monmaptool --create --add camelot 10.198.1.3:6789 monmap
> monmaptool: monmap file monmap
> monmaptool: generated fsid 87a5f355-f7be-43aa-b26c-b6ad23f371bb
> monmaptool: writing epoch 0 to monmap (1 monitors)
> 
> [root@camelot ~]# ceph-mon --mkfs -i camelot --monmap monmap --keyring /etc/ceph/ceph.keyring
> ceph-mon: created monfs at /srv/mon.camelot for mon.camelot
> 
> [root@camelot ~]# service ceph start
> === mon.camelot ===
> Starting Ceph mon.camelot on camelot...
> === mds.camelot ===
> Starting Ceph mds.camelot on camelot...
> starting mds.camelot at :/0
> 
> [root@camelot ~]# ceph auth get mon.
> access denied
> 
> If someone could tell me what I'm doing wrong it would be greatly
> appreciated. Thanks!
> 
> 
> -- 
> 
> Kevin Weiler
> 
> IT
> 
>  
> 
> IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606
> | http://imc-chicago.com/
> 
> Phone: +1 312-204-7439 | Fax: +1 312-244-3301 |
> E-Mail: kevin.wei...@imc-chicago.com


Re: [ceph-users] About single monitor recovery

2013-08-03 Thread Yu Changyuan
On Sun, Aug 4, 2013 at 12:13 AM, Sage Weil  wrote:

> On Sat, 3 Aug 2013, Yu Changyuan wrote:
> > I run a tiny Ceph cluster with only one monitor. After rebooting the
> > system, the monitor refuses to start.
> > I tried to start ceph-mon manually with the command 'ceph-mon -f -i a';
> > below are the first few lines of the output:
> >
> > starting mon.a rank 0 at 192.168.1.10:6789/0 mon_data /var/lib/ceph/mon/ceph-a fsid 554bee60-9602-4017-a6e1-ceb6907a218c
> > mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos()' thread 7f9e3b0db780 time 2013-08-03 20:27:29.208156
> > mon/AuthMonitor.cc: 147: FAILED assert(ret == 0)
> >
> > The full log is at: https://gist.github.com/yuchangyuan/0a0a56a14fa4649ec2c8
>
> This is 0.61.1.  Can you try again with 0.61.7 to rule out anything there?
>

I just tried 0.61.7, still out of luck. Here is the log:
https://gist.github.com/yuchangyuan/34743c0abf1bfd8ef243


> > So, is there any way to get the monitor working again?
> >
> > I have a backup of /var/lib/ceph/mon/ceph-a from 2013-08-01 and can
> > successfully start the monitor with those files, but rados and other
> > commands do not work because the OSDs keep saying the monitor is the
> > wrong node (which is true: it is actually the node as of 2 days ago).
>
> In general that is not going to work well as the cluster does not like to
> warp back in time.  If it does not start with .7 (I suspect it won't), can
> you send us a tarball of the mon data directory so we can see what is
> awry?


OK, I will send the tarball of /var/lib/ceph/mon/ceph-a to you directly.


>
> sage




-- 
Best regards,
Changyuan


Re: [ceph-users] About single monitor recovery

2013-08-03 Thread Sage Weil
It looks like the auth state wasn't trimmed properly.  It also sort of 
looks like you aren't using authentication on this cluster... is that 
true?  (The keyring file was empty.)

This looks like a trim issue, but I don't remember everything we have fixed
since 0.61.1; that was a while ago!  We certainly haven't seen anything like
this recently.

I pushed a branch wip-mon-skip-auth-cuttlefish that skips the missing 
incrementals and will get your mon up, but you may lose some auth keys.  
If auth is on, you'll need to add them back again.  If not, it may just 
work with this.
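
To illustrate what skipping missing incrementals implies, here is a toy
Python model. It is only a sketch under stated assumptions: the data layout,
the replay function and its skip_missing flag are invented for this example
and do not reflect the actual monitor code or the contents of the
wip-mon-skip-auth-cuttlefish branch.

```python
def replay(incrementals, first, last, skip_missing=False):
    """Rebuild a key table by applying incrementals first..last in order.

    incrementals maps version -> {entity: secret or None}; None deletes
    the entity. A missing version raises KeyError unless skip_missing is
    set, in which case it is skipped and its changes are lost.
    Returns (state, skipped_versions).
    """
    state = {}
    skipped = []
    for version in range(first, last + 1):
        inc = incrementals.get(version)
        if inc is None:
            if not skip_missing:
                raise KeyError(f"missing incremental {version}")
            skipped.append(version)
            continue
        for entity, secret in inc.items():
            if secret is None:
                state.pop(entity, None)
            else:
                state[entity] = secret
    return state, skipped


# Toy history: version 2 is gone, so any key it added cannot be recovered.
history = {
    1: {"client.admin": "secret-1"},
    3: {"osd.0": "secret-3"},
}

state, skipped = replay(history, 1, 3, skip_missing=True)
print(state, skipped)
```

In strict mode the replay stops at the gap, which is the toy analogue of the
failed assert; with skip_missing the table is rebuilt from what is left, and
anything recorded only in the skipped version has to be added back by hand.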

You can grab the packages from

http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-mon-skip-auth-cuttlefish

or whatever the right dir is for your distro when they appear in about 15 
minutes.  Let me know if that resolves it.

sage




Re: [ceph-users] About single monitor recovery

2013-08-03 Thread Yu Changyuan
On Sun, Aug 4, 2013 at 12:16 PM, Sage Weil  wrote:

> It looks like the auth state wasn't trimmed properly.  It also sort of
> looks like you aren't using authentication on this cluster... is that
> true?  (The keyring file was empty.)
>

Yes, you're right, I disabled auth. It's just a personal cluster, so the
simpler the better.

> This looks like a trim issue, but I don't remember what all we fixed since
> .1.. that was a while ago!  We certainly haven't seen anything like this
> recently.
>
> I pushed a branch wip-mon-skip-auth-cuttlefish that skips the missing
> incrementals and will get your mon up, but you may lose some auth keys.
> If auth is on, you'll need to add them back again.  If not, it may just
> work with this.
>
> You can grab the packages from
>
>
> http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-mon-skip-auth-cuttlefish
>
> or whatever the right dir is for your distro when they appear in about 15
> minutes.  Let me know if that resolves it.
>

Thank you for your work; I will try it as soon as possible.
PS: My distro is Gentoo, so maybe I should build from source directly.


-- 
Best regards,
Changyuan