And here is the ceph-mon log, with debug_mon set to 10. I ran the "ceph -s" command (which blocked) on 192.168.1.2 while recording this log:
https://gist.github.com/yuchangyuan/ba3e72452215221d1e82

On Sun, Aug 4, 2013 at 3:25 PM, Yu Changyuan <rei...@gmail.com> wrote:

> I just try the branch, and mon start ok, here is the log:
> https://gist.github.com/yuchangyuan/3138952ac60508d18aed
> But ceph -s or ceph -w just block, without any message return(I just start
> monitor, no mds or osd).
>
> On Sun, Aug 4, 2013 at 12:23 PM, Yu Changyuan <rei...@gmail.com> wrote:
>
>> On Sun, Aug 4, 2013 at 12:16 PM, Sage Weil <s...@inktank.com> wrote:
>>
>>> It looks like the auth state wasn't trimmed properly. It also sort of
>>> looks like you aren't using authentication on this cluster... is that
>>> true? (The keyring file was empty.)
>>
>> Yes, your're right, I disable auth. It's just a personal cluster, so the
>> simpler the better.
>>
>>> This looks like a trim issue, but I don't remember what all we fixed since
>>> .1.. that was a while ago! We certainly haven't seen anything like this
>>> recently.
>>>
>>> I pushed a branch wip-mon-skip-auth-cuttlefish that skips the missing
>>> incrementals and will get your mon up, but you may lose some auth keys.
>>> If auth is on, you'll need ot add them back again. If not, it may just
>>> work with this.
>>>
>>> You can grab the packages from
>>>
>>> http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-mon-skip-auth-cuttlefish
>>>
>>> or whatever the right dir is for your distro when they appear in about 15
>>> minutes. Let me know if that resolves it.
>>
>> Thank you for your work, I will try as soon as possible.
>> PS: My distro is Gentoo, so maybe I should build from source directly.
>>
>>> sage
>>>
>>> On Sun, 4 Aug 2013, Yu Changyuan wrote:
>>>
>>> > On Sun, Aug 4, 2013 at 12:13 AM, Sage Weil <s...@inktank.com> wrote:
>>> > On Sat, 3 Aug 2013, Yu Changyuan wrote:
>>> > > I run a tiny ceph cluster with only one monitor. After a reboot the system,
>>> > > the monitor refuse to start.
>>> > > I try to start ceph-mon manually with command 'ceph -f -i a', below is
>>> > > first few lines of the output:
>>> > >
>>> > > starting mon.a rank 0 at 192.168.1.10:6789/0 mon_data /var/lib/ceph/mon/ceph-a fsid 554bee60-9602-4017-a6e1-ceb6907a218c
>>> > > mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos()' thread 7f9e3b0db780 time 2013-08-03 20:27:29.208156
>>> > > mon/AuthMonitor.cc: 147: FAILED assert(ret == 0)
>>> > >
>>> > > The full log is at: https://gist.github.com/yuchangyuan/0a0a56a14fa4649ec2c8
>>> >
>>> > This is 0.61.1. Can you try again with 0.61.7 to rule out anything
>>> > there?
>>> >
>>> > I just tried 0.61.7, still out of luck. Here is the log:
>>> > https://gist.github.com/yuchangyuan/34743c0abf1bfd8ef243
>>> >
>>> > > So, are there any way to make the monitor work again?
>>> > >
>>> > > I have a backup of /var/lib/ceph/mon/ceph-a in 2013-08-01, and success
>>> > > start the monitor with these files,
>>> > > but rados and other command not work because osd keep saying the monitor is
>>> > > the wrong node(that's right, it's actually the node 2 days ago).
>>> >
>>> > In general that is not going to work well as the cluster does not like to
>>> > warp back in time. If it does not start with .7 (I suspect it won't), can
>>> > you send us a tarball of the mon data directory so we can see what is
>>> > awry?
>>> >
>>> > OK, I will send the tarball of /var/lib/ceph/mon/ceph-a to you directly.
>>> >
>>> > sage
>>> >
>>> > --
>>> > Best regards,
>>> > Changyuan
>>
>> --
>> Best regards,
>> Changyuan
>
> --
> Best regards,
> Changyuan

--
Best regards,
Changyuan
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com