On Tue, Aug 6, 2013 at 12:28 AM, Sage Weil <s...@inktank.com> wrote:

> On Mon, 5 Aug 2013, Yu Changyuan wrote:
> > The good news is, with new patch, ceph start OK, cephfs mount OK, and kvm
> > virtual machine use rbd boot OK(and seems running ok), and I check the
> > timestamp of last file write to cephfs, it's fair near to the time of
> > reboot(which cause ceph not work any more). Since I don't have any other way
> > to check the integrity of the files store in cephfs, I just randomly pick
> > some video files, and play it, all seems OK.
> >
> > So, thank you very much.
> >
> > But, I do not use the last version of files in /var/lib/ceph/mon/ceph-a,
> > with these files, ceph-mon startup ok, and ceph -s returns, but osd still
> > think the monitor is wrong node and refuse to work.
> > Then I think I may try the files of 2 day ago(Aug 1st) and see what happen,
> > and something actually happen, that is ceph-osd start to work.
> > So, I am a bit curious about why patched version work with the ceph-mon data
> > 2 days ago but original version not,
> > and what more important, do I need extra step to make current running ceph
> > cluster to work with a normal version(not patched) ceph,
> > and are there any chance that current cluster will run into problem in the
> > future(keep current state and do not take any extra step).
>
> I think you will be fine with the current state and switching back to
> normal release code.
>
That is to say, I can just stop the currently running ceph-{osd,mds,mon} and
then start the normal release (0.61.7)?

> I'm confused why ceph-osds wouldn't start with the latest mon data, but
> can't speculate too much without spending time analyzing your logs from
> the failed startup.

I cleared the logs just before trying the old mon data (I did not expect the
old mon data to work), and after the OSDs started OK their status changed,
so perhaps I cannot provide enough logs for such an analysis. It may also not
be worth spending time analyzing the cause. After all, Ceph is back online
again.

> Glad to hear you're back online!

Thank you.

> sage
>
> > On Mon, Aug 5, 2013 at 12:39 AM, Sage Weil <s...@inktank.com> wrote:
> > On Sun, 4 Aug 2013, Yu Changyuan wrote:
> > > And here is the log of ceph-mon, with debug_mon set to 10, I run "ceph -s"
> > > command(which is blocked) on 192.168.1.2 during recording this log.
> > >
> > > https://gist.github.com/yuchangyuan/ba3e72452215221d1e82
> >
> > I pushed one more patch to that branch that should get you up. This one
> > should go to master as well.
> >
> > sage
> >
> > > On Sun, Aug 4, 2013 at 3:25 PM, Yu Changyuan <rei...@gmail.com> wrote:
> > > I just try the branch, and mon start ok, here is the log:
> > > https://gist.github.com/yuchangyuan/3138952ac60508d18aed
> > > But ceph -s or ceph -w just block, without any message return(I
> > > just start monitor, no mds or osd).
> > >
> > > On Sun, Aug 4, 2013 at 12:23 PM, Yu Changyuan <rei...@gmail.com> wrote:
> > >
> > > On Sun, Aug 4, 2013 at 12:16 PM, Sage Weil <s...@inktank.com> wrote:
> > > It looks like the auth state wasn't trimmed properly. It also sort of
> > > looks like you aren't using authentication on this cluster... is that
> > > true? (The keyring file was empty.)
> > >
> > > Yes, you're right, I disable auth. It's just a personal
> > > cluster, so the simpler the better.
> > >
> > > This looks like a trim issue, but I don't remember what all we fixed since
> > > .1.. that was a while ago! We certainly haven't seen anything like this
> > > recently.
> > >
> > > I pushed a branch wip-mon-skip-auth-cuttlefish that skips the missing
> > > incrementals and will get your mon up, but you may lose some auth keys.
> > > If auth is on, you'll need to add them back again. If not, it may just
> > > work with this.
> > >
> > > You can grab the packages from
> > >
> > > http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-mon-skip-auth-cuttlefish
> > >
> > > or whatever the right dir is for your distro when they appear in about 15
> > > minutes. Let me know if that resolves it.
> > >
> > >
> > > Thank you for your work, I will try as soon as possible.
> > > PS: My distro is Gentoo, so maybe I should build from source directly.
> > >
> > >
> > > sage
> > >
> > >
> > > On Sun, 4 Aug 2013, Yu Changyuan wrote:
> > >
> > > >
> > > > On Sun, Aug 4, 2013 at 12:13 AM, Sage Weil <s...@inktank.com> wrote:
> > > > On Sat, 3 Aug 2013, Yu Changyuan wrote:
> > > > > I run a tiny ceph cluster with only one monitor. After a reboot the system,
> > > > > the monitor refuse to start.
> > > > > I try to start ceph-mon manually with command 'ceph -f -i a', below is
> > > > > first few lines of the output:
> > > > >
> > > > > starting mon.a rank 0 at 192.168.1.10:6789/0 mon_data
> > > > > /var/lib/ceph/mon/ceph-a fsid 554bee60-9602-4017-a6e1-ceb6907a218c
> > > > > mon/AuthMonitor.cc: In function 'virtual void
> > > > > AuthMonitor::update_from_paxos()' thread 7f9e3b0db780 time 2013-08-03
> > > > > 20:27:29.208156
> > > > > mon/AuthMonitor.cc: 147: FAILED assert(ret == 0)
> > > > >
> > > > > The full log is at:
> > > > > https://gist.github.com/yuchangyuan/0a0a56a14fa4649ec2c8
> > > >
> > > > This is 0.61.1.
> > > > Can you try again with 0.61.7 to rule out anything
> > > > there?
> > > >
> > > >
> > > > I just tried 0.61.7, still out of luck. Here is the log:
> > > > https://gist.github.com/yuchangyuan/34743c0abf1bfd8ef243
> > > >
> > > > > So, are there any way to make the monitor work again?
> > > > >
> > > > > I have a backup of /var/lib/ceph/mon/ceph-a in 2013-08-01, and success
> > > > > start the monitor with these files,
> > > > > but rados and other command not work because osd keep saying the monitor is
> > > > > the wrong node(that's right, it's actually the node 2 days ago).
> > > >
> > > > In general that is not going to work well as the cluster does not like to
> > > > warp back in time. If it does not start with .7 (I suspect it won't), can
> > > > you send us a tarball of the mon data directory so we can see what is
> > > > awry?
> > > >
> > > >
> > > > OK, I will send the tarball of /var/lib/ceph/mon/ceph-a to you directly.
> > > >
> > > > sage
> > > >
> > > > --
> > > > Best regards,
> > > > Changyuan
> > >
> > > --
> > > Best regards,
> > > Changyuan
> > >
> > > --
> > > Best regards,
> > > Changyuan
> > >
> > > --
> > > Best regards,
> > > Changyuan
> >
> > --
> > Best regards,
> > Changyuan

--
Best regards,
Changyuan
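
On the integrity check mentioned earlier in the thread (playing a few randomly picked video files): a more systematic check is possible only if checksums were recorded before the failure. The following is a minimal sketch in Python under that assumption. It uses only the standard library and is not Ceph-specific; the function names `file_sha256`, `build_manifest`, and `compare_manifests` are hypothetical and chosen for illustration.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def compare_manifests(before, after):
    """Return (missing, changed) relative paths, each as a sorted list."""
    missing = sorted(p for p in before if p not in after)
    changed = sorted(
        p for p in before if p in after and before[p] != after[p]
    )
    return missing, changed
```

Usage would be to call `build_manifest` on the mounted filesystem while the cluster is healthy, store the result, and after a recovery call it again and pass both results to `compare_manifests`. Files written after the first manifest was taken are not covered by the comparison.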
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com