On Sun, Aug 25, 2013 at 10:27 PM, Joao Eduardo Luis <joao.l...@inktank.com> wrote:
> On 08/25/2013 12:36 PM, Yu Changyuan wrote:
>
>> Today, when I restart ceph service, the problem I asked on mail-list
>> before happened again
>> (http://article.gmane.org/gmane.comp.file-systems.ceph.user/2995),
>> ceph-mon refuse to start and report below error:
>>
>> 2013-08-25 18:24:52.465600 7fb50a496780 -1 mon/AuthMonitor.cc: In
>> function 'virtual void AuthMonitor::update_from_paxos(bool*)' thread
>> 7fb50a496780 time 2013-08-25 18:24:52.453920
>> mon/AuthMonitor.cc: 152: FAILED assert(ret == 0)
>>
>> ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
>> 1: (AuthMonitor::update_from_paxos(bool*)+0x1fee) [0x57742e]
>> 2: (PaxosService::refresh(bool*)+0x18d) [0x4f630d]
>> 3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x496477]
>> 4: (Monitor::init_paxos()+0xf5) [0x496635]
>> 5: (Monitor::preinit()+0x6bc) [0x4ad1dc]
>> 6: (main()+0x1bec) [0x48ac8c]
>> 7: (__libc_start_main()+0xed) [0x7fb5084c660d]
>> 8: ceph-mon() [0x48dab9]
>>
>> Then, I switch to "wip-mon-skip-auth-cuttlefish" branch, ceph-mon
>> complain some "missing auth inc"(from 1 to 500), and continue running,
>> then everything is ok again.
>>
>> But when I stop this patched ceph-mon, and try to start regular
>> unpatched ceph-mon, above error happened again. As I mentioned, the
>> ceph-mon files last time I use is not the final one that 'missing auth',
>> but the files 2 days before ceph-mon fail, which actually ceph-mon start
>> ok but ceph-osd refuse to work.
>>
>> So, I want to know how to make these ceph-mon files that only work with
>> patched ceph-mon to work again with regular unpatched ceph-mon.
>>
>
> Changyuan,
>
> Would you mind sending us your monitor store? If you have other monitors,
> specially if this doesn't happen on them, the other monitor's stores would
> also be insightful.

OK, I have sent the monitor's store to you.

> Furthermore, what's your cluster history? At what version was it first
> deployed, and what versions have you upgraded it to until reaching 0.61.7?
>

This is the full history of my cluster:

1. My cluster was first deployed on version 0.61.1.
2. When ceph-mon refused to start after a reboot, I upgraded directly to
   0.61.7 and got the cluster working again with the patched ceph-mon and
   the monitor store from 2 days before ceph-mon stopped working.
3. Then I stopped the cluster and restarted it with the regular ceph-mon
   (and it worked).
4. Three days ago I restarted the cluster and found that ceph-mon would not
   start again, so I tried the patched ceph-mon and it worked, but this time
   I did not restart the cluster with a regular ceph-mon.
5. Yesterday I tried to add another monitor (mon.b). After mon.b joined the
   cluster, the unpatched ceph-mon running on the new host threw the same
   exception from "AuthMonitor::update_from_paxos" and stopped.
6. I had to stop the cluster and manually remove mon.b, which never started
   again, from the cluster (I don't have the patched version on the new
   host), and go back to running the cluster on a single mon.a with the
   patched ceph-mon.

> -Joao
>
> --
> Joao Eduardo Luis
> Software Engineer | http://inktank.com | http://ceph.com
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Best regards,
Changyuan