On Sun, Aug 25, 2013 at 10:27 PM, Joao Eduardo Luis
<joao.l...@inktank.com> wrote:

> On 08/25/2013 12:36 PM, Yu Changyuan wrote:
>
>> Today, when I restarted the ceph service, the problem I asked about on
>> the mailing list before happened again
>> (http://article.gmane.org/gmane.comp.file-systems.ceph.user/2995):
>> ceph-mon refuses to start and reports the error below:
>>
>> 2013-08-25 18:24:52.465600 7fb50a496780 -1 mon/AuthMonitor.cc: In
>> function 'virtual void AuthMonitor::update_from_paxos(bool*)' thread
>> 7fb50a496780 time 2013-08-25 18:24:52.453920
>> mon/AuthMonitor.cc: 152: FAILED assert(ret == 0)
>>
>>   ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
>>   1: (AuthMonitor::update_from_paxos(bool*)+0x1fee) [0x57742e]
>>   2: (PaxosService::refresh(bool*)+0x18d) [0x4f630d]
>>   3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x496477]
>>   4: (Monitor::init_paxos()+0xf5) [0x496635]
>>   5: (Monitor::preinit()+0x6bc) [0x4ad1dc]
>>   6: (main()+0x1bec) [0x48ac8c]
>>   7: (__libc_start_main()+0xed) [0x7fb5084c660d]
>>   8: ceph-mon() [0x48dab9]
>>
>> Then I switched to the "wip-mon-skip-auth-cuttlefish" branch; ceph-mon
>> complained about some "missing auth inc" (from 1 to 500) and continued
>> running, and then everything was OK again.
>>
>> But when I stop this patched ceph-mon and try to start the regular
>> unpatched ceph-mon, the above error happens again. As I mentioned, the
>> ceph-mon files I used last time were not the final ones with the 'missing
>> auth' problem, but the files from 2 days before ceph-mon failed, with which
>> ceph-mon actually started OK but ceph-osd refused to work.
>>
>> So, I want to know how to make these ceph-mon files, which only work with
>> the patched ceph-mon, work again with the regular unpatched ceph-mon.
>>
>>
> Changyuan,
>
> Would you mind sending us your monitor store?  If you have other monitors,
> especially if this doesn't happen on them, the other monitors' stores would
> also be insightful.

OK, I have sent the monitor's store to you.

> Furthermore, what's your cluster history?  At what version was it first
> deployed, and what versions have you upgraded it to until reaching 0.61.7?
>
This is the full history of my cluster:
1. My cluster was first deployed on version 0.61.1.
2. When ceph-mon refused to start after a reboot, I upgraded directly to
0.61.7 and made the cluster work again with the patched ceph-mon and the
monitor's store from 2 days before ceph-mon stopped working.
3. Then I stopped the cluster and restarted it with the regular ceph-mon
(and it worked).
4. 3 days ago I restarted the cluster and found that ceph-mon would not start
again, so I tried the patched ceph-mon and it worked, but this time I did not
restart the cluster with a regular ceph-mon.
5. Yesterday I tried to add another monitor (mon.b). After mon.b joined the
cluster, the unpatched ceph-mon running on the new host threw the same
exception from "AuthMonitor::update_from_paxos" and stopped.
6. I had to stop the cluster and manually remove mon.b, which would never
start again, from the cluster (I don't have the patched version on the new
host), and run the cluster on a single mon.a with the patched ceph-mon again.

  -Joao

-- 
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Best regards,
Changyuan
