Re: [ceph-users] Help! 61.1 killed my monitors in prod

2013-05-13 Thread Stephen Street
Joao, On May 13, 2013, at 3:24 PM, Stephen Street wrote: > > From the logs, it appears that the monitors are struggling to bind to the > network at system start. If I issue an initctl restart ceph-mon-all to all > nodes running monitors, the system starts functioning correctly. > I found the

Re: [ceph-users] Help! 61.1 killed my monitors in prod

2013-05-13 Thread Stephen Street
Joao, Thanks for your response. Sorry for the marginal quality of the original e-mail. Better log information in-line. On May 13, 2013, at 1:19 PM, Joao Eduardo Luis wrote: > On 05/13/2013 08:40 PM, Stephen Street wrote: >> >> On May 10, 2013, at 3:39 PM, Joao Eduardo Luis wrote: >>

Re: [ceph-users] Help! 61.1 killed my monitors in prod

2013-05-13 Thread Joao Eduardo Luis
On 05/13/2013 08:40 PM, Stephen Street wrote: On May 10, 2013, at 3:39 PM, Joao Eduardo Luis wrote: We would certainly be interested in taking a look at logs from those monitors, and would appreciate if you could set 'debug mon = 20', 'debug auth = 10' and 'debug ms = 1', and give them a spi

Re: [ceph-users] Help! 61.1 killed my monitors in prod

2013-05-13 Thread Stephen Street
On May 10, 2013, at 3:39 PM, Joao Eduardo Luis wrote: > We would certainly be interested in taking a look at logs from those monitors, and would appreciate if you could set 'debug mon = 20', 'debug auth = 10' and 'debug ms = 1', and give them a spin until you hit your issue. > I seeing t

Re: [ceph-users] Help! 61.1 killed my monitors in prod

2013-05-10 Thread Jeppesen, Nelson
Thank you, you saved my bacon. I didn't inject the new map properly, the monitor is going nuts but it's recovering. I wonder if I was hit by the .61 race condition. How can I verify that the monitor has upgraded to the 'new' .61 style that uses a single paxos? Thanks. Nelson Jeppesen _

Re: [ceph-users] Help! 61.1 killed my monitors in prod

2013-05-10 Thread Joao Eduardo Luis
On 05/10/2013 11:02 PM, Jeppesen, Nelson wrote: After upgrading my cluster everything looked good, then I rebooted the farm and all hell broke loose. I have 3 monitors but none are able to start. On all of them the '/usr/bin/python /usr/sbin/ceph-create-keys' command is hanging because none of

[ceph-users] Help! 61.1 killed my monitors in prod

2013-05-10 Thread Jeppesen, Nelson
After upgrading my cluster everything looked good, then I rebooted the farm and all hell broke loose. I have 3 monitors but none are able to start. On all of them the '/usr/bin/python /usr/sbin/ceph-create-keys' command is hanging because none of the nodes can accept quorum. All ceph tools a