Re: [ceph-users] Ceph cluster is unreachable because of authentication failure

2014-01-23 Thread Sage Weil
On Thu, 23 Jan 2014, Guang wrote:
> Hi Joao,
> Thanks for your reply!
>
> I captured the log after seeing the 'noin' keyword and the log is attached.
>
> Meanwhile, while checking the monitor logs, I see it does election every few
> seconds and the election process could take several seconds, so

Re: [ceph-users] Ceph cluster is unreachable because of authentication failure

2014-01-22 Thread Joao Eduardo Luis
On 01/22/2014 11:34 AM, Guang wrote:
Thanks Sage.

With debug_mon and debug_paxos set to 20, the log file grows too fast, so I set the log level to 10 and then:
1) ran the 'ceph osd set noin' command,
2) grepped the log for the keyword 'noin'.
The monitor log is attached. Please help to check
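The keyword search described in this message can be sketched in Python as follows. This is only an illustration and not something posted in the thread: the function name matching_lines is hypothetical, and it works on any iterable of text lines rather than on a specific Ceph log path.

```python
def matching_lines(lines, keyword):
    """Yield (line_number, line) pairs for every line containing keyword.

    `lines` is any iterable of strings, for example an open log file.
    Line numbers start at 1, in the same way as `grep -n`.
    """
    for number, line in enumerate(lines, start=1):
        if keyword in line:
            # Strip only the trailing newline so the line content is unchanged.
            yield number, line.rstrip("\n")
```

To use it on a monitor log, open the file (its location depends on the local configuration) and iterate over matching_lines(log_file, "noin").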

Re: [ceph-users] Ceph cluster is unreachable because of authentication failure

2014-01-20 Thread Sage Weil
On Sun, 19 Jan 2014, Guang wrote:
> Thanks Sage.
>
> I just captured part of the log (it was fast growing), the process did
> not hang but I saw the same pattern repeatedly. Should I increase the
> log level and send over email (it constantly reproduced)?

Sure! A representative fragment of the

Re: [ceph-users] Ceph cluster is unreachable because of authentication failure

2014-01-19 Thread Guang
Thanks Sage.

I just captured part of the log (it was fast growing), the process did not hang but I saw the same pattern repeatedly. Should I increase the log level and send over email (it constantly reproduced)?

Thanks,
Guang

On Jan 18, 2014, at 12:05 AM, Sage Weil wrote:
> On Fri, 17 Jan 20

Re: [ceph-users] Ceph cluster is unreachable because of authentication failure

2014-01-18 Thread Sherry Shahbazi
Hi Guang,

Can you check the permissions of ceph.conf and ceph.client.admin.keyring? They should look like the following:

-rw-r--r-- 1 root root 719 Jan 17 17:34 ceph.conf
-rw-r--r-- 1 root root  64 Jan 17 11:58 ceph.client.admin.keyring

Regards
Sherry

On Wednesday, January 15, 2014 1:57 AM,
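A minimal Python sketch of the permission check suggested here, assuming a POSIX system. The helper names permission_string and has_expected_mode are hypothetical, and the expected mode string is taken from the listing in this message. The sketch checks only the mode bits, not the root ownership shown in the listing.

```python
import os
import stat

# Mode string taken from the listing in this message.
EXPECTED_MODE = "-rw-r--r--"

def permission_string(path):
    """Return the ls-style mode string (e.g. '-rw-r--r--') for path."""
    return stat.filemode(os.stat(path).st_mode)

def has_expected_mode(path, expected=EXPECTED_MODE):
    """Return True if path has exactly the expected ls-style mode string."""
    return permission_string(path) == expected
```

Calling has_expected_mode on ceph.conf and ceph.client.admin.keyring, wherever they are stored on the node, reports whether each file matches the listing.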

Re: [ceph-users] Ceph cluster is unreachable because of authentication failure

2014-01-17 Thread Sage Weil
On Fri, 17 Jan 2014, Guang wrote:
> Thanks Sage.
>
> I further narrow down the problem to #any command using paxos service would
> hang#, following are details:
>
> 1. I am able to run ceph status / osd dump, etc., however, the result are out
> of date (though I stopped all OSDs, it does not re

Re: [ceph-users] Ceph cluster is unreachable because of authentication failure

2014-01-16 Thread Sage Weil
Hi Guang,

On Thu, 16 Jan 2014, Guang wrote:
> I have still had no luck figuring out what is causing the
> authentication failure, so in order to get the cluster back, I tried:
> 1. stop all daemons (mon & osd)
> 2. change the configuration to disable cephx
> 3. start mon daemons (3

Re: [ceph-users] Ceph cluster is unreachable because of authentication failure

2014-01-16 Thread Guang
I have still had no luck figuring out what is causing the authentication failure, so in order to get the cluster back, I tried:
1. stop all daemons (mon & osd)
2. change the configuration to disable cephx
3. start mon daemons (3 in total)
4. start osd daemon one by one

After fini

Re: [ceph-users] Ceph cluster is unreachable because of authentication failure

2014-01-14 Thread Guang
Thanks Sage.

-bash-4.1$ sudo ceph --admin-daemon /var/run/ceph/ceph-mon.osd151.asok mon_status
{ "name": "osd151",
  "rank": 2,
  "state": "electing",
  "election_epoch": 85469,
  "quorum": [],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 1,
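As a rough sketch of how the mon_status output in this message could be read programmatically, the Python below parses the JSON and reports whether the monitor is in quorum. It assumes that "quorum" lists the ranks of the monitors currently in quorum. The function name summarize_mon_status is hypothetical, and the sample contains only the fields visible in this message (the truncated monmap section is left out).

```python
import json

# Fields copied from the mon_status output in this message;
# the truncated "monmap" section is intentionally omitted.
SAMPLE = """
{ "name": "osd151",
  "rank": 2,
  "state": "electing",
  "election_epoch": 85469,
  "quorum": [],
  "outside_quorum": [] }
"""

def summarize_mon_status(raw):
    """Parse mon_status JSON text and return a small summary dict."""
    status = json.loads(raw)
    quorum = status.get("quorum", [])
    return {
        "name": status.get("name"),
        "state": status.get("state"),
        "election_epoch": status.get("election_epoch"),
        # Assumption: "quorum" holds monitor ranks, so membership is a rank lookup.
        "in_quorum": status.get("rank") in quorum,
        "quorum_size": len(quorum),
    }
```

For the sample, the summary shows state "electing" with an empty quorum, which matches the symptom discussed in the thread: the monitor keeps holding elections without forming a quorum.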

Re: [ceph-users] Ceph cluster is unreachable because of authentication failure

2014-01-14 Thread Sage Weil
On Tue, 14 Jan 2014, GuangYang wrote:
> Hi ceph-users and ceph-devel,
> I came across an issue after restarting monitors of the cluster, that
> authentication fails which prevents running any ceph command.
>
> After we did some maintenance work, I restart OSD, however, I found that the
> OSD wou

[ceph-users] Ceph cluster is unreachable because of authentication failure

2014-01-14 Thread GuangYang
Hi ceph-users and ceph-devel,
I came across an issue after restarting monitors of the cluster, that authentication fails which prevents running any ceph command.

After we did some maintenance work, I restart OSD, however, I found that the OSD would not join the cluster automatically after being