Re: [ceph-users] Monitors not reaching quorum

2016-07-27 Thread Sergio A. de Carvalho Jr.
Got it. Are you sending logs to the central syslog servers via TCP (@@) or UDP (@)? I just realised that my test cluster sends logs via UDP to our usual central syslog server (as our production hosts normally do), but it is also configured to send logs via TCP to a testing Logstash VM. My suspic
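The TCP (@@) versus UDP (@) distinction above refers to rsyslog's legacy forwarding syntax. As a rough sketch only, the following Python helper classifies a forwarding rule line; the function name, the regular expression, and the example hostnames are assumptions made for illustration and do not come from the thread. It does not handle comments, bracketed IPv6 addresses, or the newer action() syntax.

```python
import re

# Legacy-style rsyslog forwarding action, e.g. "*.* @@host:514".
# "@@" selects TCP, a single "@" selects UDP.
FORWARD_RE = re.compile(r"^\s*(\S+)\s+(@@?)([^\s:]+)(?::(\d+))?\s*$")


def forwarding_protocol(line):
    """Return (selector, protocol, host, port) for a forwarding rule, else None."""
    match = FORWARD_RE.match(line)
    if match is None:
        return None
    selector, marker, host, port = match.groups()
    protocol = "tcp" if marker == "@@" else "udp"
    # 514 is the conventional syslog port when none is given.
    return selector, protocol, host, int(port) if port else 514
```

Running this over the forwarding rules on each host would show which targets use TCP; a stalled TCP target can back up the local syslog queue, whereas UDP forwarding does not wait for the receiver.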

Re: [ceph-users] Monitors not reaching quorum

2016-07-27 Thread Sean Crosby
Oh, my problems weren't on Ceph nodes. I've seen this problem on non-Ceph nodes. The symptoms you had of unexplained weirdness with services (in your case, Ceph), and syslog lagging 10mins behind just reminded me of symptoms I've seen before where the sending of syslog messages to a central syslog

Re: [ceph-users] Monitors not reaching quorum

2016-07-27 Thread Sergio A. de Carvalho Jr.
In my case, everything else running on the host seems to be okay. I'm wondering if the other problems you see aren't a side-effect of Ceph services running slow? What do you do to get around the problem when it happens? Disable syslog in Ceph? What version of Ceph and OS are you using? On Wed, J

Re: [ceph-users] Monitors not reaching quorum

2016-07-26 Thread Sean Crosby
Agreed. When I first had these problems, random stuff would just not work. SSH would take a while to log in, DNS server would process requests slowly, our Batch system would freeze and not run jobs. It's now one of my first things to check when services are running weirdly. My failsafe check is to d

Re: [ceph-users] Monitors not reaching quorum

2016-07-26 Thread Sergio A. de Carvalho Jr.
The funny thing is that I just restarted the rsyslog daemon on the Ceph hosts and I can now re-enable syslog for Ceph without any issues. It just looks like the rsyslog service had a hiccup, possibly related to a problem on one of the central syslog servers, and this in turn prevented the monitors to o

Re: [ceph-users] Monitors not reaching quorum

2016-07-26 Thread Joao Eduardo Luis
On 07/26/2016 06:27 PM, Sergio A. de Carvalho Jr. wrote: (Just realised I originally replied to Sean directly, so reposting here for posterity). Bingo! wow. This didn't even cross my mind. D: Thanks for sharing. I turned off syslog and the monitors quickly reached quorum and everything see

Re: [ceph-users] Monitors not reaching quorum

2016-07-26 Thread Sergio A. de Carvalho Jr.
(Just realised I originally replied to Sean directly, so reposting here for posterity). Bingo! I turned off syslog and the monitors quickly reached quorum and everything seems back to normal. Thanks so much, Sean. Luckily this is a test cluster. I wonder how I could catch this in a production cl

Re: [ceph-users] Monitors not reaching quorum

2016-07-26 Thread Joao Eduardo Luis
On 07/26/2016 12:13 PM, Sergio A. de Carvalho Jr. wrote: I left the 4 nodes running overnight and they just crawled to their knees... to the point that nothing has been written to the logs in the last 11 hours. So I stopped all monitors this morning and started them one by one again, but they're

Re: [ceph-users] Monitors not reaching quorum

2016-07-26 Thread Sergio A. de Carvalho Jr.
I left the 4 nodes running overnight and they just crawled to their knees... to the point that nothing has been written to the logs in the last 11 hours. So I stopped all monitors this morning and started them one by one again, but they're still being extremely slow. Here are their logs:

Re: [ceph-users] Monitors not reaching quorum

2016-07-25 Thread Sergio A. de Carvalho Jr.
Awesome, thanks so much, Joao. Here's the mon_status: https://gist.github.com/anonymous/2b80a9a75d134d9e539dfbc81615c055 I'm still trying to collect the logs, but while doing that I noticed that the log records are severely delayed compared to the system clock. For example, watching the logs with
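The delay between log records and the system clock mentioned above can be measured rather than eyeballed. This is a minimal sketch, assuming each log line starts with a timestamp in the form seen elsewhere in this thread (for example "2016-07-25 14:31:11.644335") and that the log and the reference clock share a timezone; the function name is hypothetical.

```python
from datetime import datetime


def log_lag_seconds(log_line, now):
    """Seconds between the timestamp at the start of log_line and `now`.

    Assumes the line begins with "YYYY-MM-DD HH:MM:SS.ffffff" and that
    both values are naive datetimes in the same timezone.
    """
    stamp = " ".join(log_line.split()[:2])
    logged_at = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
    return (now - logged_at).total_seconds()
```

Calling it with the newest line of a monitor log and datetime.now() gives a number that could be graphed or alerted on, which is one possible way to notice this kind of lag on a production cluster.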

Re: [ceph-users] Monitors not reaching quorum

2016-07-25 Thread Joao Eduardo Luis
On 07/25/2016 05:55 PM, Sergio A. de Carvalho Jr. wrote: I just forced an NTP update on all hosts to be sure it's down to clock skew. I also checked that hosts can reach all other hosts on port 6789. I then stopped monitor 0 (60z0m02) and started monitor 1 (60zxl02), but the 3 monitors left (1

Re: [ceph-users] Monitors not reaching quorum

2016-07-25 Thread Sergio A. de Carvalho Jr.
I just forced an NTP update on all hosts to be sure it's down to clock skew. I also checked that hosts can reach all other hosts on port 6789. I then stopped monitor 0 (60z0m02) and started monitor 1 (60zxl02), but the 3 monitors left (1 - 60zxl02, 2 - 610wl02, 4 - 615yl02) were still having prob
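The reachability check on port 6789 described above could be scripted along these lines. This is only a sketch: the function name and the timeout value are assumptions, and a successful TCP connect says nothing about whether large frames pass or whether the monitor behind the port is healthy.

```python
import socket


def can_connect(host, port=6789, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run from every monitor host against every other one (60z0m02, 60zxl02, 610wl02 and 615yl02 in this thread), it gives a quick full-mesh check.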

Re: [ceph-users] Monitors not reaching quorum

2016-07-25 Thread Joao Eduardo Luis
On 07/25/2016 04:34 PM, Sergio A. de Carvalho Jr. wrote: Thanks, Joao. All monitors have the exact same mon map. I suspect you're right that there might be some communication problem though. I stopped monitor 1 (60zxl02), but the other 3 monitors still failed to reach a quorum. I could see moni

Re: [ceph-users] Monitors not reaching quorum

2016-07-25 Thread Sergio A. de Carvalho Jr.
Thanks, Joao. All monitors have the exact same mon map. I suspect you're right that there might be some communication problem though. I stopped monitor 1 (60zxl02), but the other 3 monitors still failed to reach a quorum. I could see monitor 0 was still declaring victory but the others were alway

Re: [ceph-users] Monitors not reaching quorum

2016-07-25 Thread Sergio A. de Carvalho Jr.
We're having problems starting the 5th host (some BIOS problem, possibly), so I won't be able to recover its monitor any time soon. I knew having an even number of monitors wasn't ideal, and that's why I started 3 monitors first and waited until they reached quorum before starting the 4th monitor.

Re: [ceph-users] Monitors not reaching quorum

2016-07-25 Thread Joao Eduardo Luis
On 07/25/2016 03:41 PM, Sergio A. de Carvalho Jr. wrote: In the logs, 2 monitors are constantly reporting that they won the leader election: 60z0m02 (monitor 0): 2016-07-25 14:31:11.644335 7f8760af7700 0 log_channel(cluster) log [INF] : mon.60z0m02@0 won leader election with quorum 0,2,4

Re: [ceph-users] Monitors not reaching quorum

2016-07-25 Thread Joao Eduardo Luis
On 07/25/2016 03:45 PM, Joshua M. Boniface wrote: My understanding is that you need an odd number of monitors to reach quorum. This seems to match what you're seeing: with 3, there is a definite leader, but with 4, there isn't. Have you tried starting both the 4th and 5th simultaneously and le

Re: [ceph-users] Monitors not reaching quorum

2016-07-25 Thread Joshua M. Boniface
My understanding is that you need an odd number of monitors to reach quorum. This seems to match what you're seeing: with 3, there is a definite leader, but with 4, there isn't. Have you tried starting both the 4th and 5th simultaneously and letting them both vote? -- Joshua M. Boniface Linux S
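On the odd-versus-even question above: as far as I know, Ceph monitors form a quorum by strict majority of the monitors listed in the monmap, so an even count is not a blocker in itself; it simply tolerates no more failures than the next lower odd count. The arithmetic is sketched below with hypothetical function names.

```python
def quorum_size(monmap_size):
    """Smallest number of monitors that is a strict majority of the monmap."""
    return monmap_size // 2 + 1


def tolerated_failures(monmap_size):
    """How many monitors can be down while a majority is still possible."""
    return monmap_size - quorum_size(monmap_size)
```

With the 5 monitors in this cluster's monmap, 3 are enough for a majority, which matches the "quorum 0,2,4" reported in the logs quoted in this thread.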

Re: [ceph-users] Monitors not reaching quorum

2016-07-25 Thread Sergio A. de Carvalho Jr.
In the logs, 2 monitors are constantly reporting that they won the leader election: 60z0m02 (monitor 0): 2016-07-25 14:31:11.644335 7f8760af7700 0 log_channel(cluster) log [INF] : mon.60z0m02@0 won leader election with quorum 0,2,4 2016-07-25 14:31:44.521552 7f8760af7700 1 mon.60z0m02@0(le

[ceph-users] Monitors not reaching quorum

2016-07-25 Thread Sergio A. de Carvalho Jr.
Hi, I have a cluster of 5 hosts running Ceph 0.94.6 on CentOS 6.5. On each host, there is 1 monitor and 13 OSDs. We had an issue with the network and for some reason (which I still don't know why), the servers were restarted. One host is still down, but the monitors on the 4 remaining servers are

Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-03 Thread Alex Moore
Roy To: "cameron.scr...@solnet.co.nz" , Jan Schermer Cc: "ceph-users@lists.ceph.com" , ceph-users Date: 04/06/2015 11:13 a.m. Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off,

Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-03 Thread Cameron . Scrace
", Jan Schermer Cc: "ceph-users@lists.ceph.com", ceph-users Date: 04/06/2015 11:13 a.m. Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic) Hmm… Thanks for sharing this. Any chance it depends on the switch? Could you p

Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-03 Thread Somnath Roy
r Cc: ceph-users@lists.ceph.com; ceph-users Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic) The interface MTU has to be 18 or more bytes lower than the switch MTU or it just stops working. As far as I know the monitor communication is not b

Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-03 Thread Cameron . Scrace
5 02:58 a.m. Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic) The TCP_NODELAY issue was with kernel rbd *not* with OSD. Ceph messenger code base is setting it by default. BTW, I doubt TCP_NODELAY has anything to do with it. Than

Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-03 Thread Somnath Roy
.scr...@solnet.co.nz Cc: Somnath Roy; ceph-users@lists.ceph.com; ceph-users Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic) Interface and switch should have the same MTU and that should not cause any issues (setting switch MTU higher is always

Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-03 Thread Jan Schermer
"cameron.scr...@solnet.co.nz" > Cc:"ceph-users@lists.ceph.com" , > ceph-users , Joao Eduardo Luis > > Date:03/06/2015 11:49 a.m. > Subject:RE: [ceph-users] Monitors not reaching quorum. (SELinux off, > IPtables off, can see tcp traffic) >

Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Cameron . Scrace
ks & Regards Somnath From: cameron.scr...@solnet.co.nz [mailto:cameron.scr...@solnet.co.nz] Sent: Tuesday, June 02, 2015 4:32 PM To: Somnath Roy Cc: ceph-users@lists.ceph.com; ceph-users; Joao Eduardo Luis Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can se

Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Cameron . Scrace
"ceph-users@lists.ceph.com" , ceph-users , Joao Eduardo Luis Date: 03/06/2015 11:49 a.m. Subject:RE: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic) I doubt it is anything to do with Ceph, hope you checked your switch is supporting

Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Somnath Roy
Sent: Tuesday, June 02, 2015 4:32 PM To: Somnath Roy Cc: ceph-users@lists.ceph.com; ceph-users; Joao Eduardo Luis Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic) Setting the MTU to 1500 worked, monitors reach quorum right away. Unfortunately

Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Cameron . Scrace
03/06/2015 10:34 a.m. Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic) We have seen some communication issue with that, try to make all the server MTU 1500 and try out… From: cameron.scr...@solnet.co.nz [mailto:cameron.scr...@solnet.c

Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Nigel Williams
On Wed, Jun 3, 2015 at 8:30 AM, wrote: > We are running with Jumbo Frames turned on. Is that likely to be the issue? I got caught by this previously: http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-October/043955.html The problem is Ceph "almost-but-not-quite" works, leading you
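A common way to catch the jumbo-frame mismatch discussed above is to ping with fragmentation prohibited and a payload sized to fill the interface MTU exactly. The sketch below only builds the command line; it assumes IPv4 and the Linux iputils ping (whose -M do option forbids fragmentation), and the function names are hypothetical.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def ping_command(host, mtu, count=3):
    """Argument list for an unfragmented ping that fills the given MTU."""
    return ["ping", "-M", "do", "-c", str(count),
            "-s", str(max_ping_payload(mtu)), host]
```

For a 9000-byte MTU the payload is 8972 bytes. If that ping fails between two monitor hosts while the 1472-byte variant for MTU 1500 succeeds, something on the path is not passing jumbo frames, which would fit the "almost-but-not-quite works" symptom.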

Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Somnath Roy
: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic) We are running with Jumbo Frames turned on. Is that likely to be the issue? Do I need to configure something in ceph? The mon maps are fine and after setting debug to 10 and debug ms to 1, I see probe

Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Cameron . Scrace
Roy To: Joao Eduardo Luis , "ceph-users@lists.ceph.com" Date: 03/06/2015 03:49 a.m. Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic) Sent by:"ceph-users" By any chance are you running with j

Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Somnath Roy
sers] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic) On 06/02/2015 01:42 AM, cameron.scr...@solnet.co.nz wrote: > I am trying to deploy a new ceph cluster and my monitors are not > reaching quorum. SELinux is off, firewalls are off, I can see traffic > betw

Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Joao Eduardo Luis
On 06/02/2015 01:42 AM, cameron.scr...@solnet.co.nz wrote: > I am trying to deploy a new ceph cluster and my monitors are not > reaching quorum. SELinux is off, firewalls are off, I can see traffic > between the nodes on port 6789 but when I use the admin socket to force > a re-election only the mo

[ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-01 Thread Cameron . Scrace
I am trying to deploy a new ceph cluster and my monitors are not reaching quorum. SELinux is off, firewalls are off, I can see traffic between the nodes on port 6789 but when I use the admin socket to force a re-election only the monitor I send the request to shows the new election in its logs.