Got it.
Are you sending logs to the central syslog servers via TCP (@@) or
UDP (@)?
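
For anyone unsure which one their hosts use: in rsyslog's legacy forwarding syntax a single `@` before the target means UDP and `@@` means TCP. Below is a minimal sketch that lists the forwarding rules found in a config file's text. The function name, the regex, and the sample host names are made up for illustration, and it only understands the simple legacy one-line form, not the newer `action(...)` syntax.

```python
import re

# Legacy rsyslog forwarding actions: "@target" is UDP, "@@target" is TCP.
FORWARD_RE = re.compile(
    r"^\s*(?P<selector>\S+)\s+(?P<at>@{1,2})(?P<target>[^@\s]\S*)"
)


def forwarding_rules(config_text):
    """Return (selector, protocol, target) for each legacy forwarding line."""
    rules = []
    for line in config_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comments
        match = FORWARD_RE.match(line)
        if match:
            protocol = "tcp" if match.group("at") == "@@" else "udp"
            rules.append((match.group("selector"), protocol, match.group("target")))
    return rules


# Hypothetical config text; the host names are placeholders.
SAMPLE = """\
# forward everything
*.* @syslog.example.com:514
*.* @@logstash.example.com:5514
"""

for selector, protocol, target in forwarding_rules(SAMPLE):
    print(selector, protocol, target)
```

A TCP target is the one more likely to block the sender when the receiving end stops accepting data, so those are the rules worth looking at first.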
I just realised that my test cluster sends logs via UDP to our usual
central syslog server (as our production hosts normally do), but it is
also configured to send logs via TCP to a testing Logstash VM. My suspic
Oh, my problems weren't on Ceph nodes. I've seen this problem on non-Ceph
nodes. The symptoms you had of unexplained weirdness with services (in your
case, Ceph), and syslog lagging 10mins behind just reminded me of symptoms
I've seen before where the sending of syslog messages to a central syslog
In my case, everything else running on the host seems to be okay. I'm
wondering if the other problems you see aren't a side-effect of Ceph
services running slow?
What do you do to get around the problem when it happens? Disable syslog in
Ceph?
What version of Ceph and OS are you using?
On Wed, J
Agreed. When I first had these problems, random stuff would just not work.
SSH would take a while to log in, the DNS server would process requests slowly,
our batch system would freeze and not run jobs. It's now one of my first
things to check when services are running weirdly.
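
One generic way to spot this kind of lag is to compare the timestamp of the newest log line with the system clock. This is only a sketch, not the check described in this message: it assumes lines that start with a Ceph-style timestamp (as in the monitor logs quoted in this thread), and the function name and the sample clock value are made up.

```python
from datetime import datetime


def log_lag_seconds(log_line, now):
    """Seconds between the timestamp at the start of log_line and now."""
    # Ceph-style lines start with "YYYY-MM-DD HH:MM:SS.ffffff" (26 characters).
    stamp = datetime.strptime(log_line[:26], "%Y-%m-%d %H:%M:%S.%f")
    return (now - stamp).total_seconds()


# Log line taken from this thread; the clock value is hypothetical.
LINE = (
    "2016-07-25 14:31:11.644335 7f8760af7700 0 log_channel(cluster) log "
    "[INF] : mon.60z0m02@0 won leader election with quorum 0,2,4"
)
now = datetime(2016, 7, 25, 14, 41, 11, 644335)
print("lag in seconds:", log_lag_seconds(LINE, now))
```

In practice `now` would be `datetime.now()` and the line would be the last one in the log file; a lag of minutes on a busy daemon is a hint that logging itself is blocking.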
My failsafe check is to d
The funny thing is that I just restarted the rsyslog daemon on the Ceph
hosts and I can now re-enable syslog for Ceph without any issues. It just
looks like the rsyslog service had a hiccup, possibly related to a problem on
one of the central syslog servers, and this in turn prevented the monitors to
o
On 07/26/2016 06:27 PM, Sergio A. de Carvalho Jr. wrote:
(Just realised I originally replied to Sean directly, so reposting here
for posterity).
Bingo!
wow. This didn't even cross my mind. D:
Thanks for sharing.
I turned off syslog and the monitors quickly reached quorum and
everything see
(Just realised I originally replied to Sean directly, so reposting here for
posterity).
Bingo!
I turned off syslog and the monitors quickly reached quorum and everything
seems back to normal. Thanks so much, Sean.
Luckily this is a test cluster. I wonder how I could catch this in a
production cl
On 07/26/2016 12:13 PM, Sergio A. de Carvalho Jr. wrote:
I left the 4 nodes running overnight and they just crawled to their
knees... to the point that nothing has been written to the logs in the
last 11 hours. So I stopped all monitors this morning and started them
one by one again, but they're
I left the 4 nodes running overnight and they just crawled to their
knees... to the point that nothing has been written to the logs in the last
11 hours. So I stopped all monitors this morning and started them one by
one again, but they're still extremely slow. Here are their logs:
Awesome, thanks so much, Joao.
Here's the mon_status:
https://gist.github.com/anonymous/2b80a9a75d134d9e539dfbc81615c055
I'm still trying to collect the logs, but while doing that I noticed that
the log records are severely delayed compared to the system clock. For
example, watching the logs with
On 07/25/2016 05:55 PM, Sergio A. de Carvalho Jr. wrote:
I just forced an NTP update on all hosts to be sure it's not down to clock
skew. I also checked that hosts can reach all other hosts on port 6789.
I then stopped monitor 0 (60z0m02) and started monitor 1 (60zxl02), but
the 3 monitors left (1
I just forced an NTP update on all hosts to be sure it's not down to clock
skew. I also checked that hosts can reach all other hosts on port 6789.
I then stopped monitor 0 (60z0m02) and started monitor 1 (60zxl02), but the
3 monitors left (1 - 60zxl02, 2 - 610wl02, 4 - 615yl02) were still having
prob
On 07/25/2016 04:34 PM, Sergio A. de Carvalho Jr. wrote:
Thanks, Joao.
All monitors have the exact same mon map.
I suspect you're right that there might be some communication problem
though. I stopped monitor 1 (60zxl02), but the other 3 monitors still
failed to reach a quorum. I could see moni
Thanks, Joao.
All monitors have the exact same mon map.
I suspect you're right that there might be some communication problem
though. I stopped monitor 1 (60zxl02), but the other 3 monitors still
failed to reach a quorum. I could see monitor 0 was still declaring victory
but the others were alway
We're having problems starting the 5th host (some BIOS problem, possibly),
so I won't be able to recover its monitor any time soon.
I knew having an even number of monitors wasn't ideal, and that's why I
started 3 monitors first and waited until they reached quorum before
starting the 4th monitor.
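
For reference, the quorum rule is a strict majority of the monitors listed in the mon map, so an even number of running monitors does not by itself prevent quorum; an even map size just tolerates no more failures than the next smaller odd size. A small sketch of that arithmetic (the function names are illustrative, not part of Ceph):

```python
def majority(monmap_size):
    """Smallest number of monitors that is a strict majority of the map."""
    return monmap_size // 2 + 1


def can_form_quorum(monmap_size, monitors_up):
    """True if the monitors that are up can outvote the rest of the map."""
    return monitors_up >= majority(monmap_size)


for size in (3, 4, 5):
    needed = majority(size)
    print(size, "in map: quorum needs", needed, "and tolerates", size - needed, "down")

# The situation in this thread: 5 monitors in the map, 4 of them running.
print(can_form_quorum(5, 4))
```

With 5 monitors in the map, the 4 running ones are more than the 3 needed, so the monitor count alone would not explain the failed elections.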
On 07/25/2016 03:41 PM, Sergio A. de Carvalho Jr. wrote:
In the logs, 2 monitors are constantly reporting that they won the
leader election:
60z0m02 (monitor 0):
2016-07-25 14:31:11.644335 7f8760af7700 0 log_channel(cluster) log
[INF] : mon.60z0m02@0 won leader election with quorum 0,2,4
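
To compare what each monitor claims, the election lines can be pulled out of the logs mechanically. A minimal sketch, assuming the message format shown in the line above; the regex and function name are mine, and other Ceph versions may word the message differently.

```python
import re

# Matches e.g. "mon.60z0m02@0 won leader election with quorum 0,2,4".
ELECTION_RE = re.compile(
    r"mon\.(?P<name>\S+)@(?P<rank>\d+) won leader election "
    r"with quorum (?P<quorum>\d+(?:,\d+)*)"
)


def parse_election(line):
    """Return (monitor name, rank, quorum ranks), or None if no match."""
    match = ELECTION_RE.search(line)
    if match is None:
        return None
    quorum = [int(rank) for rank in match.group("quorum").split(",")]
    return match.group("name"), int(match.group("rank")), quorum


# Log line taken from this thread.
LINE = (
    "2016-07-25 14:31:11.644335 7f8760af7700 0 log_channel(cluster) log "
    "[INF] : mon.60z0m02@0 won leader election with quorum 0,2,4"
)
print(parse_election(LINE))
```

Running this over every monitor's log and grouping by rank shows quickly whether more than one monitor keeps declaring victory, and with which quorum sets.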
On 07/25/2016 03:45 PM, Joshua M. Boniface wrote:
My understanding is that you need an odd number of monitors to reach quorum.
This seems to match what you're seeing: with 3, there is a definite leader, but
with 4, there isn't. Have you tried starting both the 4th and 5th
simultaneously and le
My understanding is that you need an odd number of monitors to reach quorum.
This seems to match what you're seeing: with 3, there is a definite leader, but
with 4, there isn't. Have you tried starting both the 4th and 5th
simultaneously and letting them both vote?
--
Joshua M. Boniface
Linux S
In the logs, 2 monitors are constantly reporting that they won the
leader election:
60z0m02 (monitor 0):
2016-07-25 14:31:11.644335 7f8760af7700 0 log_channel(cluster) log [INF] :
mon.60z0m02@0 won leader election with quorum 0,2,4
2016-07-25 14:31:44.521552 7f8760af7700 1 mon.60z0m02@0(le
Hi,
I have a cluster of 5 hosts running Ceph 0.94.6 on CentOS 6.5. On each
host, there is 1 monitor and 13 OSDs. We had an issue with the network and
for some reason (I still don't know why), the servers were restarted.
One host is still down, but the monitors on the 4 remaining servers are
To: "cameron.scr...@solnet.co.nz" , Jan
Schermer
Cc: "ceph-users@lists.ceph.com" ,
ceph-users
Date: 04/06/2015 11:13 a.m.
Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux
off, IPtables off, can see tcp traffic)
Hmm… thanks for sharing this.
Any chance it depends on the switch?
Could you p
r
Cc: ceph-users@lists.ceph.com; ceph-users
Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables
off, can see tcp traffic)
The interface MTU has to be 18 or more bytes lower than the switch MTU or it
just stops working. As far as I know the monitor communication is not b
5 02:58 a.m.
Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux
off, IPtables off, can see tcp traffic)
The TCP_NODELAY issue was with kernel rbd *not* with OSD. Ceph messenger
code base is setting it by default.
BTW, I doubt TCP_NODELAY has anything to do with it.
Than
.scr...@solnet.co.nz
Cc: Somnath Roy; ceph-users@lists.ceph.com; ceph-users
Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables
off, can see tcp traffic)
Interface and switch should have the same MTU and that should not cause any
issues (setting switch MTU higher is always
"cameron.scr...@solnet.co.nz"
> Cc:"ceph-users@lists.ceph.com" ,
> ceph-users , Joao Eduardo Luis
>
> Date: 03/06/2015 11:49 a.m.
> Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off,
> IPtables off, can see tcp traffic)
>
Thanks & Regards
Somnath
From: cameron.scr...@solnet.co.nz [mailto:cameron.scr...@solnet.co.nz]
Sent: Tuesday, June 02, 2015 4:32 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com; ceph-users; Joao Eduardo Luis
Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off,
IPtables off, can se
"ceph-users@lists.ceph.com" ,
ceph-users , Joao Eduardo Luis
Date: 03/06/2015 11:49 a.m.
Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux
off, IPtables off, can see tcp traffic)
I doubt it has anything to do with Ceph; I hope you checked that your switch is
supporting
Sent: Tuesday, June 02, 2015 4:32 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com; ceph-users; Joao Eduardo Luis
Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables
off, can see tcp traffic)
Setting the MTU to 1500 worked; the monitors reach quorum right away. Unfortunately
03/06/2015 10:34 a.m.
Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux
off, IPtables off, can see tcp traffic)
We have seen some communication issues with that; try setting the MTU to
1500 on all servers and try it out…
From: cameron.scr...@solnet.co.nz [mailto:cameron.scr...@solnet.c
On Wed, Jun 3, 2015 at 8:30 AM, wrote:
> We are running with Jumbo Frames turned on. Is that likely to be the issue?
I got caught by this previously:
http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-October/043955.html
The problem is Ceph "almost-but-not-quite" works, leading you
: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables
off, can see tcp traffic)
We are running with Jumbo Frames turned on. Is that likely to be the issue? Do
I need to configure something in ceph?
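
A common way to test whether jumbo frames really pass end to end is to ping with fragmentation forbidden and the largest payload that fits the interface MTU. The sketch below only does the arithmetic and builds the command; it assumes IPv4 without options and the Linux iputils `ping` (`-M do` sets don't-fragment), and the host name is a placeholder. The 18-byte overhead is the 14-byte Ethernet header plus the 4-byte frame check sequence, which may be where the 18-byte figure quoted elsewhere in this thread comes from.

```python
IP_HEADER = 20     # IPv4 header without options
ICMP_HEADER = 8    # ICMP echo header
ETH_OVERHEAD = 18  # 14-byte Ethernet header + 4-byte frame check sequence


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented packet."""
    return mtu - IP_HEADER - ICMP_HEADER


def frame_size(mtu):
    """Size on the wire of a full untagged Ethernet frame for this MTU."""
    return mtu + ETH_OVERHEAD


def ping_command(host, mtu, count=3):
    """Build a Linux iputils ping that forbids fragmentation (-M do)."""
    payload = str(max_ping_payload(mtu))
    return ["ping", "-M", "do", "-s", payload, "-c", str(count), host]


for mtu in (1500, 9000):
    print(mtu, max_ping_payload(mtu), frame_size(mtu))

# "mon-host.example" is a hypothetical host name.
print(" ".join(ping_command("mon-host.example", 9000)))
```

If the small-MTU ping works and the jumbo one does not, something on the path is dropping large frames, which would fit the symptom of small probe messages getting through while larger ones do not.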
The mon maps are fine and after setting debug to 10 and debug ms to 1, I see
probe
Roy
To: Joao Eduardo Luis , "ceph-users@lists.ceph.com"
Date: 03/06/2015 03:49 a.m.
Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux
off, IPtables off, can see tcp traffic)
Sent by: "ceph-users"
By any chance are you running with j
[ceph-users] Monitors not reaching quorum. (SELinux off, IPtables
off, can see tcp traffic)
On 06/02/2015 01:42 AM, cameron.scr...@solnet.co.nz wrote:
> I am trying to deploy a new ceph cluster and my monitors are not
> reaching quorum. SELinux is off, firewalls are off, I can see traffic
> betw
On 06/02/2015 01:42 AM, cameron.scr...@solnet.co.nz wrote:
> I am trying to deploy a new ceph cluster and my monitors are not
> reaching quorum. SELinux is off, firewalls are off, I can see traffic
> between the nodes on port 6789 but when I use the admin socket to force
> a re-election only the mo
I am trying to deploy a new ceph cluster and my monitors are not reaching
quorum. SELinux is off, firewalls are off, I can see traffic between the
nodes on port 6789 but when I use the admin socket to force a re-election
only the monitor I send the request to shows the new election in its logs.
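
For completeness, reachability of the monitor port can be checked from each node with a plain TCP connect. A minimal sketch; the function name is made up and any host names passed to it are up to the reader. Note that a successful connect only proves that the handshake, which uses small packets, gets through. It says nothing about whether larger messages survive the path.

```python
import socket


def can_connect(host, port=6789, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Calling `can_connect(name)` for every monitor host from every other monitor host gives a quick matrix of which pairs can talk on port 6789.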