Honza and Angus,
Glad to hear about this possible breakthrough! Here's the output of df:
root@storage1:~# df
Filesystem                1K-blocks    Used Available Use% Mounted on
/dev/mapper/vg00-lv_root  228424996 3376236 213445408   2% /
udev                        3041428       4   3041424   1% /dev
tmpfs                       1220808     340   1220468   1% /run
none                           5120       8
Andrew,
Good news: I believe I've found a reproducer for the problem you are
facing. Now, to be sure it's really the same, can you please run:
df (the interesting entry is /dev/shm)
and send the output of ls -la /dev/shm?
I believe /dev/shm is full.
Now, as a quick workaround, just delete all qb-* from /dev/shm an
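The workaround above can be sketched as a small shell helper. This is an illustrative sketch only: the directory is passed as a parameter so it can be tried on scratch data first; on a real node it would be /dev/shm, and corosync should be stopped before deleting its qb-* ring-buffer segments.

```shell
#!/bin/sh
# Sketch of the suggested workaround: check how full the tmpfs is and
# remove leftover libqb ring-buffer segments (qb-*). Directory is a
# parameter so this can be rehearsed against a scratch directory.
clean_qb_segments() {
    dir="$1"
    df -h "$dir"                      # how full is the tmpfs?
    ls -la "$dir"/qb-* 2>/dev/null    # which segments are left behind?
    rm -f "$dir"/qb-*                 # the actual workaround
}
```

Usage on a node (with corosync stopped): `clean_qb_segments /dev/shm`.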
Andrew,
thanks for the valgrind report (even though it didn't show anything
useful) and the blackbox.
We believe the problem is caused by access to invalid memory mapped by an
mmap operation. There are basically 3 places where we do mmap.
1.) corosync cpg_zcb functions (I don't believe this is the case)
Angus and Honza,
I recompiled corosync with --enable-debug. Below is a capture of the valgrind
output when corosync dies, after switching rrp_mode to passive:
# valgrind corosync -f
==5453== Memcheck, a memory error detector
==5453== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et a
Andrew,
Andrew Martin wrote:
> A bit more data on this problem: I was doing some maintenance and had to
> briefly disconnect storagequorum's connection to the STONITH network
> (ethernet cable #7 in this diagram):
> http://sources.xes-inc.com/downloads/storagecluster.png
>
>
> Since coro
Andrew,
Andrew Martin wrote:
> Hi Angus,
>
>
> I recompiled corosync with the changes you suggested in exec/main.c to
> generate fdata when SIGBUS is triggered. Here's the corresponding coredump
> and fdata files:
> http://sources.xes-inc.com/downloads/core.13027
> http://sources.xes-i
On 06/11/12 17:47 -0600, Andrew Martin wrote:
A bit more data on this problem: I was doing some maintenance and had to
briefly disconnect storagequorum's connection to the STONITH network (ethernet
cable #7 in this diagram):
http://sources.xes-inc.com/downloads/storagecluster.png
Since corosy
A bit more data on this problem: I was doing some maintenance and had to
briefly disconnect storagequorum's connection to the STONITH network (ethernet
cable #7 in this diagram):
http://sources.xes-inc.com/downloads/storagecluster.png
Since corosync has two rings (and is in active mode), this s
Hi Angus,
I recompiled corosync with the changes you suggested in exec/main.c to generate
fdata when SIGBUS is triggered. Here's the corresponding coredump and fdata
files:
http://sources.xes-inc.com/downloads/core.13027
http://sources.xes-inc.com/downloads/fdata.20121106
(gdb) thread apply
jfrie...@redhat.com >
To: pacemaker@oss.clusterlabs.org, disc...@corosync.org
Sent: Monday, November 5, 2012 2:21:09 AM
Subject: Re: [Pacemaker] [corosync] Corosync 2.1.0 dies on both nodes in
cluster
Angus Salkeld wrote:
> On 02/11/12 13:07 -0500, Andrew Martin wrote:
>>
Angus Salkeld wrote:
> On 02/11/12 13:07 -0500, Andrew Martin wrote:
>> Hi Angus,
>>
>>
>> Corosync died again while using libqb 0.14.3. Here is the coredump
>> from today:
>> http://sources.xes-inc.com/downloads/corosync.nov2.coredump
>>
>>
>>
>> # corosync -f
>> notice [MAIN ] Corosync Cluste
On 02/11/12 13:07 -0500, Andrew Martin wrote:
Hi Angus,
Corosync died again while using libqb 0.14.3. Here is the coredump from today:
http://sources.xes-inc.com/downloads/corosync.nov2.coredump
# corosync -f
notice [MAIN ] Corosync Cluster Engine ('2.1.0'): started and ready to provide
ser
Hi Angus,
Corosync died again while using libqb 0.14.3. Here is the coredump from today:
http://sources.xes-inc.com/downloads/corosync.nov2.coredump
# corosync -f
notice [MAIN ] Corosync Cluster Engine ('2.1.0'): started and ready to provide
service.
info [MAIN ] Corosync built-in features: p
On 01/11/12 17:27 -0500, Andrew Martin wrote:
Hi Angus,
I'll try upgrading to the latest libqb tomorrow and see if I can reproduce this
behavior with it. I was able to get a coredump by running corosync manually in
the foreground (corosync -f):
http://sources.xes-inc.com/downloads/corosync.co
Hi Angus,
I'll try upgrading to the latest libqb tomorrow and see if I can reproduce this
behavior with it. I was able to get a coredump by running corosync manually in
the foreground (corosync -f):
http://sources.xes-inc.com/downloads/corosync.coredump
There still isn't anything added to /va
On 01/11/12 14:32 -0500, Andrew Martin wrote:
Hi Honza,
Thanks for the help. I enabled core dumps in /etc/security/limits.conf but
didn't have a chance to reboot and apply the changes so I don't have a core
dump this time. Do core dumps need to be enabled for the fdata-DATETIME-PID
file to b
Hi Honza,
Thanks for the help. I enabled core dumps in /etc/security/limits.conf but
didn't have a chance to reboot and apply the changes so I don't have a core
dump this time. Do core dumps need to be enabled for the fdata-DATETIME-PID
file to be generated? Right now all that is in /var/lib/c
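The core-dump question above comes down to the soft core-size limit. The limits.conf change only applies at the next login/reboot, which would explain why nothing appeared yet; `ulimit -c` takes effect immediately in the current shell and its children. A sketch (the limits.conf lines are the conventional form, shown as comments):

```shell
# Raise the core size limit right away, for this shell and anything
# started from it, then verify it took effect.
ulimit -c unlimited
ulimit -c               # verify: prints "unlimited"

# Persistent form (applies at next login/reboot):
#   # /etc/security/limits.conf
#   *  soft  core  unlimited
#   *  hard  core  unlimited
```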
Andrew,
I was not able to find anything interesting (from the corosync point of
view) in the corosync-related configuration/logs.
What would be helpful:
- if corosync died, there should be a
/var/lib/corosync/fdata-DATETIME-PID file from the dead corosync. Can you
please xz them and store them somewhere (they are quite
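Honza's request above can be sketched as a small helper. The directory is parameterized so the snippet can be tried on scratch files first; on the node itself it would be /var/lib/corosync.

```shell
#!/bin/sh
# Sketch: xz-compress the fdata blackbox dumps so they are small enough
# to upload. xz replaces each fdata-* file with an fdata-*.xz file.
pack_fdata() {
    dir="$1"
    xz -9 "$dir"/fdata-*        # compresses in place, leaving fdata-*.xz
    ls -la "$dir"/fdata-*.xz    # list what is ready to upload
}
```

Usage on a node: `pack_fdata /var/lib/corosync`.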
Corosync died an additional 3 times during the night on storage1. I wrote a
daemon to attempt to restart it as soon as it fails, so only one of those
times resulted in a STONITH of storage1.
I enabled debug in the corosync config, so I was able to capture a period when
corosync died with debug o
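The restart daemon described above can be sketched as a trivial respawn loop. Everything here is illustrative: the command and the restart cap are parameters (a real deployment would pass `corosync -f` and no cap), and such a loop only masks the crash so the node avoids STONITH; it does not fix anything.

```shell
#!/bin/sh
# Minimal sketch of the stopgap watchdog: rerun the daemon whenever it
# exits. $1 = command to supervise, $2 = restart cap (for demonstration).
respawn() {
    cmd="$1"; max="$2"; n=0
    while :; do
        $cmd; st=$?                        # blocks until the daemon exits
        n=$((n + 1))
        echo "exit #$n (status $st), restarting" >&2
        [ "$n" -ge "$max" ] && break       # cap so the demo terminates
        sleep 1                            # avoid a tight respawn loop
    done
    echo "restarted $n time(s)"
}
# usage (hypothetical): respawn "corosync -f" 1000000
```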