Re: [Pacemaker] Reason for cluster resource migration

2013-02-11 Thread Vladislav Bogdanov
12.02.2013 07:11, Andrew Beekhof wrote: > On Tue, Feb 12, 2013 at 3:07 PM, Andrew Beekhof wrote: [...] >> So we'll still need the crm_report, it will have more detail on the >> "Child process pengine terminated with signal 6 (pid=19357, core=128)" >> part. > > Signal 6 is an assertion failure, bu

Re: [Pacemaker] Please add pcmk__timeout parameter to stonithd metadata

2013-02-11 Thread Andrew Beekhof
Done. https://github.com/beekhof/pacemaker/commit/8154451 Thanks for the reminder On Fri, Feb 8, 2013 at 7:05 PM, Kazunori INOUE wrote: > Hi, > > Please add the pcmk_off_timeout and pcmk_reboot_timeout parameters > to the output of 'stonithd metadata', > because an ERROR message is output when I l

Re: [Pacemaker] Reason for cluster resource migration

2013-02-11 Thread Andrew Beekhof
On Tue, Feb 12, 2013 at 3:07 PM, Andrew Beekhof wrote: > On Tue, Feb 12, 2013 at 3:01 PM, Andrew Beekhof wrote: >> On Tue, Feb 12, 2013 at 1:40 PM, Andrew Martin wrote: >>> Hello, >>> >>> Unfortunately this same failure occurred again tonight, >> >> It might be the same effect, but there was no

Re: [Pacemaker] Suggestion to improve movement of booth

2013-02-11 Thread yusuke iida
Hi, Jiaju Could you please reply to this message? Regards, Yusuke. 2013/1/28 yusuke iida : > Hi, Jiaju > > 2013/1/15 Jiaju Zhang : >> On Tue, 2013-01-15 at 11:28 +0900, yusuke iida wrote: >>> Hi, Jiaju >>> >>> 2013/1/11 Jiaju Zhang : >>> > Hi Yusuke, >>> > >>> > Sorry for the late reply;) >>> >

Re: [Pacemaker] Reason for cluster resource migration

2013-02-11 Thread Andrew Beekhof
On Tue, Feb 12, 2013 at 3:01 PM, Andrew Beekhof wrote: > On Tue, Feb 12, 2013 at 1:40 PM, Andrew Martin wrote: >> Hello, >> >> Unfortunately this same failure occurred again tonight, > > It might be the same effect, but there was no indication that the PE > died last time. > >> taking down a prod

Re: [Pacemaker] Reason for cluster resource migration

2013-02-11 Thread Andrew Beekhof
On Tue, Feb 12, 2013 at 1:40 PM, Andrew Martin wrote: > Hello, > > Unfortunately this same failure occurred again tonight, It might be the same effect, but there was no indication that the PE died last time. > taking down a production cluster. Here is the part of the log where pengine > died: >

Re: [Pacemaker] Improvement for the communication failure of booth

2013-02-11 Thread yusuke iida
Hi, Jiaju 2012/12/18 Jiaju Zhang : > Good suggestion! I think it may need to introduce a notifier callback so > that the failure of communicating with the problematic node can be > notified to the "active" node. This makes sense for the active node, > because it will make the admin know how many h

Re: [Pacemaker] Reason for cluster resource migration

2013-02-11 Thread Andrew Martin
Hello, Unfortunately this same failure occurred again tonight, taking down a production cluster. Here is the part of the log where pengine died: Feb 11 17:05:15 storage0 pacemakerd[1572]: notice: pcmk_child_exit: Child process pengine terminated with signal 6 (pid=19357, core=128) Feb 11 17:05

Re: [Pacemaker] Corosync over DHCP IP

2013-02-11 Thread Andrew Beekhof
On Tue, Feb 12, 2013 at 1:01 PM, Dennis Jacobfeuerborn wrote: > On 02/12/2013 02:38 AM, Andrew Beekhof wrote: >> On Tue, Feb 12, 2013 at 3:09 AM, Dennis Jacobfeuerborn >> wrote: >>> On 02/11/2013 11:30 AM, Andrew Beekhof wrote: On Mon, Feb 11, 2013 at 9:24 PM, Viacheslav Biriukov wrote

Re: [Pacemaker] Corosync over DHCP IP

2013-02-11 Thread Dennis Jacobfeuerborn
On 02/12/2013 02:38 AM, Andrew Beekhof wrote: > On Tue, Feb 12, 2013 at 3:09 AM, Dennis Jacobfeuerborn > wrote: >> On 02/11/2013 11:30 AM, Andrew Beekhof wrote: >>> On Mon, Feb 11, 2013 at 9:24 PM, Viacheslav Biriukov >>> wrote: It is a VM in OpenStack, so we can't use a static IP. Righ

Re: [Pacemaker] Corosync over DHCP IP

2013-02-11 Thread Andrew Beekhof
On Tue, Feb 12, 2013 at 3:09 AM, Dennis Jacobfeuerborn wrote: > On 02/11/2013 11:30 AM, Andrew Beekhof wrote: >> On Mon, Feb 11, 2013 at 9:24 PM, Viacheslav Biriukov >> wrote: >>> It is a VM in OpenStack, so we can't use a static IP. >>> Right now we are investigating why the interface goes down. >> >> Ev

Re: [Pacemaker] PostgreSQL replication RA: PGSQL.lock

2013-02-11 Thread Takatoshi MATSUO
Hi 2013/2/9 Andrew : > Hi all. > Why is PGSQL.lock implemented in the RA, and what problems may happen if > it is removed from the RA code? It may cause data inconsistency. If the file exists on a node, you need to copy the data from the new master. > Also, 2nd question: how can I prevent pgsql RA fro

Re: [Pacemaker] Corosync over DHCP IP

2013-02-11 Thread Viacheslav Biriukov
Yes, you are right. In our case we got this bug with dnsmasq ( http://markmail.org/message/7kjf4hljszpydsrx#query:+page:1+mid:7kjf4hljszpydsrx+state:results ). Still investigating. 2013/2/11 Dennis Jacobfeuerborn > On 02/11/2013 11:30 AM, Andrew Beekhof wrote: > > On Mon, Feb 11, 2013 at 9:24

Re: [Pacemaker] Corosync over DHCP IP

2013-02-11 Thread Dennis Jacobfeuerborn
On 02/11/2013 11:30 AM, Andrew Beekhof wrote: > On Mon, Feb 11, 2013 at 9:24 PM, Viacheslav Biriukov > wrote: >> It is a VM in OpenStack, so we can't use a static IP. >> Right now we are investigating why the interface goes down. > > Even if you solve that, dynamic IP addresses are fundamentally > incompa

Re: [Pacemaker] Corosync over DHCP IP

2013-02-11 Thread Viacheslav Biriukov
We need a solution that provides something like a VIP for our MySQL servers (for example), with automatic migration when something goes wrong. If you have a better solution, please suggest it. As for dynamic IP addresses: that is not important for us. After boot (not every day) we reconfigure the cluster using maintenanc

Re: [Pacemaker] Corosync over DHCP IP

2013-02-11 Thread Andrew Beekhof
On Mon, Feb 11, 2013 at 9:24 PM, Viacheslav Biriukov wrote: > It is a VM in OpenStack, so we can't use a static IP. > Right now we are investigating why the interface goes down. Even if you solve that, dynamic IP addresses are fundamentally incompatible with cluster software. You're effectively trying to

Re: [Pacemaker] RES: Reboot of cluster members with heavy load on filesystem.

2013-02-11 Thread Dan Frincu
Hi, On Mon, Feb 11, 2013 at 12:21 PM, Andrew Beekhof wrote: > On Mon, Feb 11, 2013 at 12:41 PM, Carlos Xavier > wrote: >> Hi Andrew, >> >> thank you very much for your hints. >> >>> > Hi. >>> > >>> > We are running two clusters composed of two machines. We are using DRBD >>> > + OCFS2 to make

Re: [Pacemaker] Corosync over DHCP IP

2013-02-11 Thread Viacheslav Biriukov
It is a VM in OpenStack, so we can't use a static IP. Right now we are investigating why the interface goes down. Thank you! 2013/2/11 Viacheslav Biriukov > > > > 2013/2/11 Dan Frincu > >> Hi, >> >> On Sun, Feb 10, 2013 at 2:24 PM, Viacheslav Biriukov >> wrote: >> > Hi guys, >> > >> > Got a tricky iss

Re: [Pacemaker] Corosync over DHCP IP

2013-02-11 Thread Viacheslav Biriukov
2013/2/11 Dan Frincu > Hi, > > On Sun, Feb 10, 2013 at 2:24 PM, Viacheslav Biriukov > wrote: > > Hi guys, > > > > Got a tricky issue with Corosync and Pacemaker over a DHCP IP address using > > unicast. Corosync crashes periodically. > > > > Packages are from CentOS 6 repos: > > corosync-1.4.1-7.e

Re: [Pacemaker] RES: Reboot of cluster members with heavy load on filesystem.

2013-02-11 Thread Andrew Beekhof
On Mon, Feb 11, 2013 at 12:41 PM, Carlos Xavier wrote: > Hi Andrew, > > thank you very much for your hints. > >> > Hi. >> > >> > We are running two clusters composed of two machines. We are using DRBD >> > + OCFS2 to make the common >> filesystem. > > [snip] > >> > >> > The clusters run nice wit

Re: [Pacemaker] Corosync over DHCP IP

2013-02-11 Thread Dan Frincu
Hi, On Sun, Feb 10, 2013 at 2:24 PM, Viacheslav Biriukov wrote: > Hi guys, > > Got a tricky issue with Corosync and Pacemaker over a DHCP IP address using > unicast. Corosync crashes periodically. > > Packages are from CentOS 6 repos: > corosync-1.4.1-7.el6_3.1.x86_64 > corosynclib-1.4.1-7.el6_3.1.