[Pacemaker] Speeding up failover

2013-07-24 Thread Devdas Bhagat
We have a master-slave setup for Redis, running 6 instances of Redis on each physical host, and one floating IP between them. Each redis instance is part of a single group. When we fail over the IP in production, I'm observing this sequence of events: Pacemaker brings down the floating IP Pacemak

Re: [Pacemaker] heartbeat 3.0 start error

2013-07-24 Thread Andrew Beekhof
On 25/07/2013, at 2:36 PM, claire huang wrote: > Andrew, > Hi!there is one question to ask for your help. > 1、My os is- > [root@2U_222 cluster]# lsb_release -a > LSB Version: > :core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch

[Pacemaker] reorg of network:ha-clustering repo on build.opensuse.org

2013-07-24 Thread Tim Serong
Hi All, This is just a quick heads-up. We're in the process of reorganising the network:ha-clustering repository on build.opensuse.org. If you don't use any of the software from this repo feel free to stop reading now :) Currently we have: - network:ha-clustering (stable builds for various dis

Re: [Pacemaker] Floating IP Address

2013-07-24 Thread Gopalakrishnan N
Yes, I feel the config has to be improved a bit with the Master / Slave concept, or else what happens is when one server goes down the pointing one, the other one is not propagated. For this I am trying to install DRBD and GFS, but the packages are missing; I am trying to get the rpm to install the same. An

[Pacemaker] Attention - Re: Release candidate: 1.1.10-rc7

2013-07-24 Thread Andrew Beekhof
Unless anyone can show me a reason otherwise, 1.1.10-final is getting tagged tomorrow. -- Andrew On 22/07/2013, at 10:53 AM, Andrew Beekhof wrote: > Announcing the seventh release candidate for Pacemaker 1.1.10 > > https://github.com/ClusterLabs/pacemaker/releases/Pacemaker-1.1.10-rc7 > >

Re: [Pacemaker] ifdown ethX + corosync + DRBD = split-brain?

2013-07-24 Thread Andrew Beekhof
On 19/07/2013, at 9:38 PM, "Howley, Tom" wrote: > Hi, > > I have been doing some testing of a fairly standard pacemaker/corosync setup > with DRBD (with resource-level fencing) and have noticed the following in > relation to testing network failures: > > - Handling of all ports being blocked

Re: [Pacemaker] version compatility

2013-07-24 Thread Andrew Beekhof
On 17/07/2013, at 6:47 PM, K Mehta wrote: > Andrew, > I am using centos 6.2 and RHEL 6.2. Currently i have setup with whatever > comes with distribution. > > 1. Are the components (pacemaker, cman, corosync) forward and backward > compatible with each other..just in cases where i want to c

Re: [Pacemaker] CRMd exits because of internal error

2013-07-24 Thread Andrew Beekhof
On 19/07/2013, at 12:20 AM, K Mehta wrote: > Hi, > > I have a two node cluster. I have few resources configured on it. On vqa12, > CRMd dies due to some internal error. It is not clear why CRMd decides to die > on May5 at 22:14:50 on system vqa12 It's because of: May 05 22:14:50 [3518] vqa1

Re: [Pacemaker] Floating IP Address

2013-07-24 Thread Andrew Beekhof
On 23/07/2013, at 2:39 AM, Gopalakrishnan N wrote: > Now I got one more issue, when I stop the complete pacemaker application, the > other one node automatically projects me the http service. > > But when I stop the http service alone in one node which is pointing to > ClusterIP, the page is

Re: [Pacemaker] How to perform a clean shutdown of Pacemaker in the event of network connection loss

2013-07-24 Thread Andrew Beekhof
On 24/07/2013, at 6:40 PM, Tan Tai hock wrote: > > I did not enable fencing. I observe the process running and see that when the > node is up, I will see the following processes: > > Corosync > -- > /usr/sbin/corosync > > Pacemaker > - > /usr/libexec/pacemaker/lrm

Re: [Pacemaker] Question about the behavior when a pacemaker's process crashed

2013-07-24 Thread Andrew Beekhof
On 24/07/2013, at 7:40 PM, Kazunori INOUE wrote: > (13.07.18 19:23), Andrew Beekhof wrote: >> >> On 17/07/2013, at 6:53 PM, Kazunori INOUE >> wrote: >> >>> (13.07.16 21:18), Andrew Beekhof wrote: On 16/07/2013, at 7:04 PM, Kazunori INOUE wrote: > (13.07.15 11:00),

Re: [Pacemaker] Node recover causes resource to migrate

2013-07-24 Thread Andrew Beekhof
On 24/07/2013, at 10:09 PM, Lars Marowsky-Bree wrote: > On 2013-07-24T21:40:40, Andrew Beekhof wrote: > >>> Statically assigned nodeids? >> Wouldn't hurt, but you still need to bring down the still-active node to get >> it to talk to the new node. >> Which sucks > > Hm. But ... corosync/pac

Re: [Pacemaker] Node recover causes resource to migrate

2013-07-24 Thread Lars Marowsky-Bree
On 2013-07-24T21:40:40, Andrew Beekhof wrote: > > Statically assigned nodeids? > Wouldn't hurt, but you still need to bring down the still-active node to get > it to talk to the new node. > Which sucks Hm. But ... corosync/pacemaker ought to identify the node via the nodeid. If it comes back w

Re: [Pacemaker] Node recover causes resource to migrate

2013-07-24 Thread Andrew Beekhof
On 24/07/2013, at 7:26 PM, Lars Marowsky-Bree wrote: > On 2013-07-24T09:00:23, Andrew Beekhof wrote: > >>> 4. Node A is back with a different internal ip address. >> >> This is your basic problem. >> >> I don't believe there is any cluster software that is designed to support >> this sort o

Re: [Pacemaker] Node recover causes resource to migrate

2013-07-24 Thread Jacobo García
Thanks for your answers. Is it possible to configure corosync/pacemaker in this scenario so if a node goes down instead of bringing the node back I can build a new one and add it to the cluster? Building new nodes is almost "free" for me. Also, what's the difference between bringing a new node fro

Re: [Pacemaker] Question about the behavior when a pacemaker's process crashed

2013-07-24 Thread Kazunori INOUE
(13.07.18 19:23), Andrew Beekhof wrote: On 17/07/2013, at 6:53 PM, Kazunori INOUE wrote: (13.07.16 21:18), Andrew Beekhof wrote: On 16/07/2013, at 7:04 PM, Kazunori INOUE wrote: (13.07.15 11:00), Andrew Beekhof wrote: On 12/07/2013, at 6:28 PM, Kazunori INOUE wrote: Hi, I'm using p

Re: [Pacemaker] Node recover causes resource to migrate

2013-07-24 Thread Lars Marowsky-Bree
On 2013-07-24T09:00:23, Andrew Beekhof wrote: > > 4. Node A is back with a different internal ip address. > > This is your basic problem. > > I don't believe there is any cluster software that is designed to support > this sort of scenario. > Even at the corosync level, it has no knowledge tha

Re: [Pacemaker] How to perform a clean shutdown of Pacemaker in the event of network connection loss

2013-07-24 Thread Tan Tai hock
I did not enable fencing. I observe the process running and see that when the node is up, I will see the following processes: Corosync -- /usr/sbin/corosync Pacemaker - /usr/libexec/pacemaker/lrmd /usr/libexec/pacemaker/pengine pacemakerd /usr/libexec/pacemaker/stonith