Re: [Pacemaker] Routing-Ressources on a 2-Node-Cluster

2013-04-21 Thread Devin Reade
David Coulson wrote: > Your configuration seems to have way too many moving parts and since you are > making routing changes when the nodes become primary it is difficult to > ensure that it will actually work based upon the monitoring you are doing > when it is passive. > > Not 100% sure wha

Re: [Pacemaker] iptables cluster

2012-02-15 Thread Devin Reade
--On Monday, February 13, 2012 11:21:14 AM +0200 Karlis Kisis wrote: > In most cluster tutorials, for simplicity, iptables is turned off. > Funny thing is that iptables is what I want to configure in HA cluster > (as redundant firewalls). I debated about answering this off-list, since it might b

Re: [Pacemaker] Default gateway issues

2011-08-05 Thread Devin Reade
--On Friday, August 05, 2011 03:57:53 PM +0200 Albéric de Pertat wrote: > I have two nodes with ips 10.0.9.11/24 and 10.0.9.12/24 routed > via 10.0.9.254. I have declared an IPaddr resource for > 10.0.10.10/24. So far so good. > > primitive ip ocf:heartbeat:IPaddr \ > params ip="10.0.1

Re: [Pacemaker] Backup ring is marked faulty

2011-08-05 Thread Devin Reade
--On Thursday, August 04, 2011 03:21:58 PM +0200 Sebastian Kaps wrote: > On Thu, 4 Aug 2011 08:31:07 +0200, Tegtmeier.Martin wrote: > >> in my case it is always the slower ring that fails (the 100MB >> network). Does rrp_mode passive expect both rings to have the same >> speed? >> >> Sebastian

Re: [Pacemaker] Initial quorum

2011-07-20 Thread Devin Reade
--On Wednesday, July 20, 2011 09:19:33 AM + pskrap wrote: > I have a cluster where some of the resources cannot run on the same node. > All resources must be running to provide a functioning service. This > means that a certain amount of nodes needs to be up before it makes > sense for the

Re: [Pacemaker] KVM VM's on internal DRBD volumes

2011-06-18 Thread Devin Reade
One of my clusters (fairly old; heartbeat 3.0.0 on openais) runs VMs on Xen, with storage being directly on the drbd device. I noticed a similar problem where manual failovers were fine but cluster-initiated failovers failed. It turned out that setting a start delay on the xen RA solved the probl

Re: [Pacemaker] Mail notification for fencing action

2011-06-15 Thread Devin Reade
It may be a bit late given that you've just created your own script, but you can grab the check-cluster (and maybe check-drbd) scripts from the gno-cluster-tools package at If I have a cluster that is not otherwise monitored, I run those every three hour

Re: [Pacemaker] 3rd node just for quorum

2011-06-10 Thread Devin Reade
I have both two-node clusters that use drbd with local disk (no SAN) and a three node cluster where one node is always in standby and is just used for quorum. IMO, if you are using drbd-on-local-disk and have proper fencing support, the three node case in its current incarnation is more trouble t

Re: [Pacemaker] MySQL HA-Cluster, error on move

2011-04-11 Thread Devin Reade
I should also mention that some of the key pieces are having the pid file on local (non-drbd) disk and having your db configured to allow the RA to test the status (per the docs). Having the pid file on drbd storage causes the monitor on the backup node to fail. I don't use a different db data lo

Re: [Pacemaker] MySQL HA-Cluster, error on move

2011-04-11 Thread Devin Reade
--On Monday, April 11, 2011 02:14:44 PM +0200 Patric Falinder wrote: > The problem I have is when I need to migrate/move the resources to the > other node, or unmove it, I get this error message and mysqld won't > start/move properly: [snip] > primitive mysqld lsb:mysql > colocation mysql_on_drb

Re: [Pacemaker] How to prevent locked I/O using Pacemaker with Primary/Primary DRBD/OCFS2 (Ubuntu 10.10)

2011-04-04 Thread Devin Reade
I ran into similar behavior with an earlier version of glusterfs on raw disk (not DRBD). In that case it was a bug in gluster: although the nodes were supposed to be operating in a "mirror" configuration, the one remaining node would refuse to service requests after the other node was sto

Re: [Pacemaker] restart only a particular service in group if it fails

2011-03-08 Thread Devin Reade
--On Tuesday, March 08, 2011 11:24:50 AM + bikrish amatya wrote: [snip] > My question is , Is it possible to just start res2 only without stopping > other services that follows the resource(res3 , res4 , res5) which has > been stopped. A group implies ordering, so no (unless res2 is listed a

Re: [Pacemaker] Best stonith method to avoid split brain on a drbd cluster

2011-01-05 Thread Devin Reade
Johannes Freygner wrote: > *) Yes, and I found the wrong setting: Excellent. > But if I pull the power cable without a regular shutting down, > the powerless node gets status "UNCLEAN (offline)" and the > resources remains stopped. I would contend that would be correct behavior as (again assu

Re: [Pacemaker] Best stonith method to avoid split brain on a drbd cluster

2011-01-04 Thread Devin Reade
--On Monday, January 03, 2011 09:14:29 PM +0100 han...@freygner.at wrote: > As I have tested, its not a problem on the shutdown order. On a regular > shutdown everything is working fine until I pull the power cable. So just before pulling the power cable, the running node reports itself as online

Re: [Pacemaker] Best stonith method to avoid split brain on a drbd cluster

2011-01-03 Thread Devin Reade
Johannes Freygner wrote: > You mean with corosync will work fine, because I am using heartbeat instead. I suspect that it's a similar situation with heartbeat. The problem is pacemaker losing communication before the node cleanly disconnects. The behavior I saw on my own clusters is that becau

Re: [Pacemaker] Best stonith method to avoid split brain on a drbd cluster

2011-01-03 Thread Devin Reade
Johannes Freygner wrote: > could somebody give me an idea what will be the best stonith solution on a > drbd cluster to avoid split brain if the network between the nodes is lost. > > I have already tried to use stonith with ILO, but if the power cable is > removed from the node (because we ha

[Pacemaker] bacula-fd resource agent available

2010-12-29 Thread Devin Reade
I've got an RA available for bacula-fd if anyone needs it. It's GPL'd, and if anyone wants to incorporate it into the resource-agents RPM, they are welcome to do so. (The 'anything' RA isn't quite sufficient for bacula-fd.) It is available at: ftp://ftp.gno.org/pub/tools/bacula-contrib S

Re: [Pacemaker] Active / Active pacemaker configuration advice

2010-11-24 Thread Devin Reade
--On Tuesday, November 23, 2010 10:21:04 AM +0100 Marko Potocnik wrote: > I'm using ftp just for test. I want a service to run on both nodes and > only IP to move in case a service fails. > I don't want to stop / start service if node fails. You might be able to use the behavior of sshd as a hin

Re: [Pacemaker] clone resource will stop if half node in cluster offline

2010-11-24 Thread Devin Reade
--On Wednesday, November 24, 2010 06:39:19 PM +0800 jiaju liu wrote: > I have 4 nodes node1,node2,node3,node4 in cluster. and start clone > resource pingd. this ping a router. when node1 and node2 offline. The > clone resource stopped. however the resource ipmi is ok. Is the feature > of clone re

Re: [Pacemaker] AP9606 fencing device

2010-11-16 Thread Devin Reade
--On Wednesday, October 27, 2010 09:47:14 AM +0200 Pavlos Parissis wrote: > I have a APC AP9606 PDU and I am trying to find a stonith agent which > works with that PDU. I know that this is an old thread, but I'll reply anyway. I have one cluster that uses an old APC AP9606 for which I've not

Re: [Pacemaker] Stonith Device APC AP7900

2010-11-15 Thread Devin Reade
--On Monday, November 15, 2010 08:40:45 AM -0700 Rick Cone wrote: > In production I am planning to have 2 separate AP7900 units each plugged > into 2 different APC UPS units to achieve that. I would then have the > single node name on each, for each of the 2 PS's on the individual > systems. So

Re: [Pacemaker] Stonith Device APC AP7900

2010-11-14 Thread Devin Reade
Rick Cone wrote: > Perhaps I'll just use 1 outlet with the node name, > with a power splitter to the 2 redundant power supplies to reduce the > chances of problems. IMO, if you're going to use a chassis with redundant power supplies, you're better off with a system that uses an ALOM/DRAC/iLO, or

Re: [Pacemaker] use_logd or use_mgmtd kills corosync

2010-06-09 Thread Devin Reade
Andrew Beekhof wrote: > Do any other children start up? None. > Where is the mgmtd binary installed to? /usr/lib64/heartbeat/mgmtd

[Pacemaker] use_logd or use_mgmtd kills corosync

2010-06-08 Thread Devin Reade
I was following the instructions for a new installation of corosync and wanted to use hb_gui, so after installing via yum per the docs, I built Pacemaker-Python-GUI-pacemaker-mgmt-2.0.0 from source. Starting corosync works normally without mgmtd in the picture, but as soon as *