[Pacemaker] heartbeat:anything resource not stop/monitoring after reboot

2013-09-05 Thread David Coulson
We patched and rebooted one of our clusters this morning - I verified that pacemaker is the same as previous, plus it matches another similar cluster. There is a resource in the cluster defined as: primitive re-named-reload ocf:heartbeat:anything \ params binfile="/usr/sbin/rndc" cmdli

Re: [Pacemaker] named

2013-06-05 Thread David Coulson
On 6/5/13 2:30 PM, paul wrote: Hi. I have followed the Clusters from scratch PDF and have a working two node active passive cluster with ClusterIP, WebDataClone,WebFS and WebSite working. I am using BIND DNS to direct my websites to the cluster address. When I perform a failover which works ok I

[Pacemaker] Group attributes ordered/collocated no longer valid?

2013-05-31 Thread David Coulson
Trying to commit a change to a group that looks like this: group gr-ns-auth-ip re-auth6-ns1-ip re-auth6-ns2-ip re-auth-ns1-ip re-auth-ns2-ip re-ns1auth-ip re-ns2auth-ip re-ns3auth-ip \ meta ordered="false" collocated="false" "/tmp/tmpBQyloG.pcmk" 118L, 4752C written

Re: [Pacemaker] why so long to stonith?

2013-04-25 Thread David Coulson
On 4/25/13 7:43 PM, Andrew Beekhof wrote: I certainly hope so :) So I should complain to our sales people about this BZ before we upgrade our clusters to 6.4? ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailm

Re: [Pacemaker] Routing-Ressources on a 2-Node-Cluster

2013-04-19 Thread David Coulson
On 4/19/13 5:48 AM, T. wrote: When a server gets active, it will get the cluster-ip "10.20.10.70" and the default route to "10.20.10.1". Why can't both your cluster nodes have 10.20.10.1 as their default route all the time? Your configuration seems to have way too many moving parts and since

Re: [Pacemaker] Routing-Ressources on a 2-Node-Cluster

2013-04-15 Thread David Coulson
On Apr 15, 2013, at 1:59 PM, "T." wrote: > > > For the access-network I use a different NIC, the nodes are in different > networks, NodeA has 10.20.11.70, NodeB has 10.20.12.70 and I have > configured a cluster-ip, the active node gets, (10.20.10.70). Are they really on different networks? Wh

Re: [Pacemaker] Wrong system send arp reply when using IPaddr

2013-04-09 Thread David Coulson
On 4/9/13 7:18 PM, Andrew Beekhof wrote: Pacemaker is not supported in 6.3 and all I am allowed to say at this point[1] is that your configuration isn't supportable for 6.4 Not because you've configured anything wrong/badly, but because a specific application is not present. However, if I can

Re: [Pacemaker] Wrong system send arp reply when using IPaddr

2013-04-09 Thread David Coulson
On 4/7/13 10:29 PM, Andrew Beekhof wrote: Really really weird, I've got nothing :( We've added SPANs on the switches for the two boxes in the cluster, so we can hopefully at least identify that the ARP frame didn't come from them. Of course, we've not had an occurrence of it in almost a month

Re: [Pacemaker] rhel6/cman+pacemaker - how to use clvm?

2013-04-08 Thread David Coulson
On 4/8/13 7:37 AM, Vadym Chepkov wrote: What if a clustered volume group appears only when pacemaker establishes iSCSI connection? just make sure you activate the VG before trying to mount anything.

Re: [Pacemaker] rhel6/cman+pacemaker - how to use clvm?

2013-04-08 Thread David Coulson
On 4/8/13 6:42 AM, Yuriy Demchenko wrote: The purpose of my cluster is to provide HA VM and routing/gateway (thus RHCS isn't an option for me - no IPaddr2 and Route resources). But I cannot find any documentation how to use cLVM in cman+pacemaker cluster, everything I found requires use of "o

Re: [Pacemaker] Wrong system send arp reply when using IPaddr

2013-03-18 Thread David Coulson
On 3/18/13 5:24 PM, Andrew Beekhof wrote: So:
1. the IP moved from 01 to 02
2. 01 was then rebooted
3. a long time passes
4. 01 starts arping for the IP
Is that what you're saying? Is the problem transient or does it persist?
Went like this - IP movements are all by Pacemaker/IPaddr resource

Re: [Pacemaker] Wrong system send arp reply when using IPaddr

2013-03-17 Thread David Coulson
On Mon, Mar 18, 2013 at 3:17 AM, David Coulson wrote: >> First off, I'm going to preface this with the realization that what I am >> explaining makes no sense, doesn't follow normal logic and I'm not a >> complete idiot. I've beaten my head against a wall with

Re: [Pacemaker] RHEL/CentOS 6.4: corosync -> CMAN migration

2013-03-17 Thread David Coulson
On Mar 11, 2013, at 7:32 PM, Andrew Beekhof wrote: > > > In fact prior to 6.4, Pacemaker only had Tech Preview status - using > the CMAN plugin instead of our home grown one was key to that > changing. Is Pacemaker not tech preview in 6.4 anymore? What is the support status of Pacemaker on 6.4?

Re: [Pacemaker] apache problems when moving an Ipaddr2 resource

2013-03-17 Thread David Coulson
What is the specific error you get from Apache? Does it not start, or does it just not work properly? How are you ensuring your two nodes have the same apache configuration? David On Mar 17, 2013, at 8:13 PM, Luis Daniel Lucio Quiroz wrote: > strange, > > i have 2 hosts in a cluster > > clus

[Pacemaker] Wrong system send arp reply when using IPaddr

2013-03-17 Thread David Coulson
First off, I'm going to preface this with the realization that what I am explaining makes no sense, doesn't follow normal logic and I'm not a complete idiot. I've beaten my head against a wall with this issue for two days, and have made no progress, yet we've had a couple of production system o

Re: [Pacemaker] Correct order of meta/params in crm

2013-03-03 Thread David Coulson
On 3/3/13 1:00 PM, Lars Marowsky-Bree wrote: My memory may be very faulty, but I thought this didn't lead to the failure actually being cleaned up automatically, but "merely" ignored post-timeout. Perhaps 'clean up' is the wrong phrase. But I've absolutely seen it remove the failure out of 'crm_mo

Re: [Pacemaker] Correct order of meta/params in crm

2013-03-03 Thread David Coulson
On 3/2/13 8:22 AM, Lars Marowsky-Bree wrote: Unless it annoys you, this is actually harmless. Otherwise, params first is what I tend to use. Regards, Lars We've seen instances where failure-timeout is set, but Pacemaker never seems to clean up the failure. First thought was it didn't a

[Pacemaker] Correct order of meta/params in crm

2013-03-02 Thread David Coulson
Running Pacemaker 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14 I noticed we have inconsistent ordering of meta/params in our configuration - For some resources, meta comes before params, in some cases after. In the case below, both. I am assuming meta before params is the correct way t

Re: [Pacemaker] time synchronisation

2012-12-19 Thread David Coulson
On 12/19/12 5:06 AM, James Harper wrote: What is the best way on bootup in the above situation to ensure time synchronisation? Is it as simple as having a cron job to reset the hardware clock every so often so that on reboot things are reasonable? At least RHEL and SuSE can do an explicit ntp

Re: [Pacemaker] Configure simple Network Load-balancing

2012-11-30 Thread David Coulson
to other server etc... Is it something wrong with my configuration or that's > the way it's working? > > Thanks and regards > > > On Fri, Nov 30, 2012 at 1:36 PM, David Coulson wrote: > I would add HA to your existing HA config - The primary issue you have right

Re: [Pacemaker] Configure simple Network Load-balancing

2012-11-30 Thread David Coulson
grow up till 3000 in the next two months. Servers have Tomcat installed on them, so basically I need to load balance connections from outside to the Tomcat. Regards On Fri, Nov 30, 2012 at 1:19 PM, David Coulson <da...@davidcoulson.net> wrote: All the connections, from ho

Re: [Pacemaker] Configure simple Network Load-balancing

2012-11-30 Thread David Coulson
All the connections, from how many clients? You might be better off using LVS for this. David On 11/30/12 7:15 AM, Ratko Dodevski wrote: Hi guys, I need some help on configuring NLB for my application servers. I've installed tomcat on 4 servers and I've decided to use Linux HA for Network Loa

Re: [Pacemaker] Error while mount gfs2 filesystem in active/active clustering

2012-09-13 Thread David Coulson
On 9/13/12 7:33 AM, ecfgijn wrote: Hi All, I have configured active/active clustering in centos-6.2. But when I try to mount a gfs2 file system I am getting an error, which is mentioned below [root@node1 ~]# mount /dev/sdb1 /mnt/ gfs_controld join connect error: Connection refused error moun

Re: [Pacemaker] Expired fail-count doesn't get cleaned up.

2012-08-14 Thread David Coulson
On 8/13/12 8:01 PM, Andrew Beekhof wrote: You might be experiencing: + David Vossel (5 months ago) 9263480: Low: pengine: cl#5025 - Automatically clear failures when resource configuration changes. But if you send us a crm_report tarball covering the period during which you had problems, we can

[Pacemaker] Expired fail-count doesn't get cleaned up.

2012-07-31 Thread David Coulson
I'm running RHEL6 with the tech preview of pacemaker it ships with. I've a number of resources which have a failure-timeout="60", which most of the time does what it is supposed to. Last night a resource failed, which was part of a clone - While the resource recovered, the fail-count log never

Re: [Pacemaker] None of the standard agents in ocf:heartbeat are working in centos 6

2012-07-23 Thread David Coulson
We run many RHEL6 clusters using cman/corosync/pacemaker with SELinux enabled. Doubt that is the problem. The original poster wasn't using cman, but I'm not sure that makes a substantial difference. On 7/23/12 7:15 AM, Vladislav Bogdanov wrote: 23.07.2012 08:06, David Barchas wrote: Hello.

Re: [Pacemaker] Pull the plug on one node, the resource doesn't promote the other node to master

2012-07-20 Thread David Coulson
Use ping to set an attribute, then add a location.
primitive re-ping-core ocf:pacemaker:ping \
    meta failure-timeout="60" \
    params name="at-ping-core" host_list="10.250.52.1" multiplier="100" attempts="5" \
    op monitor interval="20" timeout="15" \
    op start interval
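The second half of this advice, the location constraint, is cut off in the snippet. A minimal sketch of what such a constraint could look like in crm shell syntax follows. Only the attribute name at-ping-core comes from the primitive above; the clone name cl-ping-core, the constraint name lo-app-ping-core and the resource re-app are hypothetical placeholders, not taken from the original post.

```
# Assumption: the ping primitive is cloned so that every node maintains
# its own at-ping-core attribute.
clone cl-ping-core re-ping-core

# Hypothetical constraint: keep re-app off any node where at-ping-core is
# undefined or zero, i.e. where the ping target is unreachable.
location lo-app-ping-core re-app \
    rule -inf: not_defined at-ping-core or at-ping-core lte 0
```

With a single host in host_list and multiplier="100", the attribute should be 100 on a node that can reach the target and 0 on one that cannot, so the rule only bans nodes that have lost connectivity.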

[Pacemaker] Restart clone service on only one node

2012-06-25 Thread David Coulson
I've a couple of cloned resources which need to be restarted one at a time as part of a batch process. If I do a 'crm -w resource restart cl-whatever', it restarts the whole lot at once. I can do a 'service appname stop' on each box, wait for pacemaker to notice it is down, then let it restart

Re: [Pacemaker] group resource - altering default order

2012-06-14 Thread David Coulson
On 6/14/12 8:28 PM, Andrew Beekhof wrote: Just use a colocation set instead. Is there a better option than a non-ordered, non-collocated group when you need an order dependency? We have a couple of clone resources, which are dependent on a non-collocated group (the resources in the group are di

Re: [Pacemaker] Configuring a cluster for asymmetric operation

2012-06-08 Thread David Coulson
Pacemaker needs to be able to monitor on all nodes. Maybe if you install drbd on the third node but don't configure anything monitor will correctly report it is not running over there, and your location rules will stop it from even trying. Or just change the RA for DRBD to report not running i

Re: [Pacemaker] 2-node clusters, who's the master now.

2012-06-07 Thread David Coulson
If you are running two nodes, you need to tell pacemaker you don't care if it can't get quorum with only 1 of 2 nodes available. Neither node that takes over in this event knows whether there is a split brain, so you will need to make sure you have sufficient infrastructure between the two
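The setting described here is most likely the no-quorum-policy cluster property. A minimal sketch in crm shell syntax, assuming a two-node cluster; the stonith-enabled line is an added assumption (fencing is the usual guard against split brain) and is not part of the original message:

```
# Keep running resources even when quorum is lost (1 of 2 nodes up).
# Assumption: fencing is configured and enabled, since neither node can
# otherwise tell a dead peer from a broken link.
property no-quorum-policy="ignore" \
    stonith-enabled="true"
```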

Re: [Pacemaker] KVM DRBD and Pacemaker

2012-06-04 Thread David Coulson
Can you post your pacemaker config on pastebin? David On Jun 4, 2012, at 3:51 PM, Cliff Massey wrote: > > I am trying to setup a cluster consisting of KVM DRBD and pacemaker. Without > pacemaker DRBD and KVM are working. I can even stop everything on one node, > promote the other to drbd pri

Re: [Pacemaker] VIP on Active/Active cluster

2012-05-14 Thread David Coulson
Cloning IPaddr2 resources utilizes the iptables CLUSTERIP rule. Probably a good idea to start looking at it w/ tcpdump and seeing if either box gets the icmp echo-request packet (from a ping) and determining if it just doesn't respond properly, doesn't get it at all, or something else. I'd say

Re: [Pacemaker] lvm resource doesn't start

2012-05-12 Thread David Coulson
If clvmd hangs, you probably don't have fencing configured properly - It will block IO until a node is fenced correctly. On May 12, 2012, at 12:16 PM, Frank Van Damme wrote: > Hi list, > > I'm assembling a cluster for A/P nfs on Debian Squeeze (6.0). For > flexibility I want with LVM. So the

Re: [Pacemaker] VIP on Active/Active cluster

2012-05-09 Thread David Coulson
What application is running on the nodes? Sent from my iPad On May 9, 2012, at 3:10 PM, Paul Damken wrote: > Hello, > > I wonder if someone can light me on how to handle the following cluster scene: > > 2 Nodes Cluster (Active/Active) > 1 Cluster managed VIP - RoundRobin ? > SAN Shared Stor

Re: [Pacemaker] Failed Actions

2012-05-04 Thread David Coulson
Why not run two separate clusters - One for VMs, one for DRBD. You can create a group containing the resources and have the location constraint reference the group - You probably want to set the group to 'ordered=false' and 'collocated=false'. That said, if you split your environment into two c
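As a rough illustration of the group approach mentioned here, a sketch in crm shell syntax. All names (gr-example, re-a, re-b, lo-gr-example, node1) and the score of 100 are hypothetical; only the ordered and collocated meta attributes come from the message:

```
# Hypothetical group: members may start in any order and on different nodes.
group gr-example re-a re-b \
    meta ordered="false" collocated="false"

# Hypothetical location constraint that references the group as a whole.
location lo-gr-example gr-example 100: node1
```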

Re: [Pacemaker] Nodes will not promote DRBD resources to master on failover

2012-03-24 Thread David Coulson
Shut down pacemaker and fix your drbd disk first. Get them both uptodate/uptodate and make sure you can manually switch them to primary on each node. Node2 can't become primary when it's not connected to something with an uptodate disk. On 3/24/12 3:15 PM, Andrew Martin wrote: Hi Andreas, M

Re: [Pacemaker] Dnsmasq

2012-03-23 Thread David Coulson
Did dnsmasq log that it is listening on the cluster address? You could try adding an iptables nat rule to the box and see if that works. Nat the cluster address for port 53 to the local server ip. Sent from my iPad On Mar 23, 2012, at 9:35 PM, Gregg Stock wrote: > I'm having some "interesting"

Re: [Pacemaker] Deleting the resource while it's running

2012-03-22 Thread David Coulson
On 3/22/12 5:09 AM, Ante Karamatic wrote: Hi I've come across an odd behavior, which might be considered inconsistent. As we know, pacemaker doesn't allow deleting a resource that's running, but this doesn't produce the same behavior every time. Let's take a VM with a default stop timeout (90

Re: [Pacemaker] pacemaker + rhel6 and clvmd

2012-03-14 Thread David Coulson
Are you running 'real' RHEL6? If so, cman + clvmd + gfs2 is the way to go. You can run pacemaker on top of all of that (without openais) to manage your resources if you don't want to use rgmanager. I've never tried to run clvmd out of pacemaker, but there is an init.d script for it in RHEL6,

Re: [Pacemaker] ldirectord on RHEL6.2?

2012-02-27 Thread David Coulson
On 2/27/12 5:44 PM, Andrew Beekhof wrote: On Tue, Feb 28, 2012 at 7:45 AM, David Coulson wrote: Yep. Is that not from the EPEL repos though? I didn't think we shipped it (since I've had people complain to me about that) Oops. You are right. It was added to our RHN Satellite s

Re: [Pacemaker] ldirectord on RHEL6.2?

2012-02-27 Thread David Coulson
Yep.
# rpm -qi ldirectord
Name        : ldirectord    Relocations: (not relocatable)
Version     : 3.0.7         Vendor: Red Hat, Inc.
Release     : 5.el6         Build Date: Thu 25 Feb 2010 04:19:10 AM EST
Install Date: Sat 22 Oct 2011 10:17:39

Re: [Pacemaker] Resource inter-dependency without being a 'group'

2012-02-18 Thread David Coulson
On 2/18/12 4:33 PM, Florian Haas wrote: Is setting "meta collocated=false" not working for your group? Along similar lines, if I have default-resource-stickiness="200" set, what is the best way to 'rebalance' resources following a node failure? In general, if I lose a node I don't want resource

Re: [Pacemaker] Resource inter-dependency without being a 'group'

2012-02-18 Thread David Coulson
On 2/18/12 4:33 PM, Florian Haas wrote: Is setting "meta collocated=false" not working for your group? Yep, I found that option shortly after posting my email question. Need to try it in production tomorrow morning, but it worked in my dev environment with dummy resources. Thanks- David

[Pacemaker] Resource inter-dependency without being a 'group'

2012-02-18 Thread David Coulson
I have an active/active LVS cluster, which uses pacemaker for managing IP resources. Currently I have one environment running on it which utilizes ~30 IP addresses, so a group was created so all resources could be stopped/started together. Downside of that is that all resources have to run on t

Re: [Pacemaker] Pacemaker in RHEL6.

2011-08-10 Thread David Coulson
On 8/10/11 11:43 AM, Marco van Putten wrote: Thanks Andreas. But our managers persist on using Redhat. I think the idea would be to take the HA packages distributed with Scientific Linux 6.x and run them on RHEL. Note that even when you do subscribe to the HA add-on in RHEL6, pacemaker is

Re: [Pacemaker] Cluster Volume Group is stuck

2011-05-11 Thread David Coulson
On 5/11/11 8:07 AM, Karl Rößmann wrote: we have a three node cluster with a Cluster Volume Group vgsmet. After powering off one Node, the Volume Group is stuck. One of the ERROR messages is: May 11 10:50:32 multix244 crmd: [8086]: ERROR: process_lrm_event: LRM operation vgsmet:0_monitor_600

[Pacemaker] Resource monitor attempting to run on 'other' node.

2011-03-29 Thread David Coulson
Pretty simple configuration - Two nodes running cman-backed pacemaker. I have three resources which are grouped together to support an application. Filesystem, IP, and the app itself. My app is currently misconfigured, so I expect it to blow up when it tries to start. In crm_mon, I have a con