Hi all, I have two Debian nodes with Heartbeat and Pacemaker 1.1.6 installed, and almost everything is working fine. I have only Apache configured for testing. When a node goes down, failover works correctly, but there is a problem when a node fails back.
For example, let's say Node1 holds the apache resource. I reboot Node1, Pacemaker detects that it has gone down, apache is moved to Node2, and it keeps running there fine. But when Node1 recovers and rejoins the cluster, apache is restarted on Node2. Does anyone know why resources are restarted when a node rejoins the cluster?

This is my Pacemaker configuration:

node $id="2ac5f37d-cd54-4932-92dc-418b4fd0e6e6" nodo2 \
        attributes standby="off"
node $id="938594ef-839a-40bb-aa5e-5715622693b3" nodo1 \
        attributes standby="off"
primitive apache2 lsb:apache2 \
        meta migration-threshold="1" failure-timeout="2" \
        op monitor interval="5s" resource-stickiness="INFINITY"
primitive ip1 ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.38" nic="eth0:0"
primitive ip1arp ocf:heartbeat:SendArp \
        params ip="192.168.1.38" nic="eth0:0"
group WebServices ip1 ip1arp apache2
location cli-prefer-WebServices WebServices \
        rule $id="cli-prefer-rule-WebServices" inf: #uname eq nodo2
colocation ip_with_arp inf: ip1 ip1arp
colocation web_with_ip inf: apache2 ip1
order arp_after_ip inf: ip1:start ip1arp:start
order web_after_ip inf: ip1arp:start apache2:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="Heartbeat" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="INFINITY"

This is what I see in crm_mon:

1. Node1 and Node2 OK:

Online: [ node1 node2 ]

 Resource Group: WebServices
     ip1        (ocf::heartbeat:IPaddr2):   Started node1
     ip1arp     (ocf::heartbeat:SendArp):   Started node1
     apache2    (lsb:apache2):              Started node1

2. I reboot Node1, so Pacemaker moves the resources to Node2:

Online: [ node2 ]
OFFLINE: [ node1 ]

 Resource Group: WebServices
     ip1        (ocf::heartbeat:IPaddr2):   Started node2
     ip1arp     (ocf::heartbeat:SendArp):   Started node2
     apache2    (lsb:apache2):              Started node2

3. Node1 is online again and rejoins the cluster; the resources are still on Node2:

Online: [ node1 node2 ]

 Resource Group: WebServices
     ip1        (ocf::heartbeat:IPaddr2):   Started node2
     ip1arp     (ocf::heartbeat:SendArp):   Started node2
     apache2    (lsb:apache2):              Started node2

4. But after a few seconds, the resources are stopped on Node2 and restarted on the same node:

Online: [ node1 node2 ]

 Resource Group: WebServices
     ip1        (ocf::heartbeat:IPaddr2):   Started node2
     ip1arp     (ocf::heartbeat:SendArp):   Stopped
     apache2    (lsb:apache2):              Stopped

5. The resources are restarted and are still on Node2:

Online: [ node1 node2 ]

 Resource Group: WebServices
     ip1        (ocf::heartbeat:IPaddr2):   Started node2
     ip1arp     (ocf::heartbeat:SendArp):   Started node2
     apache2    (lsb:apache2):              Started node2

Why were the resources restarted on Node2 if they were running fine?
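One possible mechanism, offered as an assumption rather than a diagnosis: when a node joins, Pacemaker runs a one-time probe (a monitor with interval 0) of every resource on it to find anything already running outside cluster control. If the apache2 init script was also started at boot on Node1, the LSB status check there would report it as running, so apache2 would appear active on two nodes. The resource's `multiple-active` meta attribute (default `stop_start`) then stops every copy and starts a single one. The Python sketch below models only that decision; `plan_after_probe` and its parameters are hypothetical names, and this is a simplification, not Pacemaker's scheduler code.

```python
from typing import Dict, List


# Simplified, hypothetical model of how Pacemaker's "multiple-active"
# meta attribute reacts when probes find a resource running on more than
# one node. Not Pacemaker code; all names here are illustrative.
def plan_after_probe(probe_active: Dict[str, bool],
                     target: str,
                     policy: str = "stop_start") -> List[str]:
    """Return the recovery actions for one resource.

    probe_active: node name -> whether the probe reported the resource running
    target: node the resource would be placed on afterwards (here node2,
            because of cli-prefer-WebServices and resource-stickiness)
    policy: "block", "stop_only" or "stop_start" (Pacemaker's default)
    """
    active = sorted(node for node, running in probe_active.items() if running)
    if len(active) <= 1:
        return []  # active on at most one node: nothing to recover
    if policy == "block":
        return ["mark unmanaged"]  # leave every copy where it is
    actions = [f"stop on {node}" for node in active]  # stop every copy
    if policy == "stop_start":
        actions.append(f"start on {target}")  # then start exactly one copy
    return actions


if __name__ == "__main__":
    # Assumption: apache2 was started at boot on node1 outside the cluster,
    # so the probe run when node1 rejoins reports it running there as well.
    print(plan_after_probe({"node1": True, "node2": True}, "node2"))
```

With both nodes reporting apache2 as running, the model yields a stop on node1, a stop on node2 and a start on node2, which would match the apache2 stop and restart on Node2 in steps 4 and 5. Whether apache2 is really being started at boot on Node1 is the assumption to check.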
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org