2012/3/6 José Alonso <j...@transtelco.net>:
> Hi all,
>
> I have 2 Debian nodes with heartbeat and pacemaker 1.1.6 installed, and
> almost everything is working fine. I have only apache configured for
> testing. When a node goes down the failover is done correctly, but there's
> a problem when a node fails back.
>
> For example, let's say that Node1 holds the apache resource. I reboot
> Node1, Pacemaker detects that it has gone down, and apache is moved to
> Node2, where it keeps running fine. So far so good, but when Node1
> recovers and rejoins the cluster, apache is restarted on Node2 again.
>
> Does anyone know why resources are restarted when a node rejoins the cluster?
I suspect the cluster thinks it's running in both places and you're seeing
our automated recovery (stop it everywhere before choosing a new location).
Logs?

> This is my pacemaker configuration:
>
> node $id="2ac5f37d-cd54-4932-92dc-418b4fd0e6e6" nodo2 \
>         attributes standby="off"
> node $id="938594ef-839a-40bb-aa5e-5715622693b3" nodo1 \
>         attributes standby="off"
> primitive apache2 lsb:apache2 \
>         meta migration-threshold="1" failure-timeout="2" \
>         op monitor interval="5s" resource-stickiness="INFINITY"
> primitive ip1 ocf:heartbeat:IPaddr2 \
>         params ip="192.168.1.38" nic="eth0:0"
> primitive ip1arp ocf:heartbeat:SendArp \
>         params ip="192.168.1.38" nic="eth0:0"
> group WebServices ip1 ip1arp apache2
> location cli-prefer-WebServices WebServices \
>         rule $id="cli-prefer-rule-WebServices" inf: #uname eq nodo2
> colocation ip_with_arp inf: ip1 ip1arp
> colocation web_with_ip inf: apache2 ip1
> order arp_after_ip inf: ip1:start ip1arp:start
> order web_after_ip inf: ip1arp:start apache2:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>         cluster-infrastructure="Heartbeat" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="INFINITY"
>
>
> This is what I see in crm_mon:
>
> 1. Node1 and Node2 OK:
>
> Online: [ node1 node2 ]
>
>  Resource Group: WebServices
>      ip1      (ocf::heartbeat:IPaddr2):  Started node1
>      ip1arp   (ocf::heartbeat:SendArp):  Started node1
>      apache2  (lsb:apache2):             Started node1
>
>
> 2. I reboot Node1, so Pacemaker moves the resources to Node2:
>
> Online: [ node2 ]
> OFFLINE: [ node1 ]
>
>  Resource Group: WebServices
>      ip1      (ocf::heartbeat:IPaddr2):  Started node2
>      ip1arp   (ocf::heartbeat:SendArp):  Started node2
>      apache2  (lsb:apache2):             Started node2
>
>
> 3. Node1 comes online again and joins the cluster; resources stay on Node2:
>
> Online: [ node1 node2 ]
>
>  Resource Group: WebServices
>      ip1      (ocf::heartbeat:IPaddr2):  Started node2
>      ip1arp   (ocf::heartbeat:SendArp):  Started node2
>      apache2  (lsb:apache2):             Started node2
>
>
> 4. But after a few seconds, resources are stopped on Node2 and then
> restarted on the same Node2:
>
> Online: [ node1 node2 ]
>
>  Resource Group: WebServices
>      ip1      (ocf::heartbeat:IPaddr2):  Started node2
>      ip1arp   (ocf::heartbeat:SendArp):  Stopped
>      apache2  (lsb:apache2):             Stopped
>
>
> 5. Resources restarted and still on Node2:
>
> Online: [ node1 node2 ]
>
>  Resource Group: WebServices
>      ip1      (ocf::heartbeat:IPaddr2):  Started node2
>      ip1arp   (ocf::heartbeat:SendArp):  Started node2
>      apache2  (lsb:apache2):             Started node2
>
>
> Why were the resources restarted on Node2 if they were running fine?
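
A minimal way to gather the logs being asked for here, and to see whether the
probe on the rejoining node reported apache2 as already active there (which
would trigger exactly this stop-everywhere-then-restart recovery), might look
like the following. These commands are only a sketch: they assume a stock
Debian install where the Heartbeat/Pacemaker daemons log to /var/log/syslog,
and that the apache2 init script might also be enabled at boot outside the
cluster's control.

    # One-shot cluster status including recorded resource operations, so the
    # monitor_0 (probe) results from node1's rejoin are visible:
    crm_mon -1 -o

    # Collect the cluster log lines around the restart on node2
    # (assumes the cluster daemons log to syslog on these Debian nodes):
    grep -iE 'apache2|pengine|crmd' /var/log/syslog

    # If the probe on the rebooted node shows apache2 running, check whether
    # Debian starts apache at boot on its own, and if so disable that so the
    # service is only ever started by Pacemaker (hypothetical fix, adjust to
    # the actual runlevel setup):
    ls /etc/rc2.d/ | grep -i apache
    update-rc.d apache2 disable

If apache2 does turn out to be started by the init scripts while node1 boots,
the cluster would see it active on both nodes after the rejoin and recover it
as described above; letting Pacemaker be the only thing that starts it should
make the failback clean.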