On 10/16/2012 04:45 AM, Andrew Beekhof wrote:
On Tue, Oct 16, 2012 at 3:04 PM, David Parker <dpar...@utica.edu> wrote:
----- Original Message -----
From: David Parker <dpar...@utica.edu>
Date: Friday, October 12, 2012 4:57 pm
Subject: [Pacemaker] Stuck in a STONITH cycle
To: pacemaker@oss.clusterlabs.org

I have two nodes set up in a cluster to provide a MySQL server (mysqld)
in HA on a virtual IP address.  This was working fine until I had to
reboot the servers.  All I did was change the interface each node uses
for its primary IP address (changed from eth1 to eth0 on each node).

Now I'm stuck in a cycle.  Let's say node 1 has the virtual IP and is
running mysqld, and node 2 is down.  When node 2 boots up, it will
STONITH node 1 for no apparent reason and take over the resources, which
shouldn't happen.  When node 1 boots up again, it will STONITH node 2
and take over the resources, which again shouldn't happen.
...
Oct 12 16:27:22 ha1 crmd: [1176]: info: populate_cib_nodes_ha: Requesting the list of configured nodes
Oct 12 16:27:23 ha1 crmd: [1176]: WARN: get_uuid: Could not calculate UUID for ha2
Oct 12 16:27:23 ha1 crmd: [1176]: WARN: populate_cib_nodes_ha: Node ha2: no uuid found
Oct 12 16:27:23 ha1 crmd: [1176]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.

The exact opposite shows up on the node "ha2" (it says ha1 has no
uuid).  Did the UUID of each node change because the physical interface
changed?  Any other ideas?  Thanks in advance.

Just wanted to follow up in case anyone else encounters this problem.  I was
able to solve it by moving the primary IP address of each node back
to its original interface (eth1), so it seems the UUID of each node in
the cluster depends on the interface.
No. The on-disk UUID isn't dynamic; in fact, once set, it never changes.

I'm not sure what you managed to do, but I'm glad you have it working again.

Thanks, Andrew. I checked out the link you provided in your other response[1], and the STONITH death match exactly describes the behavior I was seeing. Strangely, though, none of the three conditions listed in that article were present in my configuration. Network communication was not broken, neither node was physically failing, and there were no HA resources acting wonky.

The weirdest part is that the nodes could ping each other, but their ability to see each other via the crm was broken. The error in each node's log was that it couldn't calculate the UUID for the other node. For some reason, changing the interface back on each node solved the problem, but I guess we'll never know why it happened in the first place.

[1] http://oss.clusterlabs.org/pipermail/pacemaker/2012-October/015674.html

With each node's IP address back on eth1, the cluster works fine and
there's no STONITH cycle.
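In case it helps anyone debugging a similar "could not calculate UUID" situation: here's a small sketch for comparing what each node thinks its identity is. It assumes the classic Heartbeat layout, where a node's UUID is stored as 16 raw bytes in /var/lib/heartbeat/hb_uuid (verify the path on your install before relying on this):

```shell
#!/bin/bash
# Print a Heartbeat node's on-disk UUID file (raw 16 bytes) as hex.
dump_hb_uuid() {
    # Hex-dump the raw bytes, stripping od's spacing and newlines
    # so the output is one compact hex string.
    od -A n -t x1 "$1" | tr -d ' \n'
}

# Typical usage on a cluster node (path assumed from Heartbeat defaults):
# dump_hb_uuid /var/lib/heartbeat/hb_uuid
```

Running that on ha1 and ha2 and comparing against the node entries in the CIB (e.g. `cibadmin -Q | grep '<node '`) should show whether the stored identities actually diverged or whether the nodes just couldn't look each other up.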

Now another question...  Is there a way to update the UUID of each node if
you do something crazy and move IP addresses to new interfaces, like I did?
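I haven't had to do this myself, but if a node's identity really did need to be reset, something along these lines might work. This is a hypothetical, untested sketch assuming a stock Heartbeat + Pacemaker install with init scripts in /etc/init.d and the UUID file at its usual location; double-check both against your distribution first:

```shell
#!/bin/bash
# Hypothetical procedure: force a Heartbeat node to mint a fresh UUID.
# All paths and service names here are assumptions, not verified.
set -e

# 1. Stop the cluster stack on the affected node.
/etc/init.d/heartbeat stop

# 2. Remove the on-disk UUID; Heartbeat should generate a new one
#    the next time it starts.
rm -f /var/lib/heartbeat/hb_uuid

# 3. Start the stack again and watch the logs for the new UUID.
/etc/init.d/heartbeat start
```

Note that the other node's CIB could still carry the old UUID afterward, so the stale <node> entry might also need to be deleted with cibadmin before the cluster sees a clean membership.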


     Thanks,
     Dave

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

--

Dave Parker
Systems Administrator
Utica College
Integrated Information Technology Services
(315) 792-3229
Registered Linux User #408177

