On Wed, Oct 17, 2012 at 3:37 AM, David Parker <dpar...@utica.edu> wrote:
> On 10/16/2012 04:45 AM, Andrew Beekhof wrote:
>>
>> On Tue, Oct 16, 2012 at 3:04 PM, David Parker<dpar...@utica.edu> wrote:
>>>
>>> ----- Original Message -----
>>> From: David Parker<dpar...@utica.edu>
>>> Date: Friday, October 12, 2012 4:57 pm
>>> Subject: [Pacemaker] Stuck in a STONITH cycle
>>> To: pacemaker@oss.clusterlabs.org
>>>
>>>> I have two nodes set up in a cluster to provide a MySQL server (mysqld)
>>>> in HA on a virtual IP address. This was working fine until I had to
>>>> reboot the servers. All I did was change the interface each node uses
>>>> for its primary IP address (changed from eth1 to eth0 on each node).
>>>> Now I'm stuck in a cycle. Let's say node 1 has the virtual IP and is
>>>> running mysqld, and node 2 is down. When node 2 boots up, it will
>>>> STONITH node 1 for no apparent reason and take over the resources, which
>>>> shouldn't happen. When node 1 boots up again, it will STONITH node 2
>>>> and take over the resources, which again shouldn't happen.
>>>
>>> ...
>>>>
>>>> Oct 12 16:27:22 ha1 crmd: [1176]: info: populate_cib_nodes_ha: Requesting the list of configured nodes
>>>> Oct 12 16:27:23 ha1 crmd: [1176]: WARN: get_uuid: Could not calculate UUID for ha2
>>>> Oct 12 16:27:23 ha1 crmd: [1176]: WARN: populate_cib_nodes_ha: Node ha2: no uuid found
>>>> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
>>>>
>>>> The exact opposite shows up on the node "ha2" (it says ha1 has no
>>>> uuid). Did the UUID of each node change because the physical interface
>>>> changed? Any other ideas? Thanks in advance.
>>>>
>>> Just wanted to follow up in case anyone else encounters this problem.
>>> I was able to solve the problem by moving the primary IP address of each
>>> node back to its original interface (eth1), so it seems the UUID of each
>>> node in the cluster depends on the interface.
>>
>> No. The on-disk uuid isn't dynamic.
>> In fact, once set, it never changes.
>>
>> I'm not sure what you managed to do, but I'm glad you have it working
>> again.
>
>
> Thanks, Andrew. I checked out the link you provided in your other
> response[1], and the STONITH death match exactly describes the behavior I
> was seeing. Strangely, though, none of the three conditions listed in that
> article were present in my configuration. Network communication was not
> broken, neither node was physically failing, and there were no HA resources
> acting wonky.
>
> The weirdest part is that the nodes could ping each other, but their ability
> to see each other via the crm was broken. The error in each node's log was
> that it couldn't calculate the UUID for the other node.

For Heartbeat-based clusters there is actually an on-disk table of known
nodes and their UUIDs. So the error message is a bit misleading: there's no
calculation, just a lookup. Very, very strange.

> For some reason,
> changing the interface back on each node solved the problem, but I guess
> we'll never know why it happened in the first place.
>
> [1] http://oss.clusterlabs.org/pipermail/pacemaker/2012-October/015674.html
>
>
>>> With each node's IP address back on
>>> eth1, the cluster works fine and there's no STONITH cycle.
>>>
>>> Now another question... Is there a way to update the UUID of each node if
>>> you do something crazy and move IP addresses to new interfaces, like I
>>> did?
>>
>>
>>> Thanks,
>>> Dave
>
> --
>
> Dave Parker
> Systems Administrator
> Utica College
> Integrated Information Technology Services
> (315) 792-3229
> Registered Linux User #408177

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
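
To illustrate the point in the reply about "lookup, not calculation", here is a rough Python sketch of how a name-to-UUID table behaves when a peer is missing from it. Everything in it is an assumption made for illustration: the one-pair-per-line file format, the function names load_node_table and lookup_uuid, and the sample UUID are all hypothetical and are not taken from Heartbeat's code or its real on-disk format.

```python
# Illustrative only: a node-name -> UUID table that is read, never computed.
# The "name uuid" line format is an assumption for this sketch, not
# Heartbeat's actual on-disk format.
import logging

log = logging.getLogger("uuid-lookup-sketch")


def load_node_table(text):
    """Parse 'name uuid' lines into a dict, skipping blanks and comments."""
    table = {}
    for line in text.splitlines():
        parts = line.strip().split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        table[parts[0]] = parts[1]
    return table


def lookup_uuid(table, node):
    """Return the stored UUID for a node, or None if it is not in the table.

    Nothing here depends on interfaces or IP addresses: a miss only means
    the node has no entry in the table.
    """
    uuid = table.get(node)
    if uuid is None:
        log.warning("no uuid found for %s", node)
    return uuid


# Hypothetical table as seen from ha1, which only knows about itself.
table = load_node_table("ha1 3f2a0c1e-0000-0000-0000-000000000001\n")
print(lookup_uuid(table, "ha1"))
print(lookup_uuid(table, "ha2"))
```

In this sketch the miss for ha2 comes purely from the table's contents, so a warning like the one in the logs would suggest that each node failed to learn about its peer, rather than that a stored UUID changed.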