On 22 Jan 2014, at 10:54 am, Brian J. Murrell (brian) <br...@interlinx.bc.ca> wrote:
> On Thu, 2014-01-16 at 14:49 +1100, Andrew Beekhof wrote:
>> What crm_mon are you looking at?
>> I see stuff like:
>>
>> virt-fencing (stonith:fence_xvm): Started rhos4-node3
>> Resource Group: mysql-group
>>     mysql-vip (ocf::heartbeat:IPaddr2): Started rhos4-node3
>>     mysql-fs (ocf::heartbeat:Filesystem): Started rhos4-node3
>>     mysql-db (ocf::heartbeat:mysql): Started rhos4-node3
>
> Yes, you are right. I couldn't see the forest for the trees.
>
> I initially was optimistic about crm_mon being more truthful than
> crm_resource, but it turns out it is not.

It can't be: they both obtain their data from the same place (the cib).

> Take for example these commands to set a constraint and start a resource
> (which has already been defined at this point):
>
> [21/Jan/2014:13:46:40] cibadmin -o constraints -C -X '<rsc_location
>   id="res1-primary" node="node5" rsc="res1" score="20"/>'
> [21/Jan/2014:13:46:41] cibadmin -o constraints -C -X '<rsc_location
>   id="res1-secondary" node="node6" rsc="res1" score="10"/>'
> [21/Jan/2014:13:46:42] crm_resource -r 'res1' -p target-role -m -v 'Started'
>
> and then these repeated calls to crm_mon -1 on node5:
>
> [21/Jan/2014:13:46:42] crm_mon -1
> Last updated: Tue Jan 21 13:46:42 2014
> Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
> Stack: openais
> Current DC: node5 - partition with quorum
> Version: 1.1.10-14.el6_5.1-368c726
> 2 Nodes configured
> 2 Resources configured
>
> Online: [ node5 node6 ]
>
> st-fencing (stonith:fence_product): Started node5
> res1 (ocf::product:Target): Started node6
>
> [21/Jan/2014:13:46:42] crm_mon -1
> Last updated: Tue Jan 21 13:46:42 2014
> Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
> Stack: openais
> Current DC: node5 - partition with quorum
> Version: 1.1.10-14.el6_5.1-368c726
> 2 Nodes configured
> 2 Resources configured
>
> Online: [ node5 node6 ]
>
> st-fencing (stonith:fence_product): Started node5
> res1 (ocf::product:Target): Started node6
>
> [21/Jan/2014:13:46:49] crm_mon -1 -r
> Last updated: Tue Jan 21 13:46:49 2014
> Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
> Stack: openais
> Current DC: node5 - partition with quorum
> Version: 1.1.10-14.el6_5.1-368c726
> 2 Nodes configured
> 2 Resources configured
>
> Online: [ node5 node6 ]
>
> Full list of resources:
>
> st-fencing (stonith:fence_product): Started node5
> res1 (ocf::product:Target): Started node5
>
> The first two are not correct, showing the resource started on node6
> when it was actually started on node5.

Was it running there to begin with?

Answering my own question... yes, it was:

> Jan 21 13:46:41 node5 crmd[8695]: warning: status_from_rc: Action 6
>   (res1_monitor_0) on node6 failed (target: 7 vs. rc: 0): Error

and then we try to stop it:

> Jan 21 13:46:41 node5 crmd[8695]: notice: te_rsc_command: Initiating action
>   7: stop res1_stop_0 on node6

So you are correct that something is wrong, but it isn't pacemaker.

> Finally, 7 seconds later, it is
> reporting correctly. The logs on node{5,6} bear this out. The resource
> was actually only ever started on node5 and never on node6.

Wrong.
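For anyone decoding that "target: 7 vs. rc: 0" log line: monitor_0 is the initial probe the cluster runs on each node when a resource is first seen, and in OCF terms rc 7 means "not running" while rc 0 means "running". So a probe that expected 7 but got 0 found the resource already active on that node. A toy sketch of this interpretation (illustrative function name, not pacemaker source):

```python
# OCF exit codes relevant to probes (these values are part of the OCF spec).
OCF_SUCCESS = 0      # resource is running
OCF_NOT_RUNNING = 7  # resource is cleanly stopped

def explain_probe(target: int, rc: int) -> str:
    """Interpret a monitor_0 result the way the crmd log line does.

    'target' is the rc the cluster expected; 'rc' is what the agent returned.
    """
    if target == OCF_NOT_RUNNING and rc == OCF_SUCCESS:
        return "resource already running on this node (unexpected)"
    if rc == target:
        return "probe matched expectations"
    return f"probe failed (target: {target} vs. rc: {rc})"

# "res1_monitor_0 on node6 failed (target: 7 vs. rc: 0)" decodes to:
print(explain_probe(7, 0))  # resource already running on this node (unexpected)
```

Which is exactly why the cluster then initiated res1_stop_0 on node6: it has to stop the unexpected copy before the resource can be considered correctly placed.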
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org