Hi All,

We found a problem that occurs at the time of a Probe error.
We use the following simple resource configuration.

============
Last updated: Wed Aug 22 15:19:50 2012
Stack: Heartbeat
Current DC: drbd1 (6081ac99-d941-40b9-a4a3-9f996ff291c0) - partition with quorum
Version: 1.0.12-c6770b8
1 Nodes configured, unknown expected votes
1 Resources configured.
============

Online: [ drbd1 ]

 Resource Group: grpTest
     resource1  (ocf::pacemaker:Dummy): Started drbd1
     resource2  (ocf::pacemaker:Dummy): Started drbd1
     resource3  (ocf::pacemaker:Dummy): Started drbd1
     resource4  (ocf::pacemaker:Dummy): Started drbd1

Node Attributes:
* Node drbd1:

Migration summary:
* Node drbd1:

Depending on which resource the Probe error occurs on, the resources are not stopped in inverse order.

I confirmed it with the following procedure.

Step 1) Put resource2 and resource4 into a started state.

[root@drbd1 ~]# touch /var/run/Dummy-resource2.state
[root@drbd1 ~]# touch /var/run/Dummy-resource4.state

Step 2) Start the node and send the cib.

Step 3) resource2 and resource4 are stopped, but not in inverse order.
(snip)
Aug 22 15:19:47 drbd1 pengine: [32722]: notice: group_print: Resource Group: grpTest
Aug 22 15:19:47 drbd1 pengine: [32722]: notice: native_print: resource1#011(ocf::pacemaker:Dummy):#011Stopped
Aug 22 15:19:47 drbd1 pengine: [32722]: notice: native_print: resource2#011(ocf::pacemaker:Dummy):#011Started drbd1
Aug 22 15:19:47 drbd1 pengine: [32722]: notice: native_print: resource3#011(ocf::pacemaker:Dummy):#011Stopped
Aug 22 15:19:47 drbd1 pengine: [32722]: notice: native_print: resource4#011(ocf::pacemaker:Dummy):#011Started drbd1
(snip)
Aug 22 15:19:47 drbd1 crmd: [32719]: info: te_rsc_command: Initiating action 6: stop resource2_stop_0 on drbd1 (local)
Aug 22 15:19:47 drbd1 crmd: [32719]: info: do_lrm_rsc_op: Performing key=6:2:0:5c924067-0d20-48fd-9772-88e530661270 op=resource2_stop_0 )
Aug 22 15:19:47 drbd1 lrmd: [32716]: info: rsc:resource2 stop[6] (pid 32745)
Aug 22 15:19:47 drbd1 crmd: [32719]: info: te_rsc_command: Initiating action 11: stop resource4_stop_0 on drbd1 (local)
Aug 22 15:19:47 drbd1 crmd: [32719]: info: do_lrm_rsc_op: Performing key=11:2:0:5c924067-0d20-48fd-9772-88e530661270 op=resource4_stop_0 )
Aug 22 15:19:47 drbd1 lrmd: [32716]: info: rsc:resource4 stop[7] (pid 32746)
Aug 22 15:19:47 drbd1 lrmd: [32716]: info: operation stop[6] on resource2 for client 32719: pid 32745 exited with return code 0
(snip)

I know that this stop order is caused by the way ordering works within the group.
In this case, however, our user wants the resources to always be stopped in inverse order:

 * resource4_stop -> resource2_stop

The stop order is important for our user's resources.

I have the following questions.

Question 1) Is there a correct setting in cib.xml to avoid this problem?

Question 2) Does this problem also occur in Pacemaker 1.1?

Question 3) I added the following order constraints.
<rsc_order id="order-2" first="resource1" then="resource3" />
<rsc_order id="order-3" first="resource1" then="resource4" />
<rsc_order id="order-5" first="resource2" then="resource4" />

Adding these order constraints seems to solve the problem.
Is adding order constraints also correct as one method of solving it?

Best Regards,
Hideo Yamauchi.
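
P.S. To make the question easier to discuss, here is a rough Python model of how I understand the behavior. It is only a sketch under my own assumptions and is not Pacemaker's actual code: I assume that a group orders only adjacent members, and that an ordering applies only when both resources really have a stop action. The function names are made up for this illustration.

```python
# Simplified model (an assumption, NOT Pacemaker's real scheduler logic):
# an ordering "first then" yields a stop ordering "stop then -> stop first"
# only if both resources are active, i.e. both actually get a stop action.

def group_pairs(members):
    """Implicit orderings of a group: each member is ordered after the previous one."""
    return list(zip(members, members[1:]))

def stop_order_edges(order_pairs, active):
    """Return the set of (stopped_earlier, stopped_later) pairs that take effect."""
    return {(then, first)
            for first, then in order_pairs
            if first in active and then in active}

members = ["resource1", "resource2", "resource3", "resource4"]
active = {"resource2", "resource4"}  # state found by the probe in Step 1

# Group only: resource3 has no stop action, so nothing links resource4 to resource2.
implicit = group_pairs(members)
group_only = stop_order_edges(implicit, active)

# Group plus the three constraints from Question 3.
extra = [("resource1", "resource3"),
         ("resource1", "resource4"),
         ("resource2", "resource4")]
with_extra = stop_order_edges(implicit + extra, active)

print("group only :", group_only)
print("with extra :", with_extra)
```

In this model, the group alone gives no ordering between the two stops, and only order-5 (resource2 then resource4) adds the resource4_stop -> resource2_stop relation. Whether the real policy engine behaves the same way is exactly what I would like to confirm.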