It was probably too late yesterday evening when I tested this. Taking a fresh look at it this morning, the ordering constraint is now behaving perfectly, even with cloned resources.
Thank you.
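For the archives, this is the constraint in question, exactly as posted further down in the thread:

    order ORD_SAN_MAILSTORE inf: cln_san cln_mailstore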
2014-03-22 23:41 GMT+01:00 Alexandre <alxg...@gmail.com>:
> So... it took me a while to get everything packaged and so on, but
> eventually I managed to upgrade my cluster to corosync 2 /
> pacemaker 1.1.11 (version advertised is 1.1.10-9d39a6b). Although
> communication between the nodes is now much more efficient, I still
> have the same issue with this ordering constraint that uses clones on
> both sides.
> The ordering constraint works if I set a primitive as the first
> resource, but if I put this primitive in a clone resource, it stops
> working.
>
> Below are the logs I get on the node where the first resource starts:
>
> Mar 22 23:29:18 sanaoe02 crmd[10989]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Mar 22 23:29:18 sanaoe02 cib[10984]: notice: cib:diff: Diff: --- 0.916.2
> Mar 22 23:29:18 sanaoe02 cib[10984]: notice: cib:diff: Diff: +++ 0.917.1 5da74572ddb3a247189b39d515918343
> Mar 22 23:29:18 sanaoe02 cib[10984]: notice: cib:diff: -- <nvpair value="Stopped" id="cln_aoe-meta_attributes-target-role"/>
> Mar 22 23:29:18 sanaoe02 cib[10984]: notice: cib:diff: ++ <nvpair id="cln_aoe-meta_attributes-target-role" name="target-role" value="Started"/>
> Mar 22 23:29:18 sanaoe02 pengine[10988]: notice: unpack_rsc_op: Preventing cln_aoe from re-starting on dir01: operation monitor failed 'not installed' (5)
> Mar 22 23:29:18 sanaoe02 pengine[10988]: notice: unpack_rsc_op: Preventing cln_aoe from re-starting on mta02: operation monitor failed 'not installed' (5)
> Mar 22 23:29:18 sanaoe02 pengine[10988]: notice: unpack_rsc_op: Preventing cln_aoe from re-starting on ms02: operation monitor failed 'not installed' (5)
> Mar 22 23:29:18 sanaoe02 pengine[10988]: notice: unpack_rsc_op: Preventing cln_aoe from re-starting on mx02: operation monitor failed 'not installed' (5)
> Mar 22 23:29:18 sanaoe02 pengine[10988]: notice: unpack_rsc_op: Preventing cln_aoe from re-starting on dir02: operation monitor failed 'not installed' (5)
> Mar 22 23:29:18 sanaoe02 pengine[10988]: notice: LogActions: Start pri_aoe1:0#011(sanaoe02)
> Mar 22 23:29:18 sanaoe02 crmd[10989]: notice: te_rsc_command: Initiating action 39: start pri_aoe1_start_0 on sanaoe02 (local)
> Mar 22 23:29:18 sanaoe02 pengine[10988]: notice: process_pe_message: Calculated Transition 377: /var/lib/pacemaker/pengine/pe-input-100.bz2
> Mar 22 23:29:18 sanaoe02 AoEtarget(pri_aoe1)[14285]: INFO: Exporting device /dev/xvdb on eth1 as shelf 2, slot 1
> Mar 22 23:29:18 sanaoe02 AoEtarget(pri_aoe1)[14285]: DEBUG: pri_aoe1 start : 0
> Mar 22 23:29:19 sanaoe02 crmd[10989]: notice: process_lrm_event: LRM operation pri_aoe1_start_0 (call=194, rc=0, cib-update=982, confirmed=true) ok
> Mar 22 23:29:19 sanaoe02 crmd[10989]: notice: run_graph: Transition 377 (Complete=8, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-100.bz2): Complete
> Mar 22 23:29:19 sanaoe02 crmd[10989]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>
> On the nodes where the second resource should start, I get absolutely
> no logs *at all*!
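To make the two variants explicit (reconstructed from the configuration
quoted further down the thread; the primitive-first line is my paraphrase
of what Alexandre describes, not his literal config):

    # first resource is a clone: the second clone set is never scheduled
    order ORD_SAN_MAILSTORE inf: cln_san cln_mailstore
    # first resource is the bare primitive: works as expected
    order ORD_SAN_MAILSTORE inf: pri_aoe1 cln_mailstore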
> If I modify the ordering constraint to use a primitive as the first
> resource instead of a cloned resource, then everything works OK, and
> I get the following logs on the node where the first resource starts
> (very similar to the previous ones):
>
> Mar 22 23:37:50 sanaoe02 crmd[10989]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Mar 22 23:37:50 sanaoe02 cib[10984]: notice: cib:diff: Diff: --- 0.920.3
> Mar 22 23:37:50 sanaoe02 cib[10984]: notice: cib:diff: Diff: +++ 0.921.1 04b8247b3c6786c3ff15f583cf725c3d
> Mar 22 23:37:50 sanaoe02 cib[10984]: notice: cib:diff: -- <nvpair value="Stopped" id="pri_aoe1-meta_attributes-target-role"/>
> Mar 22 23:37:50 sanaoe02 cib[10984]: notice: cib:diff: ++ <nvpair id="pri_aoe1-meta_attributes-target-role" name="target-role" value="Started"/>
> Mar 22 23:37:50 sanaoe02 pengine[10988]: notice: unpack_rsc_op: Preventing pri_aoe1 from re-starting on dir01: operation monitor failed 'not installed' (5)
> Mar 22 23:37:50 sanaoe02 pengine[10988]: notice: unpack_rsc_op: Preventing pri_aoe1 from re-starting on mta02: operation monitor failed 'not installed' (5)
> Mar 22 23:37:50 sanaoe02 pengine[10988]: notice: unpack_rsc_op: Preventing pri_aoe1 from re-starting on ms02: operation monitor failed 'not installed' (5)
> Mar 22 23:37:50 sanaoe02 pengine[10988]: notice: unpack_rsc_op: Preventing pri_aoe1 from re-starting on mx02: operation monitor failed 'not installed' (5)
> Mar 22 23:37:50 sanaoe02 pengine[10988]: notice: unpack_rsc_op: Preventing pri_aoe1 from re-starting on dir02: operation monitor failed 'not installed' (5)
> Mar 22 23:37:50 sanaoe02 pengine[10988]: notice: LogActions: Start pri_dovecot:0#011(ms02)
> Mar 22 23:37:50 sanaoe02 pengine[10988]: notice: LogActions: Start pri_aoe1#011(sanaoe02)
> Mar 22 23:37:50 sanaoe02 crmd[10989]: notice: te_rsc_command: Initiating action 39: start pri_aoe1_start_0 on sanaoe02 (local)
> Mar 22 23:37:50 sanaoe02 pengine[10988]: notice: process_pe_message: Calculated Transition 381: /var/lib/pacemaker/pengine/pe-input-104.bz2
> Mar 22 23:37:50 sanaoe02 AoEtarget(pri_aoe1)[14379]: INFO: Exporting device /dev/xvdb on eth1 as shelf 2, slot 1
> Mar 22 23:37:50 sanaoe02 AoEtarget(pri_aoe1)[14379]: DEBUG: pri_aoe1 start : 0
> Mar 22 23:37:50 sanaoe02 crmd[10989]: notice: process_lrm_event: LRM operation pri_aoe1_start_0 (call=198, rc=0, cib-update=1027, confirmed=true) ok
> Mar 22 23:37:50 sanaoe02 crmd[10989]: notice: te_rsc_command: Initiating action 25: start pri_dovecot_start_0 on ms02
> Mar 22 23:37:50 sanaoe02 crmd[10989]: notice: te_rsc_command: Initiating action 26: monitor pri_dovecot_monitor_5000 on ms02
> Mar 22 23:37:50 sanaoe02 crmd[10989]: notice: run_graph: Transition 381 (Complete=8, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-104.bz2): Complete
> Mar 22 23:37:50 sanaoe02 crmd[10989]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>
> and where the second resource starts:
>
> Mar 22 22:37:50 ms02 crmd[89496]: notice: process_lrm_event: LRM operation pri_dovecot_start_0 (call=151, rc=0, cib-update=197, confirmed=true) ok
> Mar 22 22:37:50 ms02 dovecot: master: Dovecot v2.1.7 starting up
> Mar 22 22:37:50 ms02 dovecot: master: Warning: /home is no longer mounted. If this is intentional, remove it with doveadm mount
> Mar 22 22:37:50 ms02 crmd[89496]: notice: process_lrm_event: LRM operation pri_dovecot_monitor_5000 (call=152, rc=0, cib-update=198, confirmed=false) ok
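A debugging aside for anyone who finds this thread later: when the
policy engine stays silent like this, the saved transition inputs it
logs can be replayed by hand with crm_simulate (standard Pacemaker
tooling; the file name below is taken from the logs above):

    # replay a saved transition and show the allocation scores
    crm_simulate -s -x /var/lib/pacemaker/pengine/pe-input-100.bz2
    # or run the policy engine against the live CIB
    crm_simulate -sL

With the clone-first constraint, no "LogActions: ... Start" line ever
shows up for the second clone set, which matches the "no logs at all"
behaviour described above.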
> I can't find anything useful in those logs, but if you think something
> is relevant, or could be, please feel free to highlight it.
>
> 2014-03-11 2:13 GMT+01:00 Andrew Beekhof <and...@beekhof.net>:
>>
>> On 9 Mar 2014, at 10:36 pm, Alexandre <alxg...@gmail.com> wrote:
>>
>>> So...
>>>
>>> It appears the problem doesn't come from the primitive but from the
>>> cloned resource. If I use the primitive instead of the clone in the
>>> order constraint (thus deleting the clone and the group), the second
>>> resource of the constraint starts up as expected.
>>>
>>> Any idea why?
>>
>> Not without logs
>>
>>> Should I upgrade this pretty old version of pacemaker?
>>
>> Yes :)
>>
>>> 2014-03-08 10:36 GMT+01:00 Alexandre <alxg...@gmail.com>:
>>>> Hi Andrew,
>>>>
>>>> I have tried to stop and start the first resource of the ordering
>>>> constraint (cln_san), hoping it would trigger a start attempt on the
>>>> second resource of the ordering constraint (cln_mailstore).
>>>> I tailed the syslog on the node where I was expecting the second
>>>> resource to start, but really nothing appeared in those logs (I
>>>> grepped for 'pengine' as per your suggestion).
>>>>
>>>> I have done another test, where I changed the first resource of the
>>>> ordering constraint to a very simple primitive (an LSB resource),
>>>> and in that case it worked.
>>>>
>>>> I am wondering if the issue comes from the rather complicated first
>>>> resource. It is a cloned group which contains a primitive with
>>>> conditional instance attributes...
>>>> Are you aware of any specific issue in pacemaker 1.1.7 with this
>>>> kind of resource?
>>>>
>>>> I will try to simplify the resources by getting rid of the
>>>> conditional instance attributes and try again. In the meantime I'd
>>>> be delighted to hear what you guys think about that.
>>>>
>>>> Regards, Alex.
>>>>
>>>> 2014-03-07 4:21 GMT+01:00 Andrew Beekhof <and...@beekhof.net>:
>>>>>
>>>>> On 3 Mar 2014, at 3:56 am, Alexandre <alxg...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am setting up a cluster on Debian wheezy. I have installed
>>>>>> pacemaker using the Debian-provided packages (so I am running
>>>>>> 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff).
>>>>>>
>>>>>> I have roughly 10 nodes, among which some act as SANs (exporting
>>>>>> block devices using the AoE protocol) and others act as initiators
>>>>>> (they are actually mail servers, storing emails on the exported
>>>>>> devices).
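(An aside for readers unfamiliar with ATA over Ethernet: as far as I
know, the ocf:heartbeat:AoEtarget agent used below is a thin wrapper
around vblade, so the export seen in the logs above corresponds roughly
to the following invocation; the argument mapping is my reading of the
agent, not something stated in this thread:

    vblade 2 1 eth1 /dev/xvdb   # shelf 2, slot 1, exported on eth1

with shelf, slot, nic and device taken from the primitive's instance
attributes.)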
>>>>>> Below are the defined resources for those nodes:
>>>>>>
>>>>>> xml <primitive class="ocf" id="pri_aoe1" provider="heartbeat" type="AoEtarget"> \
>>>>>>     <instance_attributes id="pri_aoe1.1-instance_attributes"> \
>>>>>>         <rule id="node-sanaoe01" score="1"> \
>>>>>>             <expression attribute="#uname" id="expr-node-sanaoe01" operation="eq" value="sanaoe01"/> \
>>>>>>         </rule> \
>>>>>>         <nvpair id="pri_aoe1.1-instance_attributes-device" name="device" value="/dev/xvdb"/> \
>>>>>>         <nvpair id="pri_aoe1.1-instance_attributes-nic" name="nic" value="eth0"/> \
>>>>>>         <nvpair id="pri_aoe1.1-instance_attributes-shelf" name="shelf" value="1"/> \
>>>>>>         <nvpair id="pri_aoe1.1-instance_attributes-slot" name="slot" value="1"/> \
>>>>>>     </instance_attributes> \
>>>>>>     <instance_attributes id="pri_aoe2.1-instance_attributes"> \
>>>>>>         <rule id="node-sanaoe02" score="2"> \
>>>>>>             <expression attribute="#uname" id="expr-node-sanaoe2" operation="eq" value="sanaoe02"/> \
>>>>>>         </rule> \
>>>>>>         <nvpair id="pri_aoe2.1-instance_attributes-device" name="device" value="/dev/xvdb"/> \
>>>>>>         <nvpair id="pri_aoe2.1-instance_attributes-nic" name="nic" value="eth1"/> \
>>>>>>         <nvpair id="pri_aoe2.1-instance_attributes-shelf" name="shelf" value="2"/> \
>>>>>>         <nvpair id="pri_aoe2.1-instance_attributes-slot" name="slot" value="1"/> \
>>>>>>     </instance_attributes> \
>>>>>> </primitive>
>>>>>> primitive pri_dovecot lsb:dovecot \
>>>>>>     op start interval="0" timeout="20" \
>>>>>>     op stop interval="0" timeout="30" \
>>>>>>     op monitor interval="5" timeout="10"
>>>>>> primitive pri_spamassassin lsb:spamassassin \
>>>>>>     op start interval="0" timeout="50" \
>>>>>>     op stop interval="0" timeout="60" \
>>>>>>     op monitor interval="5" timeout="20"
>>>>>> group grp_aoe pri_aoe1
>>>>>> group grp_mailstore pri_dlm pri_clvmd pri_spamassassin pri_dovecot
>>>>>> clone cln_mailstore grp_mailstore \
>>>>>>     meta ordered="false" interleave="true" clone-max="2"
>>>>>> clone cln_san grp_aoe \
>>>>>>     meta ordered="true" interleave="true" clone-max="2"
>>>>>>
>>>>>> As I am running an opt-in cluster (symmetric-cluster="false"), I
>>>>>> have the location constraints below for those hosts:
>>>>>>
>>>>>> location LOC_AOE_ETHERD_1 cln_san inf: sanaoe01
>>>>>> location LOC_AOE_ETHERD_2 cln_san inf: sanaoe02
>>>>>> location LOC_MAIL_STORE_1 cln_mailstore inf: ms01
>>>>>> location LOC_MAIL_STORE_2 cln_mailstore inf: ms02
>>>>>>
>>>>>> So far so good. I want to make sure the initiators won't try to
>>>>>> search for exported devices before the targets have actually
>>>>>> exported them. To do so, I thought I could use the following
>>>>>> ordering constraint:
>>>>>>
>>>>>> order ORD_SAN_MAILSTORE inf: cln_san cln_mailstore
>>>>>>
>>>>>> Unfortunately, if I add this constraint, the clone set
>>>>>> "cln_mailstore" never starts (and it even stops if it is already
>>>>>> started when I add the constraint).
>>>>>>
>>>>>> Is there something wrong with this ordering rule?
>>>>>> Where can I find information on what's going on?
>>>>>
>>>>> No errors in the logs?
>>>>> If you grep for 'pengine' does it want to start them or just leave
>>>>> them stopped?
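For reference, this is the sort of grep Andrew is suggesting (the log
path assumes Debian's default syslog setup; adjust for your
distribution):

    grep -E 'pengine.*(LogActions|unpack_rsc_op)' /var/log/syslog

With the clone-first constraint in place, it shows start actions for
the first clone's instances but no LogActions entry at all for
cln_mailstore: the policy engine never schedules it.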
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org