[Pacemaker] Utilization & resource stickiness strange behaviour - sorted resources

agutxi Agustin Thu, 19 Jan 2012 04:42:37 -0800

Hi all,
I am trying to set up a cluster of virtual machine hosts, and while
doing so, I came out with a very strange behaviour (I think it may be
a bug) and I hope you can lend me a hand in debugging this.
For testing the behaviour observed in my production environment, I set
up 2 new simple machines with no location/colocation/order
constraints, and changed the Xen resource agent with the dummy
resource agent, and the behaviour was the same.


The scenario is the following:
- The strategy is "utilization".
- 2 nodes: vmHost1 and vmHost2, with 2 cores each handle 5 resources:
DummyVM001-005, with resource-stickiness="INFINITY".

What happens is the following:
- If I start resources DummyVM001-004, everything is fine. 2 resources
run on each of the nodes
- Now, I start DummyVM005, but utilization is full, so it does not
start (cool :)
- Then , if I stop any of the running resources, everything goes
smoothly and DummyVM005 starts up. That's cool too.
* Here comes the strange part:
I have full utilization and resource stickines INFINITY, so starting
new resources shouldn't change anything in the cluster status BUT
If any of the freshly restarted resources is alphabetically sorted
before, the cluster stops the "last" resource alphabetically sorted
running  and starts the stopped one.
I don't think this is the expected behaviour, please correct me if I wrong.

Thank you kindly,
Agustin

status:
====
Online: [ vmHost1 vmHost2 ]

 DummyVM1       (ocf::pacemaker:Dummy): Started vmHost1
 DummyVM2       (ocf::pacemaker:Dummy): Started vmHost1
 DummyVM3       (ocf::pacemaker:Dummy): Started vmHost2
 DummyVM5       (ocf::pacemaker:Dummy): Started vmHost2
crm(live)# resource start DummyVM4

My configuration:
============
crm(live)# configure show
node vmHost1 \
        utilization cores="2"
node vmHost2 \
        utilization cores="2"
primitive DummyVM1 ocf:pacemaker:Dummy \
        op monitor interval="60s" timeout="60s" \
        op start on-fail="restart" interval="0" \
        op stop on-fail="ignore" interval="0" \
        utilization cores="1" \
        meta is-managed="true" migration-threshold="2"
primitive DummyVM2 ocf:pacemaker:Dummy \
        op monitor interval="60s" timeout="60s" \
        op start on-fail="restart" interval="0" \
        op stop on-fail="ignore" interval="0" \
        utilization cores="1" \
        meta is-managed="true" migration-threshold="2"
primitive DummyVM3 ocf:pacemaker:Dummy \
        op monitor interval="60s" timeout="60s" \
        op start on-fail="restart" interval="0" \
        op stop on-fail="ignore" interval="0" \
        utilization cores="1" \
        meta is-managed="true" migration-threshold="2"
primitive DummyVM4 ocf:pacemaker:Dummy \
        op monitor interval="60s" timeout="60s" \
        op start on-fail="restart" interval="0" \
        op stop on-fail="ignore" interval="0" \
        utilization cores="1" \
        meta is-managed="true" migration-threshold="2" target-role="Started"
primitive DummyVM5 ocf:pacemaker:Dummy \
        op monitor interval="60s" timeout="60s" \
        op start on-fail="restart" interval="0" \
        op stop on-fail="ignore" interval="0" \
        utilization cores="1" \
        meta is-managed="true" migration-threshold="2" target-role="Started"
property $id="cib-bootstrap-options" \
        dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        stop-all-resources="false" \
        placement-strategy="utilization" \
        no-quorum-policy="ignore" \
        cluster-infrastructure="openais" \
        stop-orphan-resources="true" \
        stop-orphan-actions="true" \
        last-lrm-refresh="1326975274"
rsc_defaults $id="rsc-options" \
        resource-stickiness="INFINITY"


and the important part in /var/log/syslog :
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- <cib
admin_epoch="0" epoch="75" num_updates="4" >
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff-   <configuration >
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff-     <resources >
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff-       <primitive
id="DummyVM4" >
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff-
<meta_attributes id="DummyVM4-meta_attributes" >
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff-           <nvpair
value="Stopped" id="DummyVM4-meta_attributes-target-role" />
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff-         </meta_attributes>
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff-       </primitive>
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff-     </resources>
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff-   </configuration>
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- </cib>
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ <cib epoch="76"
num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2"
crm_feature_set="3.0.5" have-quorum="1" cib-last-written="Thu Jan 19
12:35:47 2012" dc-uuid="vmHost1" >
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+   <configuration >
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+     <resources >
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+       <primitive
class="ocf" id="DummyVM4" provider="pacemaker" type="Dummy" >
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+
<meta_attributes id="DummyVM4-meta_attributes" >
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+           <nvpair
id="DummyVM4-meta_attributes-target-role" name="target-role"
value="Started" />
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+         </meta_attributes>
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+       </primitive>
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+     </resources>
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+   </configuration>
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ </cib>
Jan 19 13:36:19 vmHost1 crmd: [729]: info: abort_transition_graph:
te_update_diff:131 - Triggered transition abort (complete=1, tag=diff,
id=(null), magic=NA, cib=0.76.1) : Non-status change
Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_state_transition: All 2
cluster nodes are eligible to run resources.
Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_pe_invoke: Query 191:
Requesting the current CIB: S_POLICY_ENGINE
Jan 19 13:36:19 vmHost1 cib: [725]: info: cib_process_request:
Operation complete: op cib_replace for section resources
(origin=local/cibadmin/2, version=0.76.1): ok (rc=0)
Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_pe_invoke_callback:
Invoking the PE: query=191, ref=pe_calc-dc-1326976579-119, seq=64,
quorate=1
Jan 19 13:36:19 vmHost1 pengine: [728]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Jan 19 13:36:19 vmHost1 pengine: [728]: notice: native_print:
DummyVM1#011(ocf::pacemaker:Dummy):#011Started vmHost1
Jan 19 13:36:19 vmHost1 pengine: [728]: notice: native_print:
DummyVM2#011(ocf::pacemaker:Dummy):#011Started vmHost1
Jan 19 13:36:19 vmHost1 pengine: [728]: notice: native_print:
DummyVM3#011(ocf::pacemaker:Dummy):#011Started vmHost2
Jan 19 13:36:19 vmHost1 pengine: [728]: notice: native_print:
DummyVM4#011(ocf::pacemaker:Dummy):#011Stopped
Jan 19 13:36:19 vmHost1 pengine: [728]: notice: native_print:
DummyVM5#011(ocf::pacemaker:Dummy):#011Started vmHost2
Jan 19 13:36:19 vmHost1 pengine: [728]: notice: RecurringOp:  Start
recurring monitor (60s) for DummyVM4 on vmHost2
Jan 19 13:36:19 vmHost1 pengine: [728]: notice: LogActions: Leave
DummyVM1#011(Started vmHost1)
Jan 19 13:36:19 vmHost1 pengine: [728]: notice: LogActions: Leave
DummyVM2#011(Started vmHost1)
Jan 19 13:36:19 vmHost1 pengine: [728]: notice: LogActions: Leave
DummyVM3#011(Started vmHost2)
Jan 19 13:36:19 vmHost1 pengine: [728]: notice: LogActions: Start
DummyVM4#011(vmHost2)
Jan 19 13:36:19 vmHost1 pengine: [728]: notice: LogActions: Stop
DummyVM5#011(vmHost2)
Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Jan 19 13:36:19 vmHost1 crmd: [729]: info: unpack_graph: Unpacked
transition 33: 6 actions in 6 synapses
Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_te_invoke: Processing
graph 33 (ref=pe_calc-dc-1326976579-119) derived from
/var/lib/pengine/pe-input-717.bz2
Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_rsc_command: Initiating
action 19: stop DummyVM5_stop_0 on vmHost2
Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_pseudo_action: Pseudo
action 6 fired and confirmed
Jan 19 13:36:19 vmHost1 crmd: [729]: info: match_graph_event: Action
DummyVM5_stop_0 (19) confirmed on vmHost2 (rc=0)
Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_pseudo_action: Pseudo
action 7 fired and confirmed
Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_pseudo_action: Pseudo
action 5 fired and confirmed
Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_rsc_command: Initiating
action 17: start DummyVM4_start_0 on vmHost2
Jan 19 13:36:19 vmHost1 crmd: [729]: info: match_graph_event: Action
DummyVM4_start_0 (17) confirmed on vmHost2 (rc=0)
Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_rsc_command: Initiating
action 18: monitor DummyVM4_monitor_60000 on vmHost2
Jan 19 13:36:19 vmHost1 crmd: [729]: info: match_graph_event: Action
DummyVM4_monitor_60000 (18) confirmed on vmHost2 (rc=0)
Jan 19 13:36:19 vmHost1 crmd: [729]: info: run_graph:
====================================================
Jan 19 13:36:19 vmHost1 crmd: [729]: notice: run_graph: Transition 33
(Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-input-717.bz2): Complete
Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_graph_trigger:
Transition 33 is now complete
Jan 19 13:36:19 vmHost1 crmd: [729]: info: notify_crmd: Transition 33
status: done - <null>
Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_state_transition:
Starting PEngine Recheck Timer




-- 
"Death: Human beings make life so interesting. Do you know, that in a
universe so full of wonders, they have managed to invent boredom. "
Terry Pratcher - Hogfather

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

[Pacemaker] Utilization & resource stickiness strange behaviour - sorted resources

Reply via email to