Hi all,

As a follow-up to this, I realised that I needed to slightly change the way the 
resource constraints are put together, but I'm still seeing the same behaviour.

Below are an excerpt from the logs on the host and the revised XML 
configuration. In this case, I caused two failures on the host mu, which forced 
the resources onto nu, and then I forced two failures on nu. The two failures 
detected on nu can be seen in the logs (the "warning: update_failcount:" 
lines). After the second failure on nu, the VIP is migrated back to mu, but 
none of the "support" resources are promoted with it.

Regards,
James

<1c>Feb  5 14:58:45 mu crmd[31482]:  warning: update_failcount: Updating 
failcount for sub-squid on nu after failed monitor: rc=9 (update=value++, 
time=1360072725)
<1d>Feb  5 14:58:45 mu crmd[31482]:   notice: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
origin=abort_transition_graph ]
<1d>Feb  5 14:58:45 mu pengine[31481]:   notice: unpack_config: On loss of CCM 
Quorum: Ignore
<1c>Feb  5 14:58:45 mu pengine[31481]:  warning: unpack_rsc_op: Processing 
failed op monitor for sub-squid:0 on mu: master (failed) (9)
<1c>Feb  5 14:58:45 mu pengine[31481]:  warning: unpack_rsc_op: Processing 
failed op monitor for sub-squid:0 on nu: master (failed) (9)
<1c>Feb  5 14:58:45 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from mu after 2 failures (max=2)
<1c>Feb  5 14:58:45 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from mu after 2 failures (max=2)
<1c>Feb  5 14:58:45 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from mu after 2 failures (max=2)
<1d>Feb  5 14:58:45 mu pengine[31481]:   notice: LogActions: Recover 
sub-squid:0        (Master nu)
<1d>Feb  5 14:58:45 mu pengine[31481]:   notice: process_pe_message: Calculated 
Transition 64: /opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-152.bz2
<1d>Feb  5 14:58:45 mu pengine[31481]:   notice: unpack_config: On loss of CCM 
Quorum: Ignore
<1c>Feb  5 14:58:45 mu pengine[31481]:  warning: unpack_rsc_op: Processing 
failed op monitor for sub-squid:0 on mu: master (failed) (9)
<1c>Feb  5 14:58:45 mu pengine[31481]:  warning: unpack_rsc_op: Processing 
failed op monitor for sub-squid:0 on nu: master (failed) (9)
<1c>Feb  5 14:58:45 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from mu after 2 failures (max=2)
<1c>Feb  5 14:58:45 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from mu after 2 failures (max=2)
<1c>Feb  5 14:58:45 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from mu after 2 failures (max=2)
<1d>Feb  5 14:58:45 mu pengine[31481]:   notice: LogActions: Recover 
sub-squid:0        (Master nu)
<1d>Feb  5 14:58:45 mu pengine[31481]:   notice: process_pe_message: Calculated 
Transition 65: /opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-153.bz2
<1d>Feb  5 14:58:48 mu crmd[31482]:   notice: run_graph: Transition 65 
(Complete=14, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-153.bz2): Complete
<1d>Feb  5 14:58:48 mu crmd[31482]:   notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
<1d>Feb  5 14:58:58 mu conntrack-tools[1677]: flushing kernel conntrack table 
(scheduled)
<1c>Feb  5 14:59:10 mu crmd[31482]:  warning: update_failcount: Updating 
failcount for sub-squid on nu after failed monitor: rc=9 (update=value++, 
time=1360072750)
<1d>Feb  5 14:59:10 mu crmd[31482]:   notice: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
origin=abort_transition_graph ]
<1d>Feb  5 14:59:10 mu pengine[31481]:   notice: unpack_config: On loss of CCM 
Quorum: Ignore
<1c>Feb  5 14:59:10 mu pengine[31481]:  warning: unpack_rsc_op: Processing 
failed op monitor for sub-squid:0 on mu: master (failed) (9)
<1c>Feb  5 14:59:10 mu pengine[31481]:  warning: unpack_rsc_op: Processing 
failed op monitor for sub-squid:0 on nu: master (failed) (9)
<1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from mu after 2 failures (max=2)
<1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from mu after 2 failures (max=2)
<1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from mu after 2 failures (max=2)
<1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from nu after 2 failures (max=2)
<1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from nu after 2 failures (max=2)
<1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from nu after 2 failures (max=2)
<1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Demote  
conntrackd:1       (Master -> Slave nu)
<1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Demote  
condition:1        (Master -> Slave nu)
<1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Demote  
sub-ospfd:1        (Master -> Slave nu)
<1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Demote  sub-ripd:1 
(Master -> Slave nu)
<1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Demote  
sub-squid:0        (Master -> Stopped nu)
<1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Move    
eth1-0-192.168.1.10        (Started nu -> mu)
<1d>Feb  5 14:59:10 mu pengine[31481]:   notice: process_pe_message: Calculated 
Transition 66: /opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-154.bz2
<1d>Feb  5 14:59:10 mu crmd[31482]:   notice: process_lrm_event: LRM operation 
conntrackd_notify_0 (call=996, rc=0, cib-update=0, confirmed=true) ok
<1d>Feb  5 14:59:10 mu crmd[31482]:   notice: run_graph: Transition 66 
(Complete=21, Pending=0, Fired=0, Skipped=15, Incomplete=6, 
Source=/opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-154.bz2): Stopped
<1d>Feb  5 14:59:10 mu pengine[31481]:   notice: unpack_config: On loss of CCM 
Quorum: Ignore
<1c>Feb  5 14:59:10 mu pengine[31481]:  warning: unpack_rsc_op: Processing 
failed op monitor for sub-squid:0 on mu: master (failed) (9)
<1c>Feb  5 14:59:10 mu pengine[31481]:  warning: unpack_rsc_op: Processing 
failed op monitor for sub-squid:0 on nu: master (failed) (9)
<1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from mu after 2 failures (max=2)
<1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from mu after 2 failures (max=2)
<1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from mu after 2 failures (max=2)
<1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from nu after 2 failures (max=2)
<1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from nu after 2 failures (max=2)
<1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from nu after 2 failures (max=2)
<1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Demote  
conntrackd:1       (Master -> Slave nu)
<1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Stop    
sub-squid:0        (nu)
<1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Start   
eth1-0-192.168.1.10        (mu)
<1d>Feb  5 14:59:10 mu pengine[31481]:   notice: process_pe_message: Calculated 
Transition 67: /opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-155.bz2
<1d>Feb  5 14:59:10 mu crmd[31482]:   notice: process_lrm_event: LRM operation 
conntrackd_notify_0 (call=1001, rc=0, cib-update=0, confirmed=true) ok
<1e>Feb  5 14:59:10 mu IPaddr2(eth1-0-192.168.1.10)[19429]: INFO: Adding inet 
address 192.168.1.10/24 with broadcast address 192.168.1.255 to device eth1 
(with label eth1:0)
<1e>Feb  5 14:59:10 mu IPaddr2(eth1-0-192.168.1.10)[19429]: INFO: Bringing 
device eth1 up
<1e>Feb  5 14:59:11 mu IPaddr2(eth1-0-192.168.1.10)[19429]: INFO: 
/opt/OSAGpcmk/resource-agents/lib/heartbeat/send_arp -i 200 -r 5 -p 
/opt/OSAGpcmk/resource-agents/var/run/resource-agents/send_arp-192.168.1.10 
eth1 192.168.1.10 auto not_used not_used
<1d>Feb  5 14:59:12 mu crmd[31482]:   notice: process_lrm_event: LRM operation 
eth1-0-192.168.1.10_start_0 (call=999, rc=0, cib-update=553, confirmed=true) ok
<1d>Feb  5 14:59:12 mu crmd[31482]:   notice: process_lrm_event: LRM operation 
conntrackd_notify_0 (call=1005, rc=0, cib-update=0, confirmed=true) ok
<1d>Feb  5 14:59:12 mu crmd[31482]:   notice: run_graph: Transition 67 
(Complete=20, Pending=0, Fired=0, Skipped=3, Incomplete=0, 
Source=/opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-155.bz2): Stopped
<1d>Feb  5 14:59:12 mu pengine[31481]:   notice: unpack_config: On loss of CCM 
Quorum: Ignore
<1c>Feb  5 14:59:12 mu pengine[31481]:  warning: unpack_rsc_op: Processing 
failed op monitor for sub-squid:0 on mu: master (failed) (9)
<1c>Feb  5 14:59:12 mu pengine[31481]:  warning: unpack_rsc_op: Processing 
failed op monitor for sub-squid:0 on nu: master (failed) (9)
<1c>Feb  5 14:59:12 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from mu after 2 failures (max=2)
<1c>Feb  5 14:59:12 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from mu after 2 failures (max=2)
<1c>Feb  5 14:59:12 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from mu after 2 failures (max=2)
<1c>Feb  5 14:59:12 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from nu after 2 failures (max=2)
<1c>Feb  5 14:59:12 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from nu after 2 failures (max=2)
<1c>Feb  5 14:59:12 mu pengine[31481]:  warning: common_apply_stickiness: 
Forcing master-squid away from nu after 2 failures (max=2)
<1d>Feb  5 14:59:12 mu pengine[31481]:   notice: process_pe_message: Calculated 
Transition 68: /opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-156.bz2
<1d>Feb  5 14:59:12 mu crmd[31482]:   notice: process_lrm_event: LRM operation 
eth1-0-192.168.1.10_monitor_10000 (call=1008, rc=0, cib-update=555, 
confirmed=false) ok
<1d>Feb  5 14:59:12 mu crmd[31482]:   notice: run_graph: Transition 68 
(Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-156.bz2): Complete
<1d>Feb  5 14:59:12 mu crmd[31482]:   notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]

<resources>
  <!--resource for conntrackd-->
  <master id="master-conntrackd">
    <meta_attributes id="master-conntrackd-meta_attributes">
      <nvpair id="master-conntrackd-meta_attributes-notify" name="notify" 
value="true"/>
      <nvpair id="master-conntrackd-meta_attributes-interleave" 
name="interleave" value="true"/>
      <nvpair id="master-conntrackd-meta_attributes-target-role" 
name="target-role" value="Master"/>
      <nvpair id="master-conndtrakd-meta_attributes-failure-timeout" 
name="failure-timeout" value="600"/>
      <nvpair id="master-conntrackd-meta_attributes-migration-threshold" 
name="migration-threshold" value="2"/>
    </meta_attributes>
    <primitive id="conntrackd" class="ocf" provider="OSAG" type="conntrackd">
      <operations>
        <op id="conntrackd-slave-check" name="monitor" interval="60" 
role="Slave" />
        <op id="conntrackd-master-check" name="monitor" interval="61" 
role="Master" />
      </operations>
    </primitive>
  </master>

  <!--resource for condition files-->
  <master id="master-condition">
    <meta_attributes id="master-condition-meta_attributes">
      <nvpair id="master-condition-meta_attributes-notify" name="notify" 
value="false"/>
      <nvpair id="master-condition-meta_attributes-interleave" 
name="interleave" value="true"/>
      <nvpair id="master-condition-meta_attributes-target-role" 
name="target-role" value="Master"/>
      <nvpair id="master-condition-meta_attributes-failure-timeout" 
name="failure-timeout" value="600"/>
      <nvpair id="master-condition-meta_attributes-migration-threshold" 
name="migration-threshold" value="2"/>
    </meta_attributes>
    <primitive id="condition" class="ocf" provider="OSAG" type="condition">
      <instance_attributes id="condition-attrs">
      </instance_attributes>
      <operations>
        <op id="condition-slave-check" name="monitor" interval="60" 
role="Slave" />
        <op id="condition-master-check" name="monitor" interval="61" 
role="Master" />
      </operations>
    </primitive>
  </master>

  <!--resource for subsystem ospfd-->
  <master id="master-ospfd">
    <meta_attributes id="master-ospfd-meta_attributes">
      <nvpair id="master-ospfd-meta_attributes-notify" name="notify" 
value="false"/>
      <nvpair id="master-ospfd-meta_attributes-interleave" name="interleave" 
value="true"/>
      <nvpair id="master-ospfd-meta_attributes-target-role" name="target-role" 
value="Master"/>
      <nvpair id="master-ospfd-meta_attributes-failure-timeout" 
name="failure-timeout" value="600"/>
      <nvpair id="master-ospfd-meta_attributes-migration-threshold" 
name="migration-threshold" value="2"/>
    </meta_attributes>
    <primitive id="sub-ospfd" class="ocf" provider="OSAG" type="osaginit">
      <instance_attributes id="ospfd-attrs">
        <nvpair id="ospfd-script" name="script" value="ospfd.init"/>
      </instance_attributes>
      <operations>
        <op id="ospfd-slave-check" name="monitor" interval="10" role="Slave" />
        <op id="ospfd-master-check" name="monitor" interval="11" role="Master" 
/>
      </operations>
    </primitive>
  </master>
  <!--resource for subsystem ripd-->
  <master id="master-ripd">
    <meta_attributes id="master-ripd-meta_attributes">
      <nvpair id="master-ripd-meta_attributes-notify" name="notify" 
value="false"/>
      <nvpair id="master-ripd-meta_attributes-interleave" name="interleave" 
value="true"/>
      <nvpair id="master-ripd-meta_attributes-target-role" name="target-role" 
value="Master"/>
      <nvpair id="master-ripd-meta_attributes-failure-timeout" 
name="failure-timeout" value="600"/>
      <nvpair id="master-ripd-meta_attributes-migration-threshold" 
name="migration-threshold" value="2"/>
    </meta_attributes>
    <primitive id="sub-ripd" class="ocf" provider="OSAG" type="osaginit">
      <instance_attributes id="ripd-attrs">
        <nvpair id="ripd-script" name="script" value="ripd.init"/>
      </instance_attributes>
      <operations>
        <op id="ripd-slave-check" name="monitor" interval="10" role="Slave" />
        <op id="ripd-master-check" name="monitor" interval="11" role="Master" />
      </operations>
    </primitive>
  </master>
  <!--resource for subsystem squid-->
  <master id="master-squid">
    <meta_attributes id="master-squid-meta_attributes">
      <nvpair id="master-squid-meta_attributes-notify" name="notify" 
value="false"/>
      <nvpair id="master-squid-meta_attributes-interleave" name="interleave" 
value="true"/>
      <nvpair id="master-squid-meta_attributes-target-role" name="target-role" 
value="Master"/>
      <nvpair id="master-squid-meta_attributes-failure-timeout" 
name="failure-timeout" value="600"/>
      <nvpair id="master-squid-meta_attributes-migration-threshold" 
name="migration-threshold" value="2"/>
    </meta_attributes>
    <primitive id="sub-squid" class="ocf" provider="OSAG" type="osaginit">
      <instance_attributes id="squid-attrs">
        <nvpair id="squid-script" name="script" value="squid.init"/>
      </instance_attributes>
      <operations>
        <op id="squid-slave-check" name="monitor" interval="10" role="Slave" />
        <op id="squid-master-check" name="monitor" interval="11" role="Master" 
/>
      </operations>
    </primitive>
  </master>

  <!--resource for interface checks -->
  <clone id="clone-IFcheck">
    <primitive id="IFcheck" class="ocf" provider="OSAG" type="ifmonitor">
      <instance_attributes id="resIFcheck-attrs">
        <nvpair id="IFcheck-interfaces" name="interfaces" value="eth0 eth1"/>
        <nvpair id="IFcheck-multiplier" name="multiplier" value="200"/>
        <nvpair id="IFcheck-dampen" name="dampen" value="16s" />
      </instance_attributes>
      <operations>
        <op id="IFcheck-monitor" interval="8s" name="monitor"/>
      </operations>
    </primitive>
  </clone>

  <!--resource for ISP checks-->
  <clone id="clone-ISPcheck">
    <primitive id="ISPcheck" class="ocf" provider="OSAG" type="ispcheck">
      <instance_attributes id="ISPcheck-attrs">
        <nvpair id="ISPcheck-ipsec" name="ipsec-check" value="1" />
        <nvpair id="ISPcheck-ping" name="ping-check" value="1" />
        <nvpair id="ISPcheck-multiplier" name="multiplier" value="200"/>
        <nvpair id="ISPcheck-dampen" name="dampen" value="60s"/>
      </instance_attributes>
      <operations>
        <op id="ISPcheck-monitor" interval="30s" name="monitor"/>
      </operations>
    </primitive>
  </clone>

  <!--Virtual IP group-->
  <group id="VIP-group">
    <primitive id="eth1-0-192.168.1.10" class="ocf" provider="heartbeat" 
type="IPaddr2">
      <meta_attributes id="meta-VIP-1">
        <nvpair id="VIP-1-failure-timeout" name="failure-timeout" value="60"/>
        <nvpair id="VIP-1-migration-threshold" name="migration-threshold" 
value="50"/>
      </meta_attributes>
      <instance_attributes id="VIP-1-instance_attributes">
        <nvpair id="VIP-1-IP" name="ip" value="192.168.1.10"/>
        <nvpair id="VIP-1-nic" name="nic" value="eth1"/>
        <nvpair id="VIP-1-cidr" name="cidr_netmask" value="24"/>
        <nvpair id="VIP-1-iflabel" name="iflabel" value="0"/>
        <nvpair id="VIP-1-arp-sender" name="arp_sender" value="send_arp"/>
      </instance_attributes>
      <operations>
        <op id="VIP-1-monitor" interval="10s" name="monitor"/>
      </operations>
    </primitive>
  </group>
</resources>

<!--resource constraints-->
<constraints>
  <!--set VIP location based on the following two rules-->
  <rsc_location id="VIPs" rsc="VIP-group">
    <!--prefer host with more interfaces-->
    <rule id="VIP-prefer-connected-rule-1" score-attribute="ifcheck" >
      <expression id="VIP-prefer-most-connected-1" attribute="ifcheck" 
operation="defined"/>
    </rule>
    <!--prefer host with better ISP connectivity-->
    <rule id="VIP-prefer-connected-rule-2" score-attribute="ispcheck">
      <expression id="VIP-prefer-most-connected-2" attribute="ispcheck" 
operation="defined"/>
    </rule>
  </rsc_location>

  <!--conntrack master must run where the VIPs are-->
  <rsc_colocation id="conntrack-master-with-VIPs" rsc="master-conntrackd" 
with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
  <!--condition master must run where the VIPs are-->
  <rsc_colocation id="condition-master-with-VIPs" rsc="master-condition" 
with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />

  <!--ospfd master must run where master-condition is master-->
  <rsc_colocation id="ospfd-master-with-VIPs" rsc="master-ospfd" 
with-rsc="master-condition" with-rsc-role="Master" rsc-role="Master" 
score="INFINITY" />
  <!--ripd master must run where master-condition is master-->
  <rsc_colocation id="ripd-master-with-VIPs" rsc="master-ripd" 
with-rsc="master-condition" with-rsc-role="Master" rsc-role="Master" 
score="INFINITY" />
  <!--squid master must run where master-condition is master-->
  <rsc_colocation id="squid-master-with-VIPs" rsc="master-squid" 
with-rsc="master-condition" with-rsc-role="Master" rsc-role="Master" 
score="INFINITY" />

  <!--prefer hosts for the VIP in ascending order of score-->
  <rsc_location id="VIP-master-xi" rsc="VIP-group" node="xi" score="0"/>
  <rsc_location id="VIP-master-nu" rsc="VIP-group" node="nu" score="20"/>
  <rsc_location id="VIP-master-mu" rsc="VIP-group" node="mu" score="40"/>
</constraints>
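For completeness, my understanding is that the INFINITY colocations above only 
constrain where the support masters may be promoted, not that a promotion must 
accompany the VIP. One thing I have been considering (untested, and the id 
below is invented for illustration) is an explicit ordering between the VIP 
start and the promotion, e.g.:

```xml
<!--hypothetical sketch: promote master-condition only after the VIP is up-->
<rsc_order id="order-vip-then-condition" score="INFINITY"
    first="VIP-group" first-action="start"
    then="master-condition" then-action="promote"/>
```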

On Feb 5, 2013, at 11:13 AM, James Guthrie <j...@open.ch> wrote:

> Hi Andrew,
> 
> "The resource" in this case was master-squid.init. The resource agent serves 
> as a master/slave OCF wrapper to a non-LSB init script. I forced the failure 
> by manually stopping that init script on the host.
> 
> Regards,
> James
> On Feb 5, 2013, at 10:56 AM, Andrew Beekhof <and...@beekhof.net> wrote:
> 
>> On Thu, Jan 31, 2013 at 3:04 AM, James Guthrie <j...@open.ch> wrote:
>>> Hi all,
>>> 
>>> I'm having a bit of difficulty with the way that my cluster is behaving on 
>>> failure of a resource.
>>> 
>>> The objective of my clustering setup is to provide a virtual IP, to which a 
>>> number of other services are bound. The services are bound to the VIP with 
>>> constraints to force the service to be running on the same host as the VIP.
>>> 
>>> I have been testing the way that the cluster behaves if it is unable to 
>>> start a resource. What I observe is the following: the cluster tries to 
>>> start the resource on node 1,
>> 
>> Can you define "the resource"?  You have a few and it matters :)
>> 
>>> fails 10 times, reaches the migration threshold, moves the resource to the 
>>> other host, fails 10 times, reaches the migration threshold. Now it has 
>>> reached the migration threshold on all possible hosts. I was then expecting 
>>> that it would stop the resource on all nodes and run all of the other 
>>> resources as though nothing were wrong. What I see though is that the 
>>> cluster demotes all master/slave resources, despite the fact that only one 
>>> of them is failing.
>>> 
>>> I wasn't able to find a parameter which would dictate what the behaviour 
>>> should be if the migration failed on all available hosts. I must therefore 
>>> believe that the constraints configuration I'm using isn't doing quite what 
>>> I hope it's doing.
>>> 
>>> Below is the configuration xml I am using on the hosts (no crmsh config, 
>>> sorry).
>>> 
>>> I am using Corosync 2.3.0 and Pacemaker 1.1.8, built from source.
>>> 
>>> Regards,
>>> James
>>> 
>>> <!-- Configuration file for pacemaker -->
>>> <resources>
>>> <!--resource for conntrackd-->
>>> <master id="master-conntrackd">
>>>   <meta_attributes id="master-conntrackd-meta_attributes">
>>>     <nvpair id="master-conntrackd-meta_attributes-notify" name="notify" 
>>> value="true"/>
>>>     <nvpair id="master-conntrackd-meta_attributes-interleave" 
>>> name="interleave" value="true"/>
>>>     <nvpair id="master-conntrackd-meta_attributes-target-role" 
>>> name="target-role" value="Master"/>
>>>     <nvpair id="master-conndtrakd-meta_attributes-failure-timeout" 
>>> name="failure-timeout" value="600"/>
>>>     <nvpair id="master-conntrackd-meta_attributes-migration-threshold" 
>>> name="migration-threshold" value="10"/>
>>>   </meta_attributes>
>>>   <primitive id="conntrackd" class="ocf" provider="OSAG" type="conntrackd">
>>>     <operations>
>>>       <op id="conntrackd-slave-check" name="monitor" interval="60" 
>>> role="Slave" />
>>>       <op id="conntrackd-master-check" name="monitor" interval="61" 
>>> role="Master" />
>>>     </operations>
>>>   </primitive>
>>> </master>
>>> <master id="master-condition">
>>>   <meta_attributes id="master-condition-meta_attributes">
>>>     <nvpair id="master-condition-meta_attributes-notify" name="notify" 
>>> value="false"/>
>>>     <nvpair id="master-condition-meta_attributes-interleave" 
>>> name="interleave" value="true"/>
>>>     <nvpair id="master-condition-meta_attributes-target-role" 
>>> name="target-role" value="Master"/>
>>>     <nvpair id="master-condition-meta_attributes-failure-timeout" 
>>> name="failure-timeout" value="600"/>
>>>     <nvpair id="master-condition-meta_attributes-migration-threshold" 
>>> name="migration-threshold" value="10"/>
>>>   </meta_attributes>
>>>   <primitive id="condition" class="ocf" provider="OSAG" type="condition">
>>>     <instance_attributes id="condition-attrs">
>>>     </instance_attributes>
>>>     <operations>
>>>       <op id="condition-slave-check" name="monitor" interval="10" 
>>> role="Slave" />
>>>       <op id="condition-master-check" name="monitor" interval="11" 
>>> role="Master" />
>>>     </operations>
>>>   </primitive>
>>> </master>
>>> <master id="master-ospfd.init">
>>>   <meta_attributes id="master-ospfd-meta_attributes">
>>>     <nvpair id="master-ospfd-meta_attributes-notify" name="notify" 
>>> value="false"/>
>>>     <nvpair id="master-ospfd-meta_attributes-interleave" name="interleave" 
>>> value="true"/>
>>>     <nvpair id="master-ospfd-meta_attributes-target-role" 
>>> name="target-role" value="Master"/>
>>>     <nvpair id="master-ospfd-meta_attributes-failure-timeout" 
>>> name="failure-timeout" value="600"/>
>>>     <nvpair id="master-ospfd-meta_attributes-migration-threshold" 
>>> name="migration-threshold" value="10"/>
>>>   </meta_attributes>
>>>   <primitive id="ospfd" class="ocf" provider="OSAG" type="osaginit">
>>>     <instance_attributes id="ospfd-attrs">
>>>       <nvpair id="ospfd-script" name="script" value="ospfd.init"/>
>>>     </instance_attributes>
>>>     <operations>
>>>       <op id="ospfd-slave-check" name="monitor" interval="10" role="Slave" 
>>> />
>>>       <op id="ospfd-master-check" name="monitor" interval="11" 
>>> role="Master" />
>>>     </operations>
>>>   </primitive>
>>> </master>
>>> <master id="master-ripd.init">
>>>   <meta_attributes id="master-ripd-meta_attributes">
>>>     <nvpair id="master-ripd-meta_attributes-notify" name="notify" 
>>> value="false"/>
>>>     <nvpair id="master-ripd-meta_attributes-interleave" name="interleave" 
>>> value="true"/>
>>>     <nvpair id="master-ripd-meta_attributes-target-role" name="target-role" 
>>> value="Master"/>
>>>     <nvpair id="master-ripd-meta_attributes-failure-timeout" 
>>> name="failure-timeout" value="600"/>
>>>     <nvpair id="master-ripd-meta_attributes-migration-threshold" 
>>> name="migration-threshold" value="10"/>
>>>   </meta_attributes>
>>>   <primitive id="ripd" class="ocf" provider="OSAG" type="osaginit">
>>>     <instance_attributes id="ripd-attrs">
>>>       <nvpair id="ripd-script" name="script" value="ripd.init"/>
>>>     </instance_attributes>
>>>     <operations>
>>>       <op id="ripd-slave-check" name="monitor" interval="10" role="Slave" />
>>>       <op id="ripd-master-check" name="monitor" interval="11" role="Master" 
>>> />
>>>     </operations>
>>>   </primitive>
>>> </master>
>>> <master id="master-squid.init">
>>>   <meta_attributes id="master-squid-meta_attributes">
>>>     <nvpair id="master-squid-meta_attributes-notify" name="notify" 
>>> value="false"/>
>>>     <nvpair id="master-squid-meta_attributes-interleave" name="interleave" 
>>> value="true"/>
>>>     <nvpair id="master-squid-meta_attributes-target-role" 
>>> name="target-role" value="Master"/>
>>>     <nvpair id="master-squid-meta_attributes-failure-timeout" 
>>> name="failure-timeout" value="600"/>
>>>     <nvpair id="master-squid-meta_attributes-migration-threshold" 
>>> name="migration-threshold" value="10"/>
>>>   </meta_attributes>
>>>   <primitive id="squid" class="ocf" provider="OSAG" type="osaginit">
>>>     <instance_attributes id="squid-attrs">
>>>       <nvpair id="squid-script" name="script" value="squid.init"/>
>>>     </instance_attributes>
>>>     <operations>
>>>       <op id="squid-slave-check" name="monitor" interval="10" role="Slave" 
>>> />
>>>       <op id="squid-master-check" name="monitor" interval="11" 
>>> role="Master" />
>>>     </operations>
>>>   </primitive>
>>> </master>
>>> 
>>> <!--resource for interface checks -->
>>> <clone id="clone-IFcheck">
>>>   <primitive id="IFcheck" class="ocf" provider="OSAG" type="ifmonitor">
>>>     <instance_attributes id="resIFcheck-attrs">
>>>       <nvpair id="IFcheck-interfaces" name="interfaces" value="eth0 eth1"/>
>>>       <nvpair id="IFcheck-multiplier" name="multiplier" value="200"/>
>>>       <nvpair id="IFcheck-dampen" name="dampen" value="6s" />
>>>     </instance_attributes>
>>>     <operations>
>>>       <op id="IFcheck-monitor" interval="3s" name="monitor"/>
>>>     </operations>
>>>   </primitive>
>>> </clone>
>>> 
>>> <!--resource for ISP checks-->
>>> <clone id="clone-ISPcheck">
>>>   <primitive id="ISPcheck" class="ocf" provider="OSAG" type="ispcheck">
>>>     <instance_attributes id="ISPcheck-attrs">
>>>       <nvpair id="ISPcheck-ipsec" name="ipsec-check" value="1" />
>>>       <nvpair id="ISPcheck-ping" name="ping-check" value="1" />
>>>       <nvpair id="ISPcheck-multiplier" name="multiplier" value="200"/>
>>>       <nvpair id="ISPcheck-dampen" name="dampen" value="60s"/>
>>>     </instance_attributes>
>>>     <operations>
>>>       <op id="ISPcheck-monitor" interval="30s" name="monitor"/>
>>>     </operations>
>>>   </primitive>
>>> </clone>
>>> 
>>> <!--Virtual IP group-->
>>> <group id="VIP-group">
>>>   <primitive id="eth1-0-192.168.1.10" class="ocf" provider="heartbeat" 
>>> type="IPaddr2">
>>>     <meta_attributes id="meta-VIP-1">
>>>       <nvpair id="VIP-1-failure-timeout" name="failure-timeout" value="60"/>
>>>       <nvpair id="VIP-1-migration-threshold" name="migration-threshold" 
>>> value="50"/>
>>>     </meta_attributes>
>>>     <instance_attributes id="VIP-1-instance_attributes">
>>>       <nvpair id="VIP-1-IP" name = "ip" value="192.168.1.10"/>
>>>       <nvpair id="VIP-1-nic" name="nic" value="eth1"/>
>>>       <nvpair id="VIP-1-cidr" name="cidr_netmask" value="24"/>
>>>       <nvpair id="VIP-1-iflabel" name="iflabel" value="0"/>
>>>       <nvpair id="VIP-1-arp-sender" name="arp_sender" value="send_arp"/>
>>>     </instance_attributes>
>>>     <operations>
>>>       <op id="VIP-1-monitor" interval="10s" name="monitor"/>
>>>     </operations>
>>>   </primitive>
>>> </group>
>>> </resources>
>>> 
>>> <!--resource constraints-->
>>> <constraints>
>>> <!--set VIP location based on the following two rules-->
>>> <rsc_location id="VIPs" rsc="VIP-group">
>>>   <!--prefer host with more interfaces-->
>>>   <rule id="VIP-prefer-connected-rule-1" score-attribute="ifcheck" >
>>>     <expression id="VIP-prefer-most-connected-1" attribute="ifcheck" 
>>> operation="defined"/>
>>>   </rule>
>>>   <!--prefer host with better ISP connectivity-->
>>>   <rule id="VIP-prefer-connected-rule-2" score-attribute="ispcheck">
>>>     <expression id="VIP-prefer-most-connected-2" attribute="ispcheck" 
>>> operation="defined"/>
>>>   </rule>
>>> </rsc_location>
>>> <!--conntrack master must run where the VIPs are-->
>>> <rsc_colocation id="conntrack-master-with-VIPs" rsc="master-conntrackd" 
>>> with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>>> <rsc_colocation id="condition-master-with-VIPs" rsc="master-condition" 
>>> with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>>> <!--services masters must run where the VIPs are-->
>>> <rsc_colocation id="ospfd-master-with-VIPs" rsc="master-ospfd.init" 
>>> with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>>> <rsc_colocation id="ripd-master-with-VIPs" rsc="master-ripd.init" 
>>> with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>>> <rsc_colocation id="squid-master-with-VIPs" rsc="master-squid.init" 
>>> with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>>> <!--prefer as master the following hosts in ascending order-->
>>> <rsc_location id="VIP-master-xi" rsc="VIP-group" node="xi" score="0"/>
>>> <rsc_location id="VIP-master-nu" rsc="VIP-group" node="nu" score="20"/>
>>> <rsc_location id="VIP-master-mu" rsc="VIP-group" node="mu" score="40"/>
>>> </constraints>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
> 


