On Tue, Feb 5, 2013 at 9:13 PM, James Guthrie <j...@open.ch> wrote:
> Hi Andrew,
>
> "The resource" in this case was master-squid.init. The resource agent serves 
> as a master/slave OCF wrapper to a non-LSB init script. I forced the failure 
> by manually stopping that init script on the host.

Ok.

Generally, init scripts aren't suitable for use as a master/slave
resource, even when wrapped in an OCF script: an init script has no
concept of roles.  What do you do for promote/demote?
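
For reference, here is roughly the minimum a stateful agent has to
implement -- an illustrative sketch only, not your OSAG "osaginit"
agent; the wrapped script path and the state file are assumptions:

#!/bin/sh
# Sketch of the action set a master/slave OCF agent must support.
# An LSB init script only offers start/stop/status, so the role
# handling below has to be invented by the wrapper.
INIT=${OCF_RESKEY_script:-/etc/init.d/squid.init}   # hypothetical path
STATE="${HA_RSCTMP:-/tmp}/$(basename "$INIT").role"

case "$1" in
  start)
    "$INIT" start || exit 1                  # OCF_ERR_GENERIC
    echo slave > "$STATE"
    exit 0 ;;                                # OCF_SUCCESS
  stop)
    "$INIT" stop
    rm -f "$STATE"
    exit 0 ;;
  monitor)
    "$INIT" status >/dev/null 2>&1 || exit 7 # OCF_NOT_RUNNING
    [ "$(cat "$STATE" 2>/dev/null)" = master ] && exit 8  # OCF_RUNNING_MASTER
    exit 0 ;;                                # running as a slave
  promote)
    echo master > "$STATE"   # a real agent does actual work here
    exit 0 ;;
  demote)
    echo slave > "$STATE"
    exit 0 ;;
  meta-data|validate-all)
    exit 0 ;;                # a real agent must print XML metadata
  *)
    exit 3 ;;                                # OCF_ERR_UNIMPLEMENTED
esac

If promote/demote are effectively no-ops like the above, the cluster
can't meaningfully distinguish a master from a slave, and recovery
after failures tends to get strange.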

Beyond that, are you saying that resources other than
master-squid.init were stopped?  That sounds very bad.
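
If you can reproduce this, replaying the Policy Engine input from the
transition that did the demoting should show why every master was
affected.  Something like the following -- the file name is a
placeholder and the directory varies between builds:

# replay a saved PE input and print the resulting cluster state
crm_simulate -Sx /var/lib/pacemaker/pengine/pe-input-NNN.bz2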

>
> Regards,
> James
> On Feb 5, 2013, at 10:56 AM, Andrew Beekhof <and...@beekhof.net> wrote:
>
>> On Thu, Jan 31, 2013 at 3:04 AM, James Guthrie <j...@open.ch> wrote:
>>> Hi all,
>>>
>>> I'm having some difficulty with the way my cluster behaves when a
>>> resource fails.
>>>
>>> The objective of my clustering setup is to provide a virtual IP to which
>>> a number of other services are bound. The services are colocated with the
>>> VIP by constraints that force each service to run on the same host as the
>>> VIP.
>>>
>>> I have been testing how the cluster behaves when it is unable to start a
>>> resource. What I observe is the following: the cluster tries to start the
>>> resource on node 1,
>>
>> Can you define "the resource"?  You have a few and it matters :)
>>
>>> fails 10 times, reaches the migration threshold, moves the resource to
>>> the other host, fails another 10 times and reaches the migration
>>> threshold there too. At that point it has reached the migration threshold
>>> on all possible hosts. I expected it to stop the failing resource on all
>>> nodes and run all of the other resources as though nothing were wrong.
>>> What I see instead is that the cluster demotes all master/slave
>>> resources, even though only one of them is failing.
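
While you're testing this, it is worth watching the fail counts
directly; for example:

# one-shot cluster status including per-node fail counts
crm_mon -1f
# clear the fail count for the squid primitive after a test run
crm_resource --cleanup --resource squid
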
>>>
>>> I wasn't able to find a parameter that dictates what the behaviour should
>>> be once migration has failed on all available hosts, so I have to assume
>>> that the constraint configuration I'm using isn't doing quite what I hope
>>> it is.
>>>
>>> Below is the configuration XML I am using on the hosts (no crmsh config,
>>> sorry).
>>>
>>> I am using Corosync 2.3.0 and Pacemaker 1.1.8, built from source.
>>>
>>> Regards,
>>> James
>>>
>>> <!-- Configuration file for pacemaker -->
>>> <resources>
>>>  <!--resource for conntrackd-->
>>>  <master id="master-conntrackd">
>>>    <meta_attributes id="master-conntrackd-meta_attributes">
>>>      <nvpair id="master-conntrackd-meta_attributes-notify" name="notify" 
>>> value="true"/>
>>>      <nvpair id="master-conntrackd-meta_attributes-interleave" 
>>> name="interleave" value="true"/>
>>>      <nvpair id="master-conntrackd-meta_attributes-target-role" 
>>> name="target-role" value="Master"/>
>>>      <nvpair id="master-conndtrakd-meta_attributes-failure-timeout" 
>>> name="failure-timeout" value="600"/>
>>>      <nvpair id="master-conntrackd-meta_attributes-migration-threshold" 
>>> name="migration-threshold" value="10"/>
>>>    </meta_attributes>
>>>    <primitive id="conntrackd" class="ocf" provider="OSAG" type="conntrackd">
>>>      <operations>
>>>        <op id="conntrackd-slave-check" name="monitor" interval="60" 
>>> role="Slave" />
>>>        <op id="conntrackd-master-check" name="monitor" interval="61" 
>>> role="Master" />
>>>      </operations>
>>>    </primitive>
>>>  </master>
>>>  <master id="master-condition">
>>>    <meta_attributes id="master-condition-meta_attributes">
>>>      <nvpair id="master-condition-meta_attributes-notify" name="notify" 
>>> value="false"/>
>>>      <nvpair id="master-condition-meta_attributes-interleave" 
>>> name="interleave" value="true"/>
>>>      <nvpair id="master-condition-meta_attributes-target-role" 
>>> name="target-role" value="Master"/>
>>>      <nvpair id="master-condition-meta_attributes-failure-timeout" 
>>> name="failure-timeout" value="600"/>
>>>      <nvpair id="master-condition-meta_attributes-migration-threshold" 
>>> name="migration-threshold" value="10"/>
>>>    </meta_attributes>
>>>    <primitive id="condition" class="ocf" provider="OSAG" type="condition">
>>>      <instance_attributes id="condition-attrs"/>
>>>      <operations>
>>>        <op id="condition-slave-check" name="monitor" interval="10" 
>>> role="Slave" />
>>>        <op id="condition-master-check" name="monitor" interval="11" 
>>> role="Master" />
>>>      </operations>
>>>    </primitive>
>>>  </master>
>>>  <master id="master-ospfd.init">
>>>    <meta_attributes id="master-ospfd-meta_attributes">
>>>      <nvpair id="master-ospfd-meta_attributes-notify" name="notify" 
>>> value="false"/>
>>>      <nvpair id="master-ospfd-meta_attributes-interleave" name="interleave" 
>>> value="true"/>
>>>      <nvpair id="master-ospfd-meta_attributes-target-role" 
>>> name="target-role" value="Master"/>
>>>      <nvpair id="master-ospfd-meta_attributes-failure-timeout" 
>>> name="failure-timeout" value="600"/>
>>>      <nvpair id="master-ospfd-meta_attributes-migration-threshold" 
>>> name="migration-threshold" value="10"/>
>>>    </meta_attributes>
>>>    <primitive id="ospfd" class="ocf" provider="OSAG" type="osaginit">
>>>      <instance_attributes id="ospfd-attrs">
>>>        <nvpair id="ospfd-script" name="script" value="ospfd.init"/>
>>>      </instance_attributes>
>>>      <operations>
>>>        <op id="ospfd-slave-check" name="monitor" interval="10" role="Slave" 
>>> />
>>>        <op id="ospfd-master-check" name="monitor" interval="11" 
>>> role="Master" />
>>>      </operations>
>>>    </primitive>
>>>  </master>
>>>  <master id="master-ripd.init">
>>>    <meta_attributes id="master-ripd-meta_attributes">
>>>      <nvpair id="master-ripd-meta_attributes-notify" name="notify" 
>>> value="false"/>
>>>      <nvpair id="master-ripd-meta_attributes-interleave" name="interleave" 
>>> value="true"/>
>>>      <nvpair id="master-ripd-meta_attributes-target-role" 
>>> name="target-role" value="Master"/>
>>>      <nvpair id="master-ripd-meta_attributes-failure-timeout" 
>>> name="failure-timeout" value="600"/>
>>>      <nvpair id="master-ripd-meta_attributes-migration-threshold" 
>>> name="migration-threshold" value="10"/>
>>>    </meta_attributes>
>>>    <primitive id="ripd" class="ocf" provider="OSAG" type="osaginit">
>>>      <instance_attributes id="ripd-attrs">
>>>        <nvpair id="ripd-script" name="script" value="ripd.init"/>
>>>      </instance_attributes>
>>>      <operations>
>>>        <op id="ripd-slave-check" name="monitor" interval="10" role="Slave" 
>>> />
>>>        <op id="ripd-master-check" name="monitor" interval="11" 
>>> role="Master" />
>>>      </operations>
>>>    </primitive>
>>>  </master>
>>>  <master id="master-squid.init">
>>>    <meta_attributes id="master-squid-meta_attributes">
>>>      <nvpair id="master-squid-meta_attributes-notify" name="notify" 
>>> value="false"/>
>>>      <nvpair id="master-squid-meta_attributes-interleave" name="interleave" 
>>> value="true"/>
>>>      <nvpair id="master-squid-meta_attributes-target-role" 
>>> name="target-role" value="Master"/>
>>>      <nvpair id="master-squid-meta_attributes-failure-timeout" 
>>> name="failure-timeout" value="600"/>
>>>      <nvpair id="master-squid-meta_attributes-migration-threshold" 
>>> name="migration-threshold" value="10"/>
>>>    </meta_attributes>
>>>    <primitive id="squid" class="ocf" provider="OSAG" type="osaginit">
>>>      <instance_attributes id="squid-attrs">
>>>        <nvpair id="squid-script" name="script" value="squid.init"/>
>>>      </instance_attributes>
>>>      <operations>
>>>        <op id="squid-slave-check" name="monitor" interval="10" role="Slave" 
>>> />
>>>        <op id="squid-master-check" name="monitor" interval="11" 
>>> role="Master" />
>>>      </operations>
>>>    </primitive>
>>>  </master>
>>>
>>>  <!--resource for interface checks -->
>>>  <clone id="clone-IFcheck">
>>>    <primitive id="IFcheck" class="ocf" provider="OSAG" type="ifmonitor">
>>>      <instance_attributes id="resIFcheck-attrs">
>>>        <nvpair id="IFcheck-interfaces" name="interfaces" value="eth0 eth1"/>
>>>        <nvpair id="IFcheck-multiplier" name="multiplier" value="200"/>
>>>        <nvpair id="IFcheck-dampen" name="dampen" value="6s" />
>>>      </instance_attributes>
>>>      <operations>
>>>        <op id="IFcheck-monitor" interval="3s" name="monitor"/>
>>>      </operations>
>>>    </primitive>
>>>  </clone>
>>>
>>>  <!--resource for ISP checks-->
>>>  <clone id="clone-ISPcheck">
>>>    <primitive id="ISPcheck" class="ocf" provider="OSAG" type="ispcheck">
>>>      <instance_attributes id="ISPcheck-attrs">
>>>        <nvpair id="ISPcheck-ipsec" name="ipsec-check" value="1" />
>>>        <nvpair id="ISPcheck-ping" name="ping-check" value="1" />
>>>        <nvpair id="ISPcheck-multiplier" name="multiplier" value="200"/>
>>>        <nvpair id="ISPcheck-dampen" name="dampen" value="60s"/>
>>>      </instance_attributes>
>>>      <operations>
>>>        <op id="ISPcheck-monitor" interval="30s" name="monitor"/>
>>>      </operations>
>>>    </primitive>
>>>  </clone>
>>>
>>>  <!--Virtual IP group-->
>>>  <group id="VIP-group">
>>>    <primitive id="eth1-0-192.168.1.10" class="ocf" provider="heartbeat" 
>>> type="IPaddr2">
>>>      <meta_attributes id="meta-VIP-1">
>>>        <nvpair id="VIP-1-failure-timeout" name="failure-timeout" 
>>> value="60"/>
>>>        <nvpair id="VIP-1-migration-threshold" name="migration-threshold" 
>>> value="50"/>
>>>      </meta_attributes>
>>>      <instance_attributes id="VIP-1-instance_attributes">
>>>        <nvpair id="VIP-1-IP" name = "ip" value="192.168.1.10"/>
>>>        <nvpair id="VIP-1-nic" name="nic" value="eth1"/>
>>>        <nvpair id="VIP-1-cidr" name="cidr_netmask" value="24"/>
>>>        <nvpair id="VIP-1-iflabel" name="iflabel" value="0"/>
>>>        <nvpair id="VIP-1-arp-sender" name="arp_sender" value="send_arp"/>
>>>      </instance_attributes>
>>>      <operations>
>>>        <op id="VIP-1-monitor" interval="10s" name="monitor"/>
>>>      </operations>
>>>    </primitive>
>>>  </group>
>>> </resources>
>>>
>>> <!--resource constraints-->
>>> <constraints>
>>>  <!--set VIP location based on the following two rules-->
>>>  <rsc_location id="VIPs" rsc="VIP-group">
>>>    <!--prefer host with more interfaces-->
>>>    <rule id="VIP-prefer-connected-rule-1" score-attribute="ifcheck" >
>>>      <expression id="VIP-prefer-most-connected-1" attribute="ifcheck" 
>>> operation="defined"/>
>>>    </rule>
>>>    <!--prefer host with better ISP connectivity-->
>>>    <rule id="VIP-prefer-connected-rule-2" score-attribute="ispcheck">
>>>      <expression id="VIP-prefer-most-connected-2" attribute="ispcheck" 
>>> operation="defined"/>
>>>    </rule>
>>>  </rsc_location>
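
Side note: those score-attribute rules only do anything if the check
agents keep the "ifcheck" and "ispcheck" node attributes current.  I'm
assuming your ifmonitor/ispcheck agents update them much the way
ocf:pacemaker:ping does, i.e. something like:

# assumed agent internals: publish a score (2 interfaces x multiplier
# 200 = 400) as a transient node attribute, honouring the dampen delay
attrd_updater --name ifcheck --update 400 --delay 6s
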
>>>  <!--conntrack master must run where the VIPs are-->
>>>  <rsc_colocation id="conntrack-master-with-VIPs" rsc="master-conntrackd" 
>>> with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>>>  <rsc_colocation id="condition-master-with-VIPs" rsc="master-condition" 
>>> with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>>>  <!--services masters must run where the VIPs are-->
>>>  <rsc_colocation id="ospfd-master-with-VIPs" rsc="master-ospfd.init" 
>>> with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>>>  <rsc_colocation id="ripd-master-with-VIPs" rsc="master-ripd.init" 
>>> with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>>>  <rsc_colocation id="squid-master-with-VIPs" rsc="master-squid.init" 
>>> with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>>>  <!--prefer as master the following hosts in ascending order-->
>>>  <rsc_location id="VIP-master-xi" rsc="VIP-group" node="xi" score="0"/>
>>>  <rsc_location id="VIP-master-nu" rsc="VIP-group" node="nu" score="20"/>
>>>  <rsc_location id="VIP-master-mu" rsc="VIP-group" node="mu" score="40"/>
>>> </constraints>
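
When the unexpected demotions happen, dumping the allocation scores
should show which of these constraints is pulling the other masters
down:

# live cluster status plus the policy engine's allocation scores
crm_simulate -sL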
>>
>

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
