Dominik,
Here is the status of the two concerns I needed help on.
01) When a node comes back up after a restart of heartbeat, resources get
bounced when it rejoins the cluster.
STATUS: The resources still get bounced when a node rejoins the cluster, even
after I deleted all the constraints.
02) Stopping one resource in a group does not fail over the group to the other
node.
STATUS: migration-threshold works like a charm. :) Thanks.
If I may, I have another concern that popped up.
03) I cannot seem to get MailTo to work. I am trying to add this resource to
the Directory_Server group so that every time a failover occurs, it notifies
me.
Below is the current cib.xml file I have.
<cib admin_epoch="0" validate-with="pacemaker-1.0" crm_feature_set="3.0"
have-quorum="1" dc-uuid="27f54ec3-b626-4b4f-b8a6-4ed0b768513c" epoch="99"
num_updates="0" cib-last-written="Tue Jan 27 12:59:21 2009">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.0.1-node: 6fc5ce8302abf145a02891ec41e5a492efbe8efe"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e" uname="nomen.esri.com"
type="normal">
<instance_attributes id="nodes-5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e">
<nvpair id="standby-5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e"
name="standby" value="off"/>
</instance_attributes>
</node>
<node id="27f54ec3-b626-4b4f-b8a6-4ed0b768513c" uname="rubric.esri.com"
type="normal">
<instance_attributes id="nodes-27f54ec3-b626-4b4f-b8a6-4ed0b768513c">
<nvpair id="standby-27f54ec3-b626-4b4f-b8a6-4ed0b768513c"
name="standby" value="off"/>
</instance_attributes>
</node>
</nodes>
<resources>
<group id="Directory_Server">
<meta_attributes id="Directory_Server-meta_attributes">
<nvpair id="Directory_Server-meta_attributes-collocated"
name="collocated" value="true"/>
<nvpair id="Directory_Server-meta_attributes-ordered" name="ordered"
value="true"/>
<nvpair id="Directory_Server-meta_attributes-migration-threshold"
name="migration-threshold" value="1"/>
<nvpair id="Directory_Server-meta_attributes-failure-timeout"
name="failure-timeout" value="10s"/>
</meta_attributes>
<primitive class="ocf" id="VIP" provider="heartbeat" type="IPaddr">
<instance_attributes id="VIP-instance_attributes">
<nvpair id="VIP-instance_attributes-ip" name="ip"
value="10.50.26.250"/>
</instance_attributes>
<operations id="VIP-ops">
<op id="VIP-monitor-5s" interval="5s" name="monitor" timeout="5s"/>
</operations>
</primitive>
<primitive class="ocf" id="ECAS" provider="esri" type="ecas">
<operations id="ECAS-ops">
<op id="ECAS-monitor-3s" interval="3s" name="monitor" timeout="3s"/>
</operations>
</primitive>
<primitive class="ocf" id="FDS_Admin" provider="esri" type="fdsadm">
<operations id="FDS_Admin-ops">
<op id="FDS_Admin-monitor-3s" interval="3s" name="monitor"
timeout="3s"/>
</operations>
</primitive>
<primitive class="ocf" provider="heartbeat" type="MailTo"
id="Emergency_Contact">
<instance_attributes id="Emergency_Contact-instance_attributes">
<nvpair id="Emergency_Contact-instance_attributes-email"
name="email" value="[email protected]"/>
<nvpair id="Emergency_Contact-instance_attributes-subject"
name="subject" value="Failover Occurred"/>
</instance_attributes>
<operations id="Emergency_Contact-ops">
<op interval="3s" name="monitor" timeout="3s"
id="Emergency_Contact-monitor-3s"/>
</operations>
</primitive>
</group>
</resources>
<constraints/>
</configuration>
</cib>
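For what it is worth, the MailTo agent can also be exercised by hand, outside
the cluster, to rule out a mail-delivery problem as opposed to a cluster
configuration problem. This is only a sketch: it assumes the standard OCF
agent location (/usr/lib/ocf) and a working local MTA, and the OCF_RESKEY_*
variables mirror the nvpairs in the Emergency_Contact primitive above.

```shell
# Invoke the MailTo resource agent directly, the way the cluster would.
# OCF_RESKEY_email / OCF_RESKEY_subject correspond to the "email" and
# "subject" instance attributes in the cib.xml above.
export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_email="[email protected]"
export OCF_RESKEY_subject="Failover test"

# "start" should send a test mail; "monitor" reports the agent's status.
/usr/lib/ocf/resource.d/heartbeat/MailTo start
/usr/lib/ocf/resource.d/heartbeat/MailTo monitor; echo "monitor rc=$?"
```

If no mail arrives from a manual start, the problem is with local mail
delivery rather than with the Pacemaker configuration.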
Help.
Regards,
jerome
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Dominik Klein
Sent: Monday, January 26, 2009 10:52 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Failover not working as I expected
Jerome Yanga wrote:
> Andrew,
>
> I apologize for my sending my previous email abruptly.
>
> I have followed your recommendation and installed Pacemaker.
>
> Here is my config.
>
> Packages Installed:
> heartbeat-2.99.2-6.1
> heartbeat-common-2.99.2-6.1
> heartbeat-debug-2.99.2-6.1
> heartbeat-ldirectord-2.99.2-6.1
> heartbeat-resources-2.99.2-6.1
> libheartbeat2-2.99.2-6.1
> libpacemaker3-1.0.1-3.1
> pacemaker-1.0.1-3.1
> pacemaker-debug-1.0.1-3.1
> pacemaker-pygui-1.4-11.9
> pacemaker-pygui-debug-1.4-11.9
>
>
>
> ha.cf:
> # Logging
> debug 1
> use_logd false
> logfacility daemon
>
> # Misc Options
> traditional_compression off
> compression bz2
> coredumps true
>
> # Communications
> udpport 691
> bcast eth1 eth0
> autojoin any
>
> # Thresholds (in seconds)
> keepalive 1
> warntime 6
> deadtime 10
> initdead 15
>
> ping 10.50.254.254
> crm respawn
> apiauth mgmtd uid=root
> respawn root /usr/lib/heartbeat/mgmtd -v
>
>
> cib.xml:
> <cib admin_epoch="0" validate-with="pacemaker-1.0" crm_feature_set="3.0"
> have-quorum="1" epoch="57" dc-uuid="5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e"
> num_updates="0" cib-last-written="Mon Jan 26 13:57:32 2009">
> <configuration>
> <crm_config>
> <cluster_property_set id="cib-bootstrap-options">
> <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
> value="1.0.1-node: 6fc5ce8302abf145a02891ec41e5a492efbe8efe"/>
> </cluster_property_set>
> </crm_config>
> <nodes>
> <node id="5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e" uname="nomen.esri.com"
> type="normal">
> <instance_attributes id="nodes-5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e">
> <nvpair id="standby-5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e"
> name="standby" value="off"/>
> </instance_attributes>
> </node>
> <node id="27f54ec3-b626-4b4f-b8a6-4ed0b768513c" uname="rubric.esri.com"
> type="normal">
> <instance_attributes id="nodes-27f54ec3-b626-4b4f-b8a6-4ed0b768513c">
> <nvpair id="standby-27f54ec3-b626-4b4f-b8a6-4ed0b768513c"
> name="standby" value="off"/>
> </instance_attributes>
> </node>
> </nodes>
> <resources>
> <group id="Directory_Server">
> <meta_attributes id="Directory_Server-meta_attributes">
> <nvpair id="Directory_Server-meta_attributes-collocated"
> name="collocated" value="true"/>
> <nvpair id="Directory_Server-meta_attributes-ordered"
> name="ordered" value="true"/>
> <nvpair id="Directory_Server-meta_attributes-resource_stickiness"
> name="resource_stickiness" value="100"/>
> </meta_attributes>
> <primitive class="ocf" id="VIP" provider="heartbeat" type="IPaddr">
> <instance_attributes id="VIP-instance_attributes">
> <nvpair id="VIP-instance_attributes-ip" name="ip"
> value="10.50.26.250"/>
> </instance_attributes>
> <operations id="VIP-ops">
> <op id="VIP-monitor-5s" interval="5s" name="monitor"
> timeout="5s"/>
> </operations>
> </primitive>
> <primitive class="ocf" id="ECAS" provider="esri" type="ecas">
> <operations id="ECAS-ops">
> <op id="ECAS-monitor-3s" interval="3s" name="monitor"
> timeout="3s"/>
> </operations>
> <meta_attributes id="ECAS-meta_attributes">
> <nvpair id="ECAS-meta_attributes-target-role" name="target-role"
> value="Started"/>
> </meta_attributes>
> </primitive>
> <primitive class="ocf" id="FDS_Admin" provider="esri" type="fdsadm">
> <operations id="FDS_Admin-ops">
> <op id="FDS_Admin-monitor-3s" interval="3s" name="monitor"
> timeout="3s"/>
> </operations>
> </primitive>
> </group>
> </resources>
> <constraints>
> <rsc_location id="cli-prefer-Directory_Server" rsc="Directory_Server">
> <rule id="cli-prefer-rule-Directory_Server" score="INFINITY"
> boolean-op="and">
> <expression id="cli-prefer-expr-Directory_Server"
> attribute="#uname" operation="eq" value="rubric.esri.com" type="string"/>
> </rule>
> </rsc_location>
> <rsc_location id="cli-prefer-FDS_Admin" rsc="FDS_Admin">
> <rule id="cli-prefer-rule-FDS_Admin" score="INFINITY"
> boolean-op="and">
> <expression id="cli-prefer-expr-FDS_Admin" attribute="#uname"
> operation="eq" value="nomen.esri.com" type="string"/>
> </rule>
> </rsc_location>
> </constraints>
> </configuration>
> </cib>
>
>
>
> I still have the following issues when I only had heartbeat 2.1.3-1. My
> concerns are still as follows:
>
> 01) When a node comes back up after a restart of heartbeat, resources gets
> bounced when it rejoins the cluster.
Well, you have defined rsc_location constraints with a score of
INFINITY, so that is expected.
> 02) Stopping one resource in a group does not failover the group to the
> other node.
Look up migration-threshold.
Regards
Dominik
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems