Jerome Yanga wrote:
> Dominik,
> 
> Here is the status of the two concerns I needed help on.
> 
> 01)  When a node comes back up after a restart of heartbeat, resources get 
> bounced when it rejoins the cluster.
> STATUS:  The resources still get bounced when a node joins the cluster, even 
> though I have deleted all the constraints.

Well, your configuration lacks resource-stickiness ;) I think I already
mentioned this in an earlier email.
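For example, a stickiness nvpair added inside the group's existing
meta_attributes section could look like this (the value 100 is only an
illustration; any positive score makes resources prefer to stay where
they are unless something else forces a move):

  <nvpair id="Directory_Server-meta_attributes-resource-stickiness"
    name="resource-stickiness" value="100"/>

Note that in pacemaker 1.0 the meta attribute is spelled
"resource-stickiness" (with dashes).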

> 02)  Stopping one resource in a group does not fail over the group to the 
> other node.
> STATUS:  migration-threshold works like a charm.  :)  Thanks.
> 
> If I may, I have another concern that popped up.
> 
> 03)  I cannot seem to get MailTo to work.  I am trying to add this resource 
> under the Directory_Server group so that every time a failover occurs, it 
> will notify me.

The configuration of the agent is - as far as I can see - okay. You'd
have to look at the logs to see what the agent was trying to do and why it
failed.
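If you're not sure where to look, grepping syslog for the agent name
usually narrows it down. Something like this (the log file path depends on
your syslog setup; /var/log/messages is just a common default):

  grep -i mailto /var/log/messages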

Also:
Look up your $MAILCMD in /usr/lib/ocf/resource.d/heartbeat/.ocf-binaries
and then try something like:

echo "some text for the test email" | $MAILCMD -s "failover occurred"
[email protected]

If that works (i.e. you receive the email), the agent should work too.
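If the mail command itself is fine but the cluster still fails to start the
resource, you can also run the agent by hand with the OCF environment set,
for example (paths as on a typical install - adjust to yours):

  OCF_ROOT=/usr/lib/ocf \
  OCF_RESKEY_email="[email protected]" \
  /usr/lib/ocf/resource.d/heartbeat/MailTo start; echo $?

An exit code of 0 means the start succeeded; anything else is an OCF error
code, and the agent's output should tell you what went wrong.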

Regards
Dominik

> Below is the current cib.xml file I have.
> 
> <cib admin_epoch="0" validate-with="pacemaker-1.0" crm_feature_set="3.0" 
> have-quorum="1" dc-uuid="27f54ec3-b626-4b4f-b8a6-4ed0b768513c" epoch="99" 
> num_updates="0" cib-last-written="Tue Jan 27 12:59:21 2009">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cib-bootstrap-options">
>         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" 
> value="1.0.1-node: 6fc5ce8302abf145a02891ec41e5a492efbe8efe"/>
>       </cluster_property_set>
>     </crm_config>
>     <nodes>
>       <node id="5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e" uname="nomen.esri.com" 
> type="normal">
>         <instance_attributes id="nodes-5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e">
>           <nvpair id="standby-5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e" 
> name="standby" value="off"/>
>         </instance_attributes>
>       </node>
>       <node id="27f54ec3-b626-4b4f-b8a6-4ed0b768513c" uname="rubric.esri.com" 
> type="normal">
>         <instance_attributes id="nodes-27f54ec3-b626-4b4f-b8a6-4ed0b768513c">
>           <nvpair id="standby-27f54ec3-b626-4b4f-b8a6-4ed0b768513c" 
> name="standby" value="off"/>
>         </instance_attributes>
>       </node>
>     </nodes>
>     <resources>
>       <group id="Directory_Server">
>         <meta_attributes id="Directory_Server-meta_attributes">
>           <nvpair id="Directory_Server-meta_attributes-collocated" 
> name="collocated" value="true"/>
>           <nvpair id="Directory_Server-meta_attributes-ordered" 
> name="ordered" value="true"/>
>           <nvpair id="Directory_Server-meta_attributes-migration-threshold" 
> name="migration-threshold" value="1"/>
>           <nvpair id="Directory_Server-meta_attributes-failure-timeout" 
> name="failure-timeout" value="10s"/>
>         </meta_attributes>
>         <primitive class="ocf" id="VIP" provider="heartbeat" type="IPaddr">
>           <instance_attributes id="VIP-instance_attributes">
>             <nvpair id="VIP-instance_attributes-ip" name="ip" 
> value="10.50.26.250"/>
>           </instance_attributes>
>           <operations id="VIP-ops">
>             <op id="VIP-monitor-5s" interval="5s" name="monitor" 
> timeout="5s"/>
>           </operations>
>         </primitive>
>         <primitive class="ocf" id="ECAS" provider="esri" type="ecas">
>           <operations id="ECAS-ops">
>             <op id="ECAS-monitor-3s" interval="3s" name="monitor" 
> timeout="3s"/>
>           </operations>
>         </primitive>
>         <primitive class="ocf" id="FDS_Admin" provider="esri" type="fdsadm">
>           <operations id="FDS_Admin-ops">
>             <op id="FDS_Admin-monitor-3s" interval="3s" name="monitor" 
> timeout="3s"/>
>           </operations>
>         </primitive>
>         <primitive class="ocf" provider="heartbeat" type="MailTo" 
> id="Emergency_Contact">
>           <instance_attributes id="Emergency_Contact-instance_attributes">
>             <nvpair id="Emergency_Contact-instance_attributes-email" 
> name="email" value="[email protected]"/>
>             <nvpair id="Emergency_Contact-instance_attributes-subject" 
> name="subject" value="Failover Occured"/>
>           </instance_attributes>
>           <operations id="Emergency_Contact-ops">
>             <op interval="3s" name="monitor" timeout="3s" 
> id="Emergency_Contact-monitor-3s"/>
>           </operations>
>         </primitive>
>       </group>
>     </resources>
>     <constraints/>
>   </configuration>
> </cib>
> 
> Help.
> 
> Regards,
> jerome
> 
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Dominik Klein
> Sent: Monday, January 26, 2009 10:52 PM
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] Failover not working as I expected
> 
> Jerome Yanga wrote:
>> Andrew,
>>
>> I apologize for sending my previous email abruptly.
>>
>> I have followed your recommendation and installed Pacemaker.
>>
>> Here is my config.
>>
>> Packages Installed:
>> heartbeat-2.99.2-6.1
>> heartbeat-common-2.99.2-6.1
>> heartbeat-debug-2.99.2-6.1
>> heartbeat-ldirectord-2.99.2-6.1
>> heartbeat-resources-2.99.2-6.1
>> libheartbeat2-2.99.2-6.1
>> libpacemaker3-1.0.1-3.1
>> pacemaker-1.0.1-3.1
>> pacemaker-debug-1.0.1-3.1
>> pacemaker-pygui-1.4-11.9
>> pacemaker-pygui-debug-1.4-11.9
>>
>>
>>
>> ha.cf:
>> # Logging
>> debug                                1
>> use_logd                     false
>> logfacility                  daemon
>>
>> # Misc Options
>> traditional_compression      off
>> compression                  bz2
>> coredumps                    true
>>
>> # Communications
>> udpport                      691
>> bcast                                eth1 eth0
>> autojoin                     any
>>   
>> # Thresholds (in seconds)
>> keepalive                    1
>> warntime                     6
>> deadtime                     10
>> initdead                     15
>>
>> ping 10.50.254.254
>> crm respawn
>>  apiauth     mgmtd   uid=root
>>  respawn     root    /usr/lib/heartbeat/mgmtd -v
>>
>>
>> cib.xml:
>> <cib admin_epoch="0" validate-with="pacemaker-1.0" crm_feature_set="3.0" 
>> have-quorum="1" epoch="57" dc-uuid="5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e" 
>> num_updates="0" cib-last-written="Mon Jan 26 13:57:32 2009">
>>   <configuration>
>>     <crm_config>
>>       <cluster_property_set id="cib-bootstrap-options">
>>         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" 
>> value="1.0.1-node: 6fc5ce8302abf145a02891ec41e5a492efbe8efe"/>
>>       </cluster_property_set>
>>     </crm_config>
>>     <nodes>
>>       <node id="5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e" uname="nomen.esri.com" 
>> type="normal">
>>         <instance_attributes id="nodes-5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e">
>>           <nvpair id="standby-5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e" 
>> name="standby" value="off"/>
>>         </instance_attributes>
>>       </node>
>>       <node id="27f54ec3-b626-4b4f-b8a6-4ed0b768513c" 
>> uname="rubric.esri.com" type="normal">
>>         <instance_attributes id="nodes-27f54ec3-b626-4b4f-b8a6-4ed0b768513c">
>>           <nvpair id="standby-27f54ec3-b626-4b4f-b8a6-4ed0b768513c" 
>> name="standby" value="off"/>
>>         </instance_attributes>
>>       </node>
>>     </nodes>
>>     <resources>
>>       <group id="Directory_Server">
>>         <meta_attributes id="Directory_Server-meta_attributes">
>>           <nvpair id="Directory_Server-meta_attributes-collocated" 
>> name="collocated" value="true"/>
>>           <nvpair id="Directory_Server-meta_attributes-ordered" 
>> name="ordered" value="true"/>
>>           <nvpair id="Directory_Server-meta_attributes-resource_stickiness" 
>> name="resource_stickiness" value="100"/>
>>         </meta_attributes>
>>         <primitive class="ocf" id="VIP" provider="heartbeat" type="IPaddr">
>>           <instance_attributes id="VIP-instance_attributes">
>>             <nvpair id="VIP-instance_attributes-ip" name="ip" 
>> value="10.50.26.250"/>
>>           </instance_attributes>
>>           <operations id="VIP-ops">
>>             <op id="VIP-monitor-5s" interval="5s" name="monitor" 
>> timeout="5s"/>
>>           </operations>
>>         </primitive>
>>         <primitive class="ocf" id="ECAS" provider="esri" type="ecas">
>>           <operations id="ECAS-ops">
>>             <op id="ECAS-monitor-3s" interval="3s" name="monitor" 
>> timeout="3s"/>
>>           </operations>
>>           <meta_attributes id="ECAS-meta_attributes">
>>             <nvpair id="ECAS-meta_attributes-target-role" name="target-role" 
>> value="Started"/>
>>           </meta_attributes>
>>         </primitive>
>>         <primitive class="ocf" id="FDS_Admin" provider="esri" type="fdsadm">
>>           <operations id="FDS_Admin-ops">
>>             <op id="FDS_Admin-monitor-3s" interval="3s" name="monitor" 
>> timeout="3s"/>
>>           </operations>
>>         </primitive>
>>       </group>
>>     </resources>
>>     <constraints>
>>       <rsc_location id="cli-prefer-Directory_Server" rsc="Directory_Server">
>>         <rule id="cli-prefer-rule-Directory_Server" score="INFINITY" 
>> boolean-op="and">
>>           <expression id="cli-prefer-expr-Directory_Server" 
>> attribute="#uname" operation="eq" value="rubric.esri.com" type="string"/>
>>         </rule>
>>       </rsc_location>
>>       <rsc_location id="cli-prefer-FDS_Admin" rsc="FDS_Admin">
>>         <rule id="cli-prefer-rule-FDS_Admin" score="INFINITY" 
>> boolean-op="and">
>>           <expression id="cli-prefer-expr-FDS_Admin" attribute="#uname" 
>> operation="eq" value="nomen.esri.com" type="string"/>
>>         </rule>
>>       </rsc_location>
>>     </constraints>
>>   </configuration>
>> </cib>
>>
>>
>>
>> I still have the same issues I had when running only heartbeat 2.1.3-1.  My 
>> concerns are as follows:
>>
>> 01)  When a node comes back up after a restart of heartbeat, resources get 
>> bounced when it rejoins the cluster.
> 
> Well, you have defined rsc_location constraints with a score of
> INFINITY, so that is expected.
> 
>> 02)  Stopping one resource in a group does not fail over the group to the 
>> other node.
> 
> Look up migration-threshold.
> 
> Regards
> Dominik
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
> 
> 
