Andrew,
I apologize for sending my previous email so abruptly.
I have followed your recommendation and installed Pacemaker.
Here is my config.
Packages Installed:
heartbeat-2.99.2-6.1
heartbeat-common-2.99.2-6.1
heartbeat-debug-2.99.2-6.1
heartbeat-ldirectord-2.99.2-6.1
heartbeat-resources-2.99.2-6.1
libheartbeat2-2.99.2-6.1
libpacemaker3-1.0.1-3.1
pacemaker-1.0.1-3.1
pacemaker-debug-1.0.1-3.1
pacemaker-pygui-1.4-11.9
pacemaker-pygui-debug-1.4-11.9
ha.cf:
# Logging
debug 1
use_logd false
logfacility daemon
# Misc Options
traditional_compression off
compression bz2
coredumps true
# Communications
udpport 691
bcast eth1 eth0
autojoin any
# Thresholds (in seconds)
keepalive 1
warntime 6
deadtime 10
initdead 15
ping 10.50.254.254
crm respawn
apiauth mgmtd uid=root
respawn root /usr/lib/heartbeat/mgmtd -v
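(Editorial illustration, not part of the original message: the timer values above can be sanity-checked against the usual ha.cf guidance that the intervals are strictly ordered and that initdead should be at least twice deadtime.)

```python
# Sanity-check the ha.cf timing values quoted above. The "initdead should be
# at least twice deadtime" guideline comes from the ha.cf documentation; this
# snippet is illustrative only.
keepalive, warntime, deadtime, initdead = 1, 6, 10, 15  # seconds, from ha.cf

assert keepalive < warntime < deadtime, "timers should be strictly ordered"

# With deadtime 10, the guideline would suggest initdead >= 20.
meets_guideline = initdead >= 2 * deadtime
print("initdead meets the 2x deadtime guideline:", meets_guideline)
```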
cib.xml:
<cib admin_epoch="0" validate-with="pacemaker-1.0" crm_feature_set="3.0"
have-quorum="1" epoch="57" dc-uuid="5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e"
num_updates="0" cib-last-written="Mon Jan 26 13:57:32 2009">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.0.1-node: 6fc5ce8302abf145a02891ec41e5a492efbe8efe"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e" uname="nomen.esri.com"
type="normal">
<instance_attributes id="nodes-5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e">
<nvpair id="standby-5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e"
name="standby" value="off"/>
</instance_attributes>
</node>
<node id="27f54ec3-b626-4b4f-b8a6-4ed0b768513c" uname="rubric.esri.com"
type="normal">
<instance_attributes id="nodes-27f54ec3-b626-4b4f-b8a6-4ed0b768513c">
<nvpair id="standby-27f54ec3-b626-4b4f-b8a6-4ed0b768513c"
name="standby" value="off"/>
</instance_attributes>
</node>
</nodes>
<resources>
<group id="Directory_Server">
<meta_attributes id="Directory_Server-meta_attributes">
<nvpair id="Directory_Server-meta_attributes-collocated"
name="collocated" value="true"/>
<nvpair id="Directory_Server-meta_attributes-ordered" name="ordered"
value="true"/>
<nvpair id="Directory_Server-meta_attributes-resource_stickiness"
name="resource_stickiness" value="100"/>
</meta_attributes>
<primitive class="ocf" id="VIP" provider="heartbeat" type="IPaddr">
<instance_attributes id="VIP-instance_attributes">
<nvpair id="VIP-instance_attributes-ip" name="ip"
value="10.50.26.250"/>
</instance_attributes>
<operations id="VIP-ops">
<op id="VIP-monitor-5s" interval="5s" name="monitor" timeout="5s"/>
</operations>
</primitive>
<primitive class="ocf" id="ECAS" provider="esri" type="ecas">
<operations id="ECAS-ops">
<op id="ECAS-monitor-3s" interval="3s" name="monitor" timeout="3s"/>
</operations>
<meta_attributes id="ECAS-meta_attributes">
<nvpair id="ECAS-meta_attributes-target-role" name="target-role"
value="Started"/>
</meta_attributes>
</primitive>
<primitive class="ocf" id="FDS_Admin" provider="esri" type="fdsadm">
<operations id="FDS_Admin-ops">
<op id="FDS_Admin-monitor-3s" interval="3s" name="monitor"
timeout="3s"/>
</operations>
</primitive>
</group>
</resources>
<constraints>
<rsc_location id="cli-prefer-Directory_Server" rsc="Directory_Server">
<rule id="cli-prefer-rule-Directory_Server" score="INFINITY"
boolean-op="and">
<expression id="cli-prefer-expr-Directory_Server" attribute="#uname"
operation="eq" value="rubric.esri.com" type="string"/>
</rule>
</rsc_location>
<rsc_location id="cli-prefer-FDS_Admin" rsc="FDS_Admin">
<rule id="cli-prefer-rule-FDS_Admin" score="INFINITY" boolean-op="and">
<expression id="cli-prefer-expr-FDS_Admin" attribute="#uname"
operation="eq" value="nomen.esri.com" type="string"/>
</rule>
</rsc_location>
</constraints>
</configuration>
</cib>
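(Editorial sketch, not part of the original configuration: in Pacemaker 1.0, the per-resource meta attribute migration-threshold controls how many failures it takes before a resource is moved off a node. Added to the group's existing meta_attributes block, it might look like this; the nvpair id is made up, the value is an example.)

```xml
<!-- Hypothetical addition to Directory_Server-meta_attributes: move the
     group away after a single monitor failure (Pacemaker 1.0 syntax). -->
<nvpair id="Directory_Server-meta_attributes-migration-threshold"
        name="migration-threshold" value="1"/>
```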
I still see the same issues I had when I was running only heartbeat 2.1.3-1.
My concerns remain:
01) When a node comes back up after a restart of heartbeat, resources get
bounced when it rejoins the cluster.
02) Stopping one resource in a group does not fail the group over to the other
node.
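(Editorial illustration: issue 02 is consistent with the score equation quoted further down in this thread; here is a small worked calculation using the numbers from this configuration. The helper function is a toy model, not part of Heartbeat or Pacemaker.)

```python
# Illustrative placement score for the Directory_Server group, using the
# equation from http://www.linux-ha.org/ScoreCalculation quoted later in
# this thread. Toy model only.
def node_score(constraint_score, num_group_resources, resource_stickiness,
               failcount, failure_stickiness):
    return (constraint_score
            + num_group_resources * resource_stickiness
            + failcount * failure_stickiness)

# Three group members (VIP, ECAS, FDS_Admin), stickiness 100, failure
# stickiness -100, one failed resource on the active node:
active = node_score(0, 3, 100, 1, -100)  # 0 + 300 - 100 = 200
other = 0  # stickiness only counts on the node actually running the group

# 200 > 0, so the group stays put; with these numbers the failcount would
# have to reach 3 before the active node's score dropped to zero.
print(active, other)
```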
Help.
Regards,
Jerome
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Andrew Beekhof
Sent: Tuesday, January 20, 2009 1:33 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Failover not working as I expected
On Tue, Jan 20, 2009 at 21:48, Jerome Yanga <[email protected]> wrote:
> Dominik,
>
> Per your request, attached is my current configuration.
>
> To reiterate, the following are still concerns:
>
> 01) Resources get bounced when Nomen rejoins the cluster.
> 02) Group failover will not work as hoped.
>
> As for resource monitoring, I believe that the customized init scripts are
> working properly; however, me being a noob seems to contradict this. I have
> tested the init scripts such that when the resource fails, the service is
> restarted. After seeing that the init script works, I have set the "On Fail"
> value to "stop" instead of "restart".
>
> Moreover, I have tried varying the group scores by changing the
> resource_stickiness and the resource_failure_stickiness values.
I would highly encourage you to upgrade to the latest stable series of
Pacemaker.
The whole failure stickiness nonsense has been completely dropped in
favor of something that's actually usable.
http://clusterlabs.org/wiki/Install
http://clusterlabs.org/wiki/Documentation <-- look for the 1.0 version
of configuration explained
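(Editorial note: this presumably refers to the migration-threshold meta attribute, which replaces failure stickiness in Pacemaker 1.0. A rough crm-shell sketch, with example values only:)

```shell
# Rough sketch, Pacemaker 1.0 crm shell; values are examples, not advice.
crm configure rsc_defaults resource-stickiness=100
crm configure rsc_defaults migration-threshold=1    # move after one failure
crm resource failcount ECAS show nomen.esri.com     # inspect a failcount
```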
> However, I have not been able to consistently fail the group over by stopping
> one of the resources. During the testing, I have tried using the equation
> below from the site you provided in your previous email.
>
> node = (constraint-score) + (num_group_resources * resource_stickiness) +
> (failcount * (resource_failure_stickiness) )
>
> Unfortunately, the scores do not seem to follow this equation when I
> verify them using showscores.sh. The following values were assigned to the
> Directory_Server group during this testing.
>
> resource_stickiness=100
> resource_failure_stickiness=-500
>
> I have also attempted to use the crm_failcount command to make sure that the
> scores get reset prior to failing any resource, but showscores.sh seems to
> show that the command is not working.
>
> I have also tried to change the cib.xml file manually to assign the values
> above to default-resource-stickiness and default-resource-failure-stickiness
> respectively, but after doing so, all the resources seem to disappear.
> (Good thing I had created a copy of the cib.xml file.)
>
> By the way, I have changed the values back to the following:
>
> resource_stickiness=100
> resource_failure_stickiness=-100
>
> Help.
>
> Regards,
> Jerome
>
>
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Dominik Klein
> Sent: Monday, January 19, 2009 11:31 PM
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] Failover not working as I expected
>
> Jerome Yanga wrote:
>> Dominik,
>>
>> Thank you very much. Adding "resource-stickiness" and getting rid of the
>> constraint helped a lot. The resources do not go back to Nomen anymore
>> when its heartbeat is started again (the resources stay with Rubric).
>> However, the resources still get bounced once Nomen joins the cluster. Is
>> there any way to keep the resources from bouncing when Nomen rejoins the
>> cluster?
>
> Please share your current configuration.
>
>> I have also observed another issue. As you have seen in my cib.xml, I have
>> created a group called Directory_Server. In this group there are three
>> resources, namely VIP, ECAS and FDS_Admin. If I manually turn off any of
>> these resources, I would like the group resource, Directory_Server, to
>> fail over to the other node. Is there a configuration that will do this?
>> Currently, if one of the three resources goes down, it stays down and the
>> rest continue running. All three resources need to be up and running for
>> our applications to work properly.
>
> Sounds like you're not doing any resource monitoring. Read up on that
> and configure it. The ScoreCalculation page might be handy to understand
> how things work: http://www.linux-ha.org/ScoreCalculation
>
> Regards
> Dominik
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems