Hi Nikita,

Thanks for all your help, and I apologize for the simple mistakes; this is my first Pacemaker cluster. I do appreciate all your assistance. Currently pingd starts but does not fail over the resources. My ha.cf, the crm_mon output and the crm configure show output are below.
Here is my ha.cf

autojoin none
debug 1
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
#use_logd on
mcast eth1 239.0.0.1 694 1 0
bcast eth1
warntime 5
deadtime 20
initdead 60
keepalive 2
node crhnode1
node crhnode2
#deadping 15
#ping 10.100.0.254
crm yes

Current crm_mon

============
Last updated: Fri Jan 28 14:10:22 2011
Stack: Heartbeat
Current DC: crhnode2 (59440607-2a5c-450e-84fa-94bf69742671) - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, unknown expected votes
2 Resources configured.
============

Online: [ crhnode1 crhnode2 ]

 Clone Set: MYPINGCLONE
     Started: [ crhnode1 crhnode2 ]
 Resource Group: WEBRES
     failoverip (ocf::heartbeat:IPaddr): Started crhnode2
     crhweb (ocf::heartbeat:apache): Started crhnode2

crm configure show

node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \
        attributes standby="off"
node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \
        attributes standby="off"
primitive MYPING ocf:pacemaker:pingd \
        params host_list="10.100.0.254" multiplier="100" \
        op monitor interval="15s" timeout="20s" \
        op start interval="5" timeout="90s" \
        op stop interval="0" timeout="100s"
primitive crhweb ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
        op monitor interval="30s" \
        op start interval="0" timeout="40s" \
        op stop interval="0" timeout="60s"
primitive failoverip ocf:heartbeat:IPaddr \
        params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
        op monitor interval="30s"
group WEBRES failoverip crhweb \
        meta target-role="Started"
clone MYPINGCLONE MYPING \
        meta globally-unique="false" target-role="Started"
location web_location WEBRES \
        rule $id="web_location-rule" -inf: not_defined pingd or pingd lte 0
order crhweb-after-failoverip inf: MYPINGCLONE WEBRES
property $id="cib-bootstrap-options" \
        dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

On 1 February 2011 12:04, Nikita Michalko <michalko.sys...@a-i-p.com> wrote:

> Hi Paul!
>
> Can you show me your ha.cf?
> How many network interfaces do you use for this cluster?
> If only one, it is the typical split-brain situation after cable pull down!
>
> Nikita
>
>
> On Tuesday, 1 February 2011 12:05, paul harford wrote:
> > Hi Nikita
> > I reverted to an earlier snapshot and started again. I now have pingd running,
> > but when I remove eth0 the resource does not fail over.
> >
> > I can see in the ha-log that the ping detects the network is gone, but it
> > does not move the resource. Can anyone see the error in my config?
> >
> >
> > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" node1 \
> >         attributes standby="off"
> > node $id="59440607-2a5c-450e-84fa-94bf69742671" node2 \
> >         attributes standby="off"
> > primitive MYPING ocf:pacemaker:pingd \
> >         params host_list="10.100.0.254" multiplier="1000" \
> >         op monitor interval="15s" timeout="20s" \
> >         op start interval="0" timeout="90s" \
> >         op stop interval="0" timeout="100s"
> > primitive crhweb ocf:heartbeat:apache \
> >         params configfile="/etc/httpd/conf/httpd.conf" \
> >         op monitor interval="60s" \
> >         meta target-role="Started"
> > primitive failoverip ocf:heartbeat:IPaddr \
> >         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
> >         op monitor interval="30s"
> > clone MYPINGCLONE MYPING \
> >         meta globally-unique="false"
> > location web_location crhweb \
> >         rule $id="web_location-rule" -inf: not_defined pingd or pingd lte 0
> > colocation crhweb-with-failoverip inf: crhweb failoverip
> > order crhweb-after-failoverip inf: MYPINGCLONE failoverip crhweb
> > property $id="cib-bootstrap-options" \
> >         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
> >         cluster-infrastructure="Heartbeat" \
> >         stonith-enabled="false" \
> >         no-quorum-policy="ignore"
> > rsc_defaults $id="rsc-options" \
> >         resource-stickiness="100"
> >
> >
> > HA_LOG
> >
> > Jan 28 11:17:42 node1 heartbeat: [2872]: ERROR: glib: Error sending packet: Network is unreachable
> > Jan 28 11:17:42 node1 heartbeat: [2872]: info: glib: euid=0 egid=0
> > Jan 28 11:17:42 node1 heartbeat: [2872]: ERROR: write_child: write failure on ping 10.100.0.254.: Network is unreachable
> > Jan 28 11:17:43 node1 pingd: [6004]: WARN: ping_write: Wrote -1 of 39 chars: Network is unreachable (101
> >
> > On 1 February 2011 09:35, paul harford <harfordmeis...@gmail.com> wrote:
> > > Hi Nikita
> > > Many thanks for your assistance. I made the changes you pointed out, but
> > > now my 2 nodes just keep rebooting. Did I enter something incorrectly in
> > > the pingd directive?
> > >
> > > Paul
> > >
> > >
> > > I can see these errors in the messages log, and my configuration is below.
> > >
> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: clone_print: Clone Set: connected
> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: short_print: Stopped: [ pingd:0 pingd:1 ]
> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: info: rsc_merge_weights: failoverip: Rolling back scores from crhweb
> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: info: native_color: Resource crhweb cannot run anywhere
> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: RecurringOp: Start recurring monitor (10s) for pingd:0 on crhnode2
> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: RecurringOp: Start recurring monitor (10s) for pingd:1 on crhnode1
> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Leave resource failoverip (Started crhnode1)
> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Stop resource crhweb (crhnode1)
> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Start pingd:0 (crhnode2)
> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Start pingd:1 (crhnode1)
> > > Feb 1 09:01:06 crhnode2 crmd: [3742]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: info: process_pe_message: Transition 59: PEngine Input stored in: /var/lib/pengine/pe-input-82.bz2
> > > Feb 1 09:01:06 crhnode2 crmd: [3742]: info: unpack_graph: Unpacked transition 59: 14 actions in 14 synapses
> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: info: process_pe_message: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues.
> > >
> > >
> > > Here is my current configuration:
> > >
> > >
> > > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \
> > >         attributes standby="off"
> > > node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \
> > >         attributes standby="off"
> > > primitive crhweb ocf:heartbeat:apache \
> > >         params configfile="/etc/httpd/conf/httpd.conf" \
> > >         op monitor interval="60s" \
> > >         meta target-role="Started"
> > > primitive failoverip ocf:heartbeat:IPaddr \
> > >         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
> > >         op monitor interval="30s" \
> > >         meta target-role="Started"
> > > primitive pingd ocf:pacemaker:pingd \
> > >         params dampen="5s" host_list="10.100.0.254" multiplier="1000" name="pingval" \
> > >         operations $id="pingd-operations" \
> > >         op monitor interval="10s" timeout="20s" \
> > >         op monitor interval="90s" timeout="25s" start \
> > >         op monitor interval="100s" timeout="25s" stop
> > > clone connected pingd \
> > >         meta globally-unique="false" target-role="started"
> > > location cli-prefer-crhweb crhweb \
> > >         rule $id="cli-prefer-rule-crhweb" inf: #uname eq crhnode1
> > > location crhweb_on_connected_node crhweb \
> > >         rule $id="crhweb_on_connected_node-rule" -inf: not_defined pingval or pingval lte 0
> > > location prefer-crhnode1 crhweb 50: crhnode1
> > > colocation crhweb-with-failoverip inf: crhweb failoverip
> > > order crhweb-after-failoverip inf: pingd failoverip crhweb
> > > property $id="cib-bootstrap-options" \
> > >         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
> > >         cluster-infrastructure="Heartbeat" \
> > >         stonith-enabled="false" \
> > >         no-quorum-policy="ignore"
> > >
> > > On 1 February 2011 07:21, Nikita Michalko <michalko.sys...@a-i-p.com> wrote:
> > >> Hi Paul,
> > >>
> > >> see below!
> > >>
> > >> On Monday, 31 January 2011 19:55, paul harford wrote:
> > >> > Hi guys
> > >> > I'm having some issues with a ping directive. My current config is
> > >> > below, and basically I want the web resource to fail over to the second
> > >> > node if the ping can no longer contact the default gateway.
> > >> >
> > >> > So here goes:
> > >> >
> > >> > crm configure primitive ping ocf:pacemaker:ping params dampen=5s host_list=(default GateWay) multplier=1000 name=pingval operations $id=ping-operations op moinitor interval=10s timeout=15s
> > >>
> > >> - this is surely wrong: "moinitor" ?
> > >> - no such primitive (ping) below ...
> > >>
> > >> HTH
> > >>
> > >> Nikita Michalko
> > >>
> > >> > and
> > >> >
> > >> > crm configure clone connected ping meta globally-unique=false target-role=started
> > >> >
> > >> > and
> > >> >
> > >> > location web_on_connected_node cweb rule $id=web_on_connected_node-rule -inf: not_defined pingval or pingval lte 0
> > >> >
> > >> >
> > >> > Does anyone see any issues with the above configuration? I want
> > >> > to check first, as the last time I tried it wouldn't work and my
> > >> > resources would not fail over or start.
> > >> >
> > >> >
> > >> > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \
> > >> >         attributes standby="off"
> > >> > node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \
> > >> >         attributes standby="off"
> > >> > primitive cweb ocf:heartbeat:apache \
> > >> >         params configfile="/etc/httpd/conf/httpd.conf" \
> > >> >         op monitor interval="60s" \
> > >> >         meta target-role="Started"
> > >> > primitive failoverip ocf:heartbeat:IPaddr \
> > >> >         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
> > >> >         op monitor interval="30s" \
> > >> >         meta target-role="Started"
> > >> > location cli-prefer-cweb cweb \
> > >> >         rule $id="cli-prefer-rule-crhweb" inf: #uname eq crhnode1
> > >> > location prefer-crhnode1 crhweb 50: crhnode1
> > >> > colocation cweb-with-failoverip inf: cweb failoverip
> > >> > order crhweb-after-failoverip inf: failoverip cweb
> > >> > property $id="cib-bootstrap-options" \
> > >> >         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
> > >> >         cluster-infrastructure="Heartbeat" \
> > >> >         stonith-enabled="false" \
> > >> >         no-quorum-policy="ignore"
> > >> > rsc_defaults $id="rsc-options" \
> > >> >         resource-stickiness="100"
> > >>
> > >> _______________________________________________
> > >> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >>
> > >> Project Home: http://www.clusterlabs.org
> > >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > >> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker