Hi again :-)

I think my main problem is my location configuration: when I bring down eth0 on node1 and then look at crm_mon -f, the count on node2 never increases.
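
To make the goal concrete, here is a rough sketch of the kind of setup I think I am after. It is only a sketch under assumptions: the group name webgroup and the constraint id are hypothetical names I made up, and it assumes the connectivity attribute keeps the name pingd, as used in the rule of my current config.

```
# Sketch only: "webgroup" and "webgroup_on_connected_node" are hypothetical names.
# Put the IP and apache into one group so they move together.
group webgroup failoverip crhweb
# Ban the whole group from any node that has no pingd value or has lost the gateway.
location webgroup_on_connected_node webgroup \
        rule $id="webgroup_on_connected_node-rule" -inf: not_defined pingd or pingd lte 0
```

As far as I understand, a group already implies colocation and ordering between its members, so the separate colocation and order constraints between crhweb and failoverip would not be needed with this layout.
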
Could anyone help me out with the pingd / location constraints required for a group of resources to fail over from node1 to node2 if node1 can no longer ping the default gateway?

Thanks again

On 1 February 2011 13:08, paul harford <harfordmeis...@gmail.com> wrote:

> Hi Nikita
> Sorry, I forgot: I have 2 ethernet interfaces. eth1 is for the heartbeat and
> eth0 is for the public IP, and the virtual IP for apache is 10.100.1.100
>
> Thanks
> Paul
>
>
> On 1 February 2011 12:04, Nikita Michalko <michalko.sys...@a-i-p.com> wrote:
>
>> Hi Paul!
>>
>> Can you show me your ha.cf?
>> How many network interfaces do you use for this cluster?
>> If only one, it is the typical split-brain situation after cable pull down!
>>
>> Nikita
>>
>>
>> On Tuesday, 1 February 2011 at 12:05, paul harford wrote:
>> > Hi Nikita
>> > I reverted to an early snapshot and started again. I now have pingd running,
>> > but when I remove eth0 the resource does not fail over.
>> >
>> > I can see in the ha-log that the ping detects the network is gone, but it
>> > does not move the resource. Can anyone see the error in my config?
>> >
>> >
>> > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" node1 \
>> >         attributes standby="off"
>> > node $id="59440607-2a5c-450e-84fa-94bf69742671" node2 \
>> >         attributes standby="off"
>> > primitive MYPING ocf:pacemaker:pingd \
>> >         params host_list="10.100.0.254" multiplier="1000" \
>> >         op monitor interval="15s" timeout="20s" \
>> >         op start interval="0" timeout="90s" \
>> >         op stop interval="0" timeout="100s"
>> > primitive crhweb ocf:heartbeat:apache \
>> >         params configfile="/etc/httpd/conf/httpd.conf" \
>> >         op monitor interval="60s" \
>> >         meta target-role="Started"
>> > primitive failoverip ocf:heartbeat:IPaddr \
>> >         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
>> >         op monitor interval="30s"
>> > clone MYPINGCLONE MYPING \
>> >         meta globally-unique="false"
>> > location web_location crhweb \
>> >         rule $id="web_location-rule" -inf: not_defined pingd or pingd lte 0
>> > colocation crhweb-with-failoverip inf: crhweb failoverip
>> > order crhweb-after-failoverip inf: MYPINGCLONE failoverip crhweb
>> > property $id="cib-bootstrap-options" \
>> >         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>> >         cluster-infrastructure="Heartbeat" \
>> >         stonith-enabled="false" \
>> >         no-quorum-policy="ignore"
>> > rsc_defaults $id="rsc-options" \
>> >         resource-stickiness="100"
>> >
>> >
>> > HA_LOG
>> >
>> > Jan 28 11:17:42 node1 heartbeat: [2872]: ERROR: glib: Error sending packet: Network is unreachable
>> > Jan 28 11:17:42 node1 heartbeat: [2872]: info: glib: euid=0 egid=0
>> > Jan 28 11:17:42 node1 heartbeat: [2872]: ERROR: write_child: write failure on ping 10.100.0.254.: Network is unreachable
>> > Jan 28 11:17:43 node1 pingd: [6004]: WARN: ping_write: Wrote -1 of 39 chars: Network is unreachable (101
>> >
>> > On 1 February 2011 09:35, paul harford <harfordmeis...@gmail.com> wrote:
>> > > Hi Nikita
>> > > Many thanks for your assistance. I made the changes you pointed out, but
>> > > now my 2 nodes just keep rebooting. Did I enter something incorrectly in
>> > > the pingd directive?
>> > >
>> > > Paul
>> > >
>> > >
>> > > I can see these errors in the messages log, and my configuration is below
>> > >
>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: clone_print: Clone Set: connected
>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: short_print: Stopped: [ pingd:0 pingd:1 ]
>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: info: rsc_merge_weights: failoverip: Rolling back scores from crhweb
>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: info: native_color: Resource crhweb cannot run anywhere
>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: RecurringOp: Start recurring monitor (10s) for pingd:0 on crhnode2
>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: RecurringOp: Start recurring monitor (10s) for pingd:1 on crhnode1
>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Leave resource failoverip (Started crhnode1)
>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Stop resource crhweb (crhnode1)
>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Start pingd:0 (crhnode2)
>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Start pingd:1 (crhnode1)
>> > > Feb 1 09:01:06 crhnode2 crmd: [3742]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: info: process_pe_message: Transition 59: PEngine Input stored in: /var/lib/pengine/pe-input-82.bz2
>> > > Feb 1 09:01:06 crhnode2 crmd: [3742]: info: unpack_graph: Unpacked transition 59: 14 actions in 14 synapses
>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: info: process_pe_message: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues.
>> > >
>> > >
>> > > here is my current configuration
>> > >
>> > >
>> > > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \
>> > >         attributes standby="off"
>> > > node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \
>> > >         attributes standby="off"
>> > > primitive crhweb ocf:heartbeat:apache \
>> > >         params configfile="/etc/httpd/conf/httpd.conf" \
>> > >         op monitor interval="60s" \
>> > >         meta target-role="Started"
>> > > primitive failoverip ocf:heartbeat:IPaddr \
>> > >         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
>> > >         op monitor interval="30s" \
>> > >         meta target-role="Started"
>> > > primitive pingd ocf:pacemaker:pingd \
>> > >         params dampen="5s" host_list="10.100.0.254" multiplier="1000" name="pingval" \
>> > >         operations $id="pingd-operations" \
>> > >         op monitor interval="10s" timeout="20s" \
>> > >         op monitor interval="90s" timeout="25s" start \
>> > >         op monitor interval="100s" timeout="25s" stop
>> > > clone connected pingd \
>> > >         meta globally-unique="false" target-role="started"
>> > > location cli-prefer-crhweb crhweb \
>> > >         rule $id="cli-prefer-rule-crhweb" inf: #uname eq crhnode1
>> > > location crhweb_on_connected_node crhweb \
>> > >         rule $id="crhweb_on_connected_node-rule" -inf: not_defined pingval or pingval lte 0
>> > > location prefer-crhnode1 crhweb 50: crhnode1
>> > > colocation crhweb-with-failoverip inf: crhweb failoverip
>> > > order crhweb-after-failoverip inf: pingd failoverip crhweb
>> > > property $id="cib-bootstrap-options" \
>> > >         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>> > >         cluster-infrastructure="Heartbeat" \
>> > >         stonith-enabled="false" \
>> > >         no-quorum-policy="ignore"
>> > >
>> > > On 1 February 2011 07:21, Nikita Michalko <michalko.sys...@a-i-p.com> wrote:
>> > >> Hi Paul,
>> > >>
>> > >> see below!
>> > >>
>> > >> On Monday, 31 January 2011 at 19:55, paul harford wrote:
>> > >> > Hi guys
>> > >> > I'm having some issues with a ping directive. My current config is
>> > >> > below, and basically I want the web resource to fail over to the second
>> > >> > node if the ping can no longer contact the default gateway
>> > >> >
>> > >> > so here goes
>> > >> >
>> > >> > crm configure primitive ping ocf:pacemaker:ping params dampen=5s
>> > >> > host_list=(default GateWay) multplier=1000 name=pingval operations
>> > >> > $id=ping-operations op moinitor interval=10s timeout=15s
>> > >>
>> > >> - this is surely wrong: "moinitor" ?
>> > >> - no such primitive (ping) below ...
>> > >>
>> > >> HTH
>> > >>
>> > >> Nikita Michalko
>> > >>
>> > >> > and
>> > >> >
>> > >> > crm configure clone connected ping meta globally-unique=false
>> > >> > target-role=started
>> > >> >
>> > >> > and
>> > >> >
>> > >> > location web_on_connected_node cweb rule
>> > >> > $id=web_on_connected_node-rule -inf: not_defined pingval or pingval
>> > >> > lte 0
>> > >> >
>> > >> >
>> > >> > Does anyone see any issues with the above configuration? I want
>> > >> > to check first, as the last time I tried it wouldn't work and my
>> > >> > resources would not fail over or start
>> > >> >
>> > >> >
>> > >> > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \
>> > >> >         attributes standby="off"
>> > >> > node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \
>> > >> >         attributes standby="off"
>> > >> > primitive cweb ocf:heartbeat:apache \
>> > >> >         params configfile="/etc/httpd/conf/httpd.conf" \
>> > >> >         op monitor interval="60s" \
>> > >> >         meta target-role="Started"
>> > >> > primitive failoverip ocf:heartbeat:IPaddr \
>> > >> >         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
>> > >> >         op monitor interval="30s" \
>> > >> >         meta target-role="Started"
>> > >> > location cli-prefer-cweb cweb \
>> > >> >         rule $id="cli-prefer-rule-crhweb" inf: #uname eq crhnode1
>> > >> > location prefer-crhnode1 crhweb 50: crhnode1
>> > >> > colocation cweb-with-failoverip inf: cweb failoverip
>> > >> > order crhweb-after-failoverip inf: failoverip cweb
>> > >> > property $id="cib-bootstrap-options" \
>> > >> >         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>> > >> >         cluster-infrastructure="Heartbeat" \
>> > >> >         stonith-enabled="false" \
>> > >> >         no-quorum-policy="ignore"
>> > >> > rsc_defaults $id="rsc-options" \
>> > >> >         resource-stickiness="100"
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker