Hi,

On Tue, Feb 1, 2011 at 6:55 PM, paul harford <harfordmeis...@gmail.com> wrote:
> Hi Again :-)
>
> I think my main problem is my location configuration: when I bring down eth0
> on node1 and look at crm_mon -f, the count on node2 never increases.
>
> Could anyone help me out with the pingd / location constraints required for
> a group of resources to fail over from node1 to node2 if node1 can no
> longer ping the default gateway?
>

Don't use pingd, use ocf:pacemaker:ping. Here's a working config:

primitive ping_the_gw ocf:pacemaker:ping \
        params host_list="1.2.3.4" multiplier="100" name="ping_the_gw" \
        op monitor interval="5s" timeout="60s" \
        op start interval="0s" timeout="60s" \
        op stop interval="0s"
clone ping_the_gw_clone ping_the_gw \
        meta globally-unique="false"
location nok_ping_the_gw grouped_resources \
        rule $id="nok_ping_the_gw-rule" -inf: not_defined ping_the_gw or ping_the_gw lte 0
group grouped_resources virtual_ip fs_mysql httpd mysqld

The "grouped_resources" group will not be allowed to run on a node if the
ping_the_gw attribute is not defined on that node or that node cannot ping the
gateway.
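To make the rule logic concrete, here is a toy model in Python. It is only a sketch of how the score works out, not Pacemaker code: the function names are made up for illustration, and it assumes the ping agent publishes (number of reachable hosts x multiplier) under the attribute name, and that -inf is the usual INFINITY score of 1000000 negated.

```python
from typing import Dict, List, Optional

# Pacemaker's INFINITY score is 1000000; "-inf" in a rule means -INFINITY.
INFINITY = 1_000_000


def ping_attribute(reachable_hosts: int, multiplier: int) -> int:
    """Value the ping agent publishes: reachable hosts times multiplier."""
    return reachable_hosts * multiplier


def location_score(attr: Optional[int]) -> int:
    """Toy version of: rule -inf: not_defined ATTR or ATTR lte 0."""
    if attr is None or attr <= 0:
        return -INFINITY  # rule matches: the node is banned
    return 0  # rule does not match: no score contribution


def allowed_nodes(attrs: Dict[str, Optional[int]]) -> List[str]:
    """Nodes on which the constrained resource may still run."""
    return sorted(n for n, v in attrs.items() if location_score(v) > -INFINITY)


# node1 has lost the gateway, node2 still reaches the single host in host_list.
attrs = {
    "node1": ping_attribute(0, 100),
    "node2": ping_attribute(1, 100),
}
print(allowed_nodes(attrs))  # prints ['node2']
```

A node whose attribute is missing entirely (None in this sketch) is banned as well, which is why the attribute name in the rule has to match the name the agent actually writes.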
In your config you should change

location web_location crhweb \
        rule $id="web_location-rule" -inf: not_defined pingd or pingd lte 0

to

location web_location crhweb \
        rule $id="web_location-rule" -inf: not_defined MYPING or MYPING lte 0

and

primitive MYPING ocf:pacemaker:pingd \
        params host_list="10.100.0.254" multiplier="1000" \
        op monitor interval="15s" timeout="20s" \
        op start interval="0" timeout="90s" \
        op stop interval="0" timeout="100s"

to

primitive MYPING ocf:pacemaker:ping \
        params host_list="10.100.0.254" multiplier="1000" name="MYPING" \
        op monitor interval="15s" timeout="20s" \
        op start interval="0" timeout="90s" \
        op stop interval="0" timeout="100s"

Note the name="MYPING" parameter: the rule tests a node attribute called
MYPING, and without the name parameter the ping agent publishes its result
under the default attribute name, pingd.

Regards,
Dan

>
> Thanks
> again
>
> On 1 February 2011 13:08, paul harford <harfordmeis...@gmail.com> wrote:
>
>> Hi Nikita
>> Sorry, I forgot: I have 2 ethernet interfaces. eth1 is for the heartbeat and
>> eth0 is for the public IP, and the virtual IP for apache is 10.100.1.100
>>
>> Thanks
>> Paul
>>
>> On 1 February 2011 12:04, Nikita Michalko <michalko.sys...@a-i-p.com> wrote:
>>
>>> Hi Paul!
>>>
>>> Can you show me your ha.cf?
>>> How many network interfaces do you use for this cluster?
>>> If only one, it is the typical split-brain situation after cable pull
>>> down!
>>>
>>> Nikita
>>>
>>> On Tuesday, 1 February 2011 12:05, paul harford wrote:
>>> > Hi Nikita
>>> > I reverted to an early snapshot and started again. I now have pingd
>>> > running, but when I remove eth0 the resource does not fail over.
>>> >
>>> > I can see in the ha-log that the ping detects the network is gone, but it
>>> > does not move the resource. Can anyone see the error in my config?
>>> >
>>> > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" node1 \
>>> >         attributes standby="off"
>>> > node $id="59440607-2a5c-450e-84fa-94bf69742671" node2 \
>>> >         attributes standby="off"
>>> > primitive MYPING ocf:pacemaker:pingd \
>>> >         params host_list="10.100.0.254" multiplier="1000" \
>>> >         op monitor interval="15s" timeout="20s" \
>>> >         op start interval="0" timeout="90s" \
>>> >         op stop interval="0" timeout="100s"
>>> > primitive crhweb ocf:heartbeat:apache \
>>> >         params configfile="/etc/httpd/conf/httpd.conf" \
>>> >         op monitor interval="60s" \
>>> >         meta target-role="Started"
>>> > primitive failoverip ocf:heartbeat:IPaddr \
>>> >         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
>>> >         op monitor interval="30s"
>>> > clone MYPINGCLONE MYPING \
>>> >         meta globally-unique="false"
>>> > location web_location crhweb \
>>> >         rule $id="web_location-rule" -inf: not_defined pingd or pingd lte 0
>>> > colocation crhweb-with-failoverip inf: crhweb failoverip
>>> > order crhweb-after-failoverip inf: MYPINGCLONE failoverip crhweb
>>> > property $id="cib-bootstrap-options" \
>>> >         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>>> >         cluster-infrastructure="Heartbeat" \
>>> >         stonith-enabled="false" \
>>> >         no-quorum-policy="ignore"
>>> > rsc_defaults $id="rsc-options" \
>>> >         resource-stickiness="100"
>>> >
>>> > HA_LOG
>>> >
>>> > Jan 28 11:17:42 node1 heartbeat: [2872]: ERROR: glib: Error sending packet: Network is unreachable
>>> > Jan 28 11:17:42 node1 heartbeat: [2872]: info: glib: euid=0 egid=0
>>> > Jan 28 11:17:42 node1 heartbeat: [2872]: ERROR: write_child: write failure on ping 10.100.0.254.: Network is unreachable
>>> > Jan 28 11:17:43 node1 pingd: [6004]: WARN: ping_write: Wrote -1 of 39 chars: Network is unreachable (101
>>> >
>>> > On 1 February 2011 09:35, paul harford <harfordmeis...@gmail.com> wrote:
>>> > > Hi Nikita
>>> > > Many thanks for your assistance, I updated the changes
>>> > > you noticed, but now my 2 nodes just keep rebooting. Did I enter
>>> > > something incorrectly in the pingd directive?
>>> > >
>>> > > Paul
>>> > >
>>> > > I can see these errors in the messages log and my configuration is below
>>> > >
>>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: clone_print: Clone Set: connected
>>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: short_print: Stopped: [ pingd:0 pingd:1 ]
>>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: info: rsc_merge_weights: failoverip: Rolling back scores from crhweb
>>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: info: native_color: Resource crhweb cannot run anywhere
>>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: RecurringOp: Start recurring monitor (10s) for pingd:0 on crhnode2
>>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
>>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
>>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
>>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
>>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: RecurringOp: Start recurring monitor (10s) for pingd:1 on crhnode1
>>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
>>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
>>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
>>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
>>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Leave resource failoverip (Started crhnode1)
>>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Stop resource crhweb (crhnode1)
>>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Start pingd:0 (crhnode2)
>>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Start pingd:1 (crhnode1)
>>> > > Feb 1 09:01:06 crhnode2 crmd: [3742]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
>>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: info: process_pe_message: Transition 59: PEngine Input stored in: /var/lib/pengine/pe-input-82.bz2
>>> > > Feb 1 09:01:06 crhnode2 crmd: [3742]: info: unpack_graph: Unpacked transition 59: 14 actions in 14 synapses
>>> > > Feb 1 09:01:06 crhnode2 pengine: [4103]: info: process_pe_message: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues.
>>> > >
>>> > > here is my current configuration
>>> > >
>>> > > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \
>>> > >         attributes standby="off"
>>> > > node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \
>>> > >         attributes standby="off"
>>> > > primitive crhweb ocf:heartbeat:apache \
>>> > >         params configfile="/etc/httpd/conf/httpd.conf" \
>>> > >         op monitor interval="60s" \
>>> > >         meta target-role="Started"
>>> > > primitive failoverip ocf:heartbeat:IPaddr \
>>> > >         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
>>> > >         op monitor interval="30s" \
>>> > >         meta target-role="Started"
>>> > > primitive pingd ocf:pacemaker:pingd \
>>> > >         params dampen="5s" host_list="10.100.0.254" multiplier="1000" name="pingval" \
>>> > >         operations $id="pingd-operations" \
>>> > >         op monitor interval="10s" timeout="20s" \
>>> > >         op monitor interval="90s" timeout="25s" start \
>>> > >         op monitor interval="100s" timeout="25s" stop
>>> > > clone connected pingd \
>>> > >         meta globally-unique="false" target-role="started"
>>> > > location cli-prefer-crhweb crhweb \
>>> > >         rule $id="cli-prefer-rule-crhweb" inf: #uname eq crhnode1
>>> > > location crhweb_on_connected_node crhweb \
>>> > >         rule $id="crhweb_on_connected_node-rule" -inf: not_defined pingval or pingval lte 0
>>> > > location prefer-crhnode1 crhweb 50: crhnode1
>>> > > colocation crhweb-with-failoverip inf: crhweb failoverip
>>> > > order crhweb-after-failoverip inf: pingd failoverip crhweb
>>> > > property $id="cib-bootstrap-options" \
>>> > >         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>>> > >         cluster-infrastructure="Heartbeat" \
>>> > >         stonith-enabled="false" \
>>> > >         no-quorum-policy="ignore"
>>> > >
>>> > > On 1 February 2011 07:21, Nikita Michalko <michalko.sys...@a-i-p.com> wrote:
>>> > >> Hi Paul,
>>> > >>
>>> > >> see below!
>>> > >>
>>> > >> On Monday, 31 January 2011 19:55, paul harford wrote:
>>> > >> > Hi guys
>>> > >> > I'm having some issues with a ping directive. My current config is
>>> > >> > below, and basically I want the web resource to fail over to the second
>>> > >> > node if the ping can no longer contact the default gateway
>>> > >> >
>>> > >> > so here goes
>>> > >> >
>>> > >> > crm configure primitive ping ocf:pacemaker:ping params dampen=5s
>>> > >> > host_list=(default GateWay) multplier=1000 name=pingval operations
>>> > >> > $id=ping-operations op moinitor interval=10s timeout=15s
>>> > >>
>>> > >> - this is surely wrong: "moinitor" ?
>>> > >> - no such primitive (ping) below ...
>>> > >>
>>> > >> HTH
>>> > >>
>>> > >> Nikita Michalko
>>> > >>
>>> > >> > and
>>> > >> >
>>> > >> > crm configure clone connected ping meta globally-unique=false
>>> > >> > target-role=started
>>> > >> >
>>> > >> > and
>>> > >> >
>>> > >> > location web_on_connected_node cweb rule
>>> > >> > $id=web_on_connected_node-rule -inf: not_defined pingval or pingval
>>> > >> > lte 0
>>> > >> >
>>> > >> > Does anyone see any issues with the above configuration?
>>> > >> > I want to check first, as the last time I tried it wouldn't work and my
>>> > >> > resources would not fail over or start
>>> > >> >
>>> > >> > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \
>>> > >> >         attributes standby="off"
>>> > >> > node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \
>>> > >> >         attributes standby="off"
>>> > >> > primitive cweb ocf:heartbeat:apache \
>>> > >> >         params configfile="/etc/httpd/conf/httpd.conf" \
>>> > >> >         op monitor interval="60s" \
>>> > >> >         meta target-role="Started"
>>> > >> > primitive failoverip ocf:heartbeat:IPaddr \
>>> > >> >         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
>>> > >> >         op monitor interval="30s" \
>>> > >> >         meta target-role="Started"
>>> > >> > location cli-prefer-cweb cweb \
>>> > >> >         rule $id="cli-prefer-rule-crhweb" inf: #uname eq crhnode1
>>> > >> > location prefer-crhnode1 crhweb 50: crhnode1
>>> > >> > colocation cweb-with-failoverip inf: cweb failoverip
>>> > >> > order crhweb-after-failoverip inf: failoverip cweb
>>> > >> > property $id="cib-bootstrap-options" \
>>> > >> >         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>>> > >> >         cluster-infrastructure="Heartbeat" \
>>> > >> >         stonith-enabled="false" \
>>> > >> >         no-quorum-policy="ignore"
>>> > >> > rsc_defaults $id="rsc-options" \
>>> > >> >         resource-stickiness="100"
>>> > >>
>>> > >> _______________________________________________
>>> > >> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> > >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> > >>
>>> > >> Project Home: http://www.clusterlabs.org
>>> > >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> > >> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

--
Dan Frincu
CCNA, RHCE