Eric Renfro wrote:
Michael Schwartzkopff wrote:
On Saturday, 26 December 2009 11:55:57, Eric Renfro wrote:
Michael Schwartzkopff wrote:
On Saturday, 26 December 2009 11:27:54, Eric Renfro wrote:
Michael Schwartzkopff wrote:
On Saturday, 26 December 2009 10:52:38, Eric Renfro wrote:
Michael Schwartzkopff wrote:
On Saturday, 26 December 2009 08:12:49, Eric Renfro wrote:
Hello,

I'm trying to set up 2 nodes that will run pacemaker with openais as
the communication layer. Ideally I want router1 to be the master node
and to take back over from router2 once it comes back up fully functional.
In my setup, both routers are internet-facing servers; the external
internet IP is toggled to whichever node controls it at the time, and
that node also handles the internal gateway IP that
internal systems route through.

My problem so far is with Route in my setup, and later with getting
shorewall to start/stop on whichever node is active.

Route, in the setup I will show below, fails to start initially;
I presume the internet IP address is not fully initialized at the
time it tries to add the route. If I do a crm resource cleanup failover-gw, it brings the route up just fine. If
I try to move the router_cluster resource from router1 to router2
after it's fully up, it fails because of failover-gw on router2.
Very unlikely. Once the IPaddr2 script finishes, the IP address is up.
Please look for other reasons and grep for "lrm.*failover-gw" in the
logs.
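For example, assuming syslog goes to /var/log/messages (the default on
openSUSE), something like:

grep 'lrm.*failover-gw' /var/log/messages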

Here's my setup at present. For the moment, until I figure out how
to do it, shorewall is started manually. I want to automate this
once the setup is working, though; perhaps you could help me
with that as well.

primitive failover-int-ip ocf:heartbeat:IPaddr2 \
        params ip="192.168.0.1" \
        op monitor interval="2s"
primitive failover-ext-ip ocf:heartbeat:IPaddr2 \
        params ip="24.227.124.158" cidr_netmask="30" broadcast="24.227.124.159" nic="net0" \
        op monitor interval="2s" \
        meta target-role="Started"
primitive failover-gw ocf:heartbeat:Route \
        params destination="0.0.0.0/0" gateway="24.227.124.157" device="net0" \
        meta target-role="Started" \
        op monitor interval="2s"
group router_cluster failover-int-ip failover-ext-ip failover-gw
location router-master router_cluster \
        rule $id="router-master-rule" $role="master" 100: #uname eq router1

I would appreciate as much help as possible. I am fairly new to
pacemaker, but so far all but the Route part of this works well.
Please give us a chance to help you by providing the relevant logs!
Sure..
Here's a big clip of the log, grepped for just failover-gw. Hopefully
this helps; if not, I can pinpoint more of what's happening.
The logs fill up pretty quickly as it comes alive.

messages:Dec 26 02:00:21 router1 pengine: [4724]: info: unpack_rsc_op: failover-gw_monitor_0 on router2 returned 5 (not installed) instead of
the expected value: 7 (not running)
(...)

The rest of the logs is not needed. The first line alone tells you that something is not installed correctly. Please read the lines just
above this one; normally they tell you what is missing.

You could also read through the routing resource agent in
/usr/lib/ocf/resource.d/heartbeat/Route
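For example, you can page through the script directly, or (assuming the
crm shell's ra subcommand is available in this version) list its
parameters with:

less /usr/lib/ocf/resource.d/heartbeat/Route
crm ra info ocf:heartbeat:Route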

Greetings,
Hmmm..
I'm not seeing anything about it. Here's a clip of the lines above, and
one line below the one saying (not installed).

Dec 26 05:00:21 router1 pengine: [4724]: info: determine_online_status:
Node router1 is online
Dec 26 05:00:21 router1 pengine: [4724]: info: unpack_rsc_op:
failover-gw_monitor_0 on router1 returned 0 (ok) instead of the
expected value: 7 (not running)
Dec 26 05:00:21 router1 pengine: [4724]: WARN: unpack_rsc_op: Operation
failover-gw_monitor_0 found resource failover-gw active on router1
Dec 26 05:00:21 router1 pengine: [4724]: info: determine_online_status:
Node router2 is online
Dec 26 05:00:21 router1 pengine: [4724]: info: unpack_rsc_op:
failover-gw_monitor_0 on router2 returned 5 (not installed) instead of
the expected value: 7 (not running)
Dec 26 05:00:21 router1 pengine: [4724]: ERROR: unpack_rsc_op: Hard
error - failover-gw_monitor_0 failed with rc=5: Preventing failover-gw
from re-starting on router2
Hi,

there must be other log entries. In the Route RA, before erroring out, the agent writes the reasons into ocf_log(). What version of pacemaker and
cluster-glue do you have? What distribution are you running?

Greetings,
I've checked all my logs. Syslog logs everything to my messages logfile,
so it should be there if anywhere.

I'm running OpenSUSE 11.2, which comes with heartbeat 2.99.3, pacemaker
1.0.1, and openais 0.80.3; that's what's running in this setup.

Hm. This is already quite an old version of pacemaker, but it should run anyway. Could you please check the resource manually on router1?

export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_destination="0.0.0.0/0"
export OCF_RESKEY_gateway="24.227.124.157"

/usr/lib/ocf/resource.d/heartbeat/Route monitor; echo $?
should result in 0 (started) or 7 (not started)

/usr/lib/ocf/resource.d/heartbeat/Route start; echo $?
should add the default route and result in 0

/usr/lib/ocf/resource.d/heartbeat/Route monitor; echo $?
should result in 0 (started)

/usr/lib/ocf/resource.d/heartbeat/Route stop; echo $?
should delete the default route and result in 0

/usr/lib/ocf/resource.d/heartbeat/Route monitor; echo $?
should result in 7 (not started)
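If the resource definition also sets the device parameter, as failover-gw
above does, you probably want to export that as well before testing:

export OCF_RESKEY_device="net0"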

If this does not work as expected, are there any error messages?
Please see if you can debug the Route script.

Greetings,

I did all these tests, and all results came back normal. The first monitor returned 7 (not started); start returned 0 and the next monitor returned 0; stop returned 0, and the monitor after stopping returned 7.

It seems the error for me happens further up, initially, which causes it not to start afterwards. Here's the current setup:

primitive intIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.0.1" cidr_netmask="16" broadcast="192.168.255.255" nic="lan0"
primitive extIP ocf:heartbeat:IPaddr2 \
        params ip="24.227.124.158" cidr_netmask="30" broadcast="24.227.124.159" nic="net0"
primitive resRoute ocf:heartbeat:Route \
        params destination="0.0.0.0/0" gateway="24.227.124.157"
primitive firewall lsb:shorewall
group router_cluster extIP intIP resRoute firewall
location router-master router_cluster \
        rule $id="router-master-rule" $role="master" 100: #uname eq router1

I have added blank lines in the logs to separate out the specific event segments that show it. One in particular, near the top, is what causes the entire resRoute to fail completely:

Dec 27 00:24:40 router2 crmd: [25786]: info: process_lrm_event: LRM operation resRoute_monitor_0 (call=4, rc=5, cib-update=31, confirmed=true) complete not installed
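(As with failover-gw earlier, I can clear the failed probe by hand with

crm resource cleanup resRoute

but it comes back the next time the resource is probed, so that is only a workaround.)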

This is with OpenSUSE 11.1 using the ha-cluster repository, with pacemaker 1.0.5, cluster-glue 1.0, heartbeat 3.0.0, openais 0.80.5, and ha-resources 1.0 (which is the heartbeat 3.99.x stuff, I believe). So fairly current versions now.

I'd been making my setup off of susestudio and hand picking the packages needed.

Any thoughts?

--
Eric Renfro


Aha!
The problem is somewhere in the Route script itself. Doing the same tests you gave earlier, on the very first monitor attempt on Route, while the net0 interface is empty and offline, I get the error that was shown in the previous log snippet:

Route[26705]: ERROR: Gateway address 24.227.124.157 is unreachable.

So the problem is that Route fails with an incorrect error code when it simply can't create the route because the interface is currently offline. It should report 7 (not running), since the route just isn't started.

After looking again at http://hg.linux-ha.org/agents/log/56b9100f9c49/heartbeat/Route, and then finding out that ocf_is_probe was non-existent in my installed version, I looked at http://hg.linux-ha.org/agents/file/56b9100f9c49/heartbeat/.ocf-shellfuncs.in and was able to patch together a fix that worked. The ocf_is_probe from those shellfuncs didn't work as-is because of the -a $OCF_RESKEY_CRM_meta_interval part (it failed with "too many arguments"), but omitting that part entirely resolved the issue overall. It now successfully brings the route up right from the start.
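Roughly, the idea of the fix looks like this (a sketch rather than my exact diff; the __OCF_ACTION variable and the OCF_NOT_RUNNING/OCF_ERR_INSTALLED codes come from the linked .ocf-shellfuncs, and the ip route get test below just stands in for the agent's own gateway check):

# Simplified ocf_is_probe: only test the action name, since including
# "-a $OCF_RESKEY_CRM_meta_interval = 0" broke with "too many arguments"
# when that variable was empty.
ocf_is_probe() {
    [ "$__OCF_ACTION" = "monitor" ]
}

# In the agent, where the gateway is verified before adding the route:
if ! ip route get "$OCF_RESKEY_gateway" >/dev/null 2>&1; then
    if ocf_is_probe; then
        exit $OCF_NOT_RUNNING      # rc=7: interface down, route simply not started
    fi
    ocf_log err "Gateway address $OCF_RESKEY_gateway is unreachable."
    exit $OCF_ERR_INSTALLED        # rc=5: the hard error seen in the pengine logs
fi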


Now that that issue is resolved for the time being....

How would I make it so that, once resRoute's route is taken down on a node when control passes back to the master server, that node activates an alternative route to get itself back online through the 192.168.0.1 gateway? I don't even know where to begin to get this logic in place. All I know is that it has something to do with colocation, but how exactly I'm uncertain. Any advice and examples would be appreciated.
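Would something along these lines be the right direction? (Untested sketch; fallbackRoute and the constraint name are just placeholders I picked. The idea is that the -inf colocation keeps the fallback route off whichever node currently runs router_cluster, so in a two-node cluster it ends up on the standby node.)

primitive fallbackRoute ocf:heartbeat:Route \
        params destination="0.0.0.0/0" gateway="192.168.0.1" \
        op monitor interval="10s"
colocation fallback-avoids-router -inf: fallbackRoute router_cluster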

--
Eric Renfro


_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
