On Saturday, 26 December 2009 11:27:54, Eric Renfro wrote:
> Michael Schwartzkopff wrote:
> > On Saturday, 26 December 2009 10:52:38, Eric Renfro wrote:
> >> Michael Schwartzkopff wrote:
> >>> On Saturday, 26 December 2009 08:12:49, Eric Renfro wrote:
> >>>> Hello,
> >>>>
> >>>> I'm trying to set up two nodes that will run Pacemaker with OpenAIS as the communication layer. Ideally, I want router1 to be the master node and to take over again from router2 once router1 comes back up fully functional. In my setup, both routers are internet-facing servers. The external internet IP moves to whichever node is in control at the time, and that node also holds the internal IP that internal systems use as their gateway.
> >>>>
> >>>> So far my problem is with the Route resource. Later I also want to get Shorewall to start and stop on whichever node is active.
> >>>>
> >>>> In the setup shown below, Route fails to start initially. I presume this is because the internet IP address is not fully initialized when it tries to add the route. If I run "crm resource cleanup failover-gw", it comes up just fine. If I try to move the router_cluster resource from router1 to router2 after everything is fully up, the move fails because of failover-gw on router2.
> >>>
> >>> Very unlikely. Once the IPaddr2 script has finished, the IP address is up. Please look for other causes and grep the logs for "lrm.*failover-gw".
> >>>
> >>>> Here's my setup at present. Until I figure out how to automate it, Shorewall is started manually. I want to automate that once the setup is working, so perhaps you could help me with that as well.
> >>>>
> >>>> primitive failover-int-ip ocf:heartbeat:IPaddr2 \
> >>>>         params ip="192.168.0.1" \
> >>>>         op monitor interval="2s"
> >>>> primitive failover-ext-ip ocf:heartbeat:IPaddr2 \
> >>>>         params ip="24.227.124.158" cidr_netmask="30" broadcast="24.227.124.159" nic="net0" \
> >>>>         op monitor interval="2s" \
> >>>>         meta target-role="Started"
> >>>> primitive failover-gw ocf:heartbeat:Route \
> >>>>         params destination="0.0.0.0/0" gateway="24.227.124.157" device="net0" \
> >>>>         meta target-role="Started" \
> >>>>         op monitor interval="2s"
> >>>> group router_cluster failover-int-ip failover-ext-ip failover-gw
> >>>> location router-master router_cluster \
> >>>>         rule $id="router-master-rule" $role="master" 100: #uname eq router1
> >>>>
> >>>> I would appreciate as much help as possible. I am fairly new to Pacemaker, but so far everything except the Route part works well.
> >>>
> >>> Please give us a chance to help you by providing the relevant logs!
> >>
> >> Sure. Here's a big clip of the log, grepped for just failover-gw. Hopefully this helps. If not, I can pinpoint more of what's happening around it. The logs fill up pretty quickly as the cluster comes up.
> >>
> >> messages:Dec 26 02:00:21 router1 pengine: [4724]: info: unpack_rsc_op: failover-gw_monitor_0 on router2 returned 5 (not installed) instead of the expected value: 7 (not running)
> >
> > (...)
> >
> > The rest of the log is not needed. The first line alone tells you that something is not installed correctly. Please read the lines just above this one. Normally they tell you what is missing.
> >
> > You should also read through the Route resource agent in /usr/lib/ocf/resource.d/heartbeat/Route
> >
> > Greetings,
>
> Hmmm.
> I'm not seeing anything about it. Here's a clip of the lines above it, plus one line below the one saying (not installed).
>
> Dec 26 05:00:21 router1 pengine: [4724]: info: determine_online_status: Node router1 is online
> Dec 26 05:00:21 router1 pengine: [4724]: info: unpack_rsc_op: failover-gw_monitor_0 on router1 returned 0 (ok) instead of the expected value: 7 (not running)
> Dec 26 05:00:21 router1 pengine: [4724]: WARN: unpack_rsc_op: Operation failover-gw_monitor_0 found resource failover-gw active on router1
> Dec 26 05:00:21 router1 pengine: [4724]: info: determine_online_status: Node router2 is online
> Dec 26 05:00:21 router1 pengine: [4724]: info: unpack_rsc_op: failover-gw_monitor_0 on router2 returned 5 (not installed) instead of the expected value: 7 (not running)
> Dec 26 05:00:21 router1 pengine: [4724]: ERROR: unpack_rsc_op: Hard error - failover-gw_monitor_0 failed with rc=5: Preventing failover-gw from re-starting on router2
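The numbers in "returned N ... expected value: M" are standard OCF resource-agent exit codes. In this excerpt, 0 is OCF_SUCCESS, 5 is OCF_ERR_INSTALLED and 7 is OCF_NOT_RUNNING. According to the ERROR line, the rc=5 from the initial probe (monitor_0) on router2 is what stops failover-gw from starting there.

Below is a minimal Python sketch for pulling one resource's operation results out of a log and naming the codes. It assumes the pengine "unpack_rsc_op" wording shown in this excerpt; other Pacemaker versions may phrase it differently. The regex and the helpers rc_name and decode_ops are hypothetical illustrations, not part of Pacemaker.

```python
import re
from typing import Iterable, Iterator, Tuple

# Standard OCF resource-agent exit codes (OCF RA API, as used by Pacemaker).
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

# Hypothetical pattern for messages like:
# "failover-gw_monitor_0 on router2 returned 5 (not installed)
#  instead of the expected value: 7 (not running)"
# The greedy rsc group backtracks so that "_<op>_<interval>" is the last part.
OP_RE = re.compile(
    r"(?P<rsc>[\w-]+)_(?P<op>[a-z]+)_(?P<interval>\d+) on (?P<node>\S+) "
    r"returned (?P<rc>\d+) .*?expected value: (?P<expected>\d+)"
)


def rc_name(code: int) -> str:
    """Map a numeric OCF exit code to its symbolic name."""
    return OCF_RC.get(code, f"unknown({code})")


def decode_ops(lines: Iterable[str], resource: str) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (node, operation, actual_rc, expected_rc) for one resource."""
    for line in lines:
        m = OP_RE.search(line)
        if m and m.group("rsc") == resource:
            yield (
                m.group("node"),
                m.group("op"),
                rc_name(int(m.group("rc"))),
                rc_name(int(m.group("expected"))),
            )
```

The regex sees one physical line at a time, so log entries wrapped by a mail client, as some are in this thread, must be rejoined first. To run it against a real log, pass the lines of the syslog file to decode_ops(..., "failover-gw"). The "messages:" prefix in the earlier grep output suggests /var/log/messages, but that path is an assumption.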
Hi,

There must be other log entries. In the Route RA, the agent writes its reasons to ocf_log() before it errors out.

What versions of Pacemaker and cluster-glue do you have? What distribution are you running on?

Greetings,

-- 
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Address: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75
mail: mi...@multinet.de
web: www.multinet.de

Registered office: 85630 Grasbrunn
Register court: Amtsgericht München HRB 114375
Managing directors: Günter Jurgeneit, Hubert Martens

---

PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B

Skype: misch42

_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker