Michael Schwartzkopff wrote:
On Saturday, 26 December 2009 at 11:55:57, Eric Renfro wrote:
Michael Schwartzkopff wrote:
On Saturday, 26 December 2009 at 11:27:54, Eric Renfro wrote:
Michael Schwartzkopff wrote:
On Saturday, 26 December 2009 at 10:52:38, Eric Renfro wrote:
Michael Schwartzkopff wrote:
On Saturday, 26 December 2009 at 08:12:49, Eric Renfro wrote:
Hello,

I'm trying to set up 2 nodes that'll run pacemaker with openais as
the communication layer. Ideally, I want router1 to be the master
node, taking back over from router2 once it comes back up fully
functional again. In my setup, the routers are both internet-facing
servers that toggle the external internet IP to whichever node
controls it at the time, and also handle the internal gateway IP
that internal systems route through.

My problem so far is with Route in my setup, and later with getting
shorewall to start/stop on whichever node is active.

Route, in the setup I will show below, is failing to start
initially; I presume the internet IP address is not fully
initialized at the time it tries to enable the route. If I
do a crm resource cleanup failover-gw, it brings it up just fine. If
I try to move the router_cluster resource from router1 to router2
after it's fully up, it fails because of failover-gw on router2.
Very unlikely. If the IPaddr2 script finishes, the IP address is up.
Please search for other reasons and grep for "lrm.*failover-gw" in the
logs.
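For example, assuming syslog writes everything to /var/log/messages as on most SUSE installs:

grep 'lrm.*failover-gw' /var/log/messages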

Here's my setup at present. For the moment, shorewall is started
manually until I figure out how to automate it; I want to automate it
once the setup is working, though, and perhaps you guys could help me
with that as well (a rough sketch of what I mean follows the configuration below).

primitive failover-int-ip ocf:heartbeat:IPaddr2 \
        params ip="192.168.0.1" \
        op monitor interval="2s"
primitive failover-ext-ip ocf:heartbeat:IPaddr2 \
        params ip="24.227.124.158" cidr_netmask="30" broadcast="24.227.124.159" nic="net0" \
        op monitor interval="2s" \
        meta target-role="Started"
primitive failover-gw ocf:heartbeat:Route \
        params destination="0.0.0.0/0" gateway="24.227.124.157" device="net0" \
        meta target-role="Started" \
        op monitor interval="2s"
group router_cluster failover-int-ip failover-ext-ip failover-gw
location router-master router_cluster \
        rule $id="router-master-rule" $role="master" 100: #uname eq router1
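As for automating shorewall, what I have in mind is roughly this; untested, and it assumes shorewall ships an LSB-compliant init script (the primitive name failover-fw is just a placeholder):

primitive failover-fw lsb:shorewall
group router_cluster failover-int-ip failover-ext-ip failover-gw failover-fw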

I would appreciate as much help as possible. I am fairly new to
pacemaker, but so far all but the Route part of this works well.
Please give us a chance to help you by providing the relevant logs!
Sure..
Here's a big clip of the log, grepped for just failover-gw. Hopefully
this helps; if not, I can pinpoint more of what's happening. The logs
fill up pretty quickly as the cluster comes alive.

messages:Dec 26 02:00:21 router1 pengine: [4724]: info: unpack_rsc_op: failover-gw_monitor_0 on router2 returned 5 (not installed) instead of the expected value: 7 (not running)
(...)

The rest of the logs is not needed. Just the first line tells you that
something is not installed correctly. Please read the lines just
above this one; normally they tell you what is missing.

You should also read through the Route resource agent in
/usr/lib/ocf/resource.d/heartbeat/Route

Greetings,
Hmmm..
I'm not seeing anything about it. Here's a clip of the lines above,
plus one line below the one saying (not installed).

Dec 26 05:00:21 router1 pengine: [4724]: info: determine_online_status: Node router1 is online
Dec 26 05:00:21 router1 pengine: [4724]: info: unpack_rsc_op: failover-gw_monitor_0 on router1 returned 0 (ok) instead of the expected value: 7 (not running)
Dec 26 05:00:21 router1 pengine: [4724]: WARN: unpack_rsc_op: Operation failover-gw_monitor_0 found resource failover-gw active on router1
Dec 26 05:00:21 router1 pengine: [4724]: info: determine_online_status: Node router2 is online
Dec 26 05:00:21 router1 pengine: [4724]: info: unpack_rsc_op: failover-gw_monitor_0 on router2 returned 5 (not installed) instead of the expected value: 7 (not running)
Dec 26 05:00:21 router1 pengine: [4724]: ERROR: unpack_rsc_op: Hard error - failover-gw_monitor_0 failed with rc=5: Preventing failover-gw from re-starting on router2
Hi,

there must be other log entries. In the Route RA, before erroring out,
the agent writes the reason via ocf_log(). What versions of pacemaker and
cluster-glue do you have? What distribution are you running?

Greetings,
I've checked all my logs. Syslog logs everything to my messages logfile,
so it should be there if anywhere.

I'm running OpenSUSE 11.2, which comes with heartbeat 2.99.3, pacemaker
1.0.1, and openais 0.80.3; that's what's running in this setup.

Hm. This is already a quite old version of pacemaker, but it should run anyway. Could you please check the resource manually on router1?

export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_destination="0.0.0.0/0"
export OCF_RESKEY_gateway="24.227.124.157"

/usr/lib/ocf/resource.d/heartbeat/Route monitor; echo $?
should result in 0 (started) or 7 (not started)

/usr/lib/ocf/resource.d/heartbeat/Route start; echo $?
should add the default route and result in 0

/usr/lib/ocf/resource.d/heartbeat/Route monitor; echo $?
should result in 0 (started)

/usr/lib/ocf/resource.d/heartbeat/Route stop; echo $?
should delete the default route and result in 0

/usr/lib/ocf/resource.d/heartbeat/Route monitor; echo $?
should result in 7 (not started)

If this does not work as expected, are there any error messages?
Please see if you can debug the Route script.
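If your cluster-glue includes ocf-tester, it can exercise this whole sequence (and more) in a single command. A sketch, assuming the usual binary location:

/usr/sbin/ocf-tester -n failover-gw \
    -o destination="0.0.0.0/0" -o gateway="24.227.124.157" \
    /usr/lib/ocf/resource.d/heartbeat/Route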

Greetings,

I did all these tests, and all results came back normal. The first monitor returned 7 (not started); start returned 0 and the next monitor returned 0; stop returned 0; and the monitor after stopping returned 7.

It seems the error happens earlier on, initially, and that is what keeps it from starting afterwards. Here's the current setup:

primitive intIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.0.1" cidr_netmask="16" broadcast="192.168.255.255" nic="lan0"
primitive extIP ocf:heartbeat:IPaddr2 \
        params ip="24.227.124.158" cidr_netmask="30" broadcast="24.227.124.159" nic="net0"
primitive resRoute ocf:heartbeat:Route \
        params destination="0.0.0.0/0" gateway="24.227.124.157"
primitive firewall lsb:shorewall
group router_cluster extIP intIP resRoute firewall
location router-master router_cluster \
        rule $id="router-master-rule" $role="master" 100: #uname eq router1
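One difference from my first attempt: resRoute no longer has a monitor op here. If it matters, the earlier form was:

primitive resRoute ocf:heartbeat:Route \
        params destination="0.0.0.0/0" gateway="24.227.124.157" \
        op monitor interval="2s"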

And here's the log:

Dec 27 00:23:39 router2 mgmtd: [25787]: debug: main: run the loop...
Dec 27 00:23:39 router2 mgmtd: [25787]: info: Started.
Dec 27 00:23:39 router2 cib: [25790]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.RqbSTT (digest: /var/lib/heartbeat/crm/cib.CNwZjy)
Dec 27 00:24:37 router2 crmd: [25786]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped!
Dec 27 00:24:37 router2 crmd: [25786]: WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Dec 27 00:24:37 router2 crmd: [25786]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
Dec 27 00:24:37 router2 crmd: [25786]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
Dec 27 00:24:37 router2 crmd: [25786]: info: do_te_control: Registering TE UUID: 804c8e68-f81f-4d16-b9ab-6c6a4ea41b57
Dec 27 00:24:37 router2 crmd: [25786]: WARN: cib_client_add_notify_callback: Callback already present
Dec 27 00:24:37 router2 crmd: [25786]: info: set_graph_functions: Setting custom graph functions
Dec 27 00:24:37 router2 crmd: [25786]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses
Dec 27 00:24:37 router2 crmd: [25786]: info: do_dc_takeover: Taking over DC status for this partition
Dec 27 00:24:37 router2 cib: [25782]: info: cib_process_readwrite: We are now in R/W mode
Dec 27 00:24:37 router2 cib: [25782]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/6, version=0.135.0): ok (rc=0)
Dec 27 00:24:37 router2 cib: [25782]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/7, version=0.135.0): ok (rc=0)
Dec 27 00:24:37 router2 cib: [25782]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/9, version=0.135.0): ok (rc=0)
Dec 27 00:24:37 router2 crmd: [25786]: info: join_make_offer: Making join offers based on membership 2796
Dec 27 00:24:37 router2 cib: [25782]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/11, version=0.135.0): ok (rc=0)
Dec 27 00:24:37 router2 crmd: [25786]: info: do_dc_join_offer_all: join-1: Waiting on 1 outstanding join acks
Dec 27 00:24:37 router2 crmd: [25786]: info: ais_dispatch: Membership 2796: quorum still lost
Dec 27 00:24:37 router2 crmd: [25786]: info: config_query_callback: Checking for expired actions every 900000ms
Dec 27 00:24:37 router2 cib: [25782]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/14, version=0.135.0): ok (rc=0)
Dec 27 00:24:37 router2 crmd: [25786]: info: update_dc: Set DC to router2 (3.0.1)
Dec 27 00:24:37 router2 crmd: [25786]: info: ais_dispatch: Membership 2796: quorum still lost
Dec 27 00:24:37 router2 cib: [25782]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/17, version=0.135.0): ok (rc=0)
Dec 27 00:24:37 router2 crmd: [25786]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Dec 27 00:24:37 router2 crmd: [25786]: info: do_state_transition: All 1 cluster nodes responded to the join offer.
Dec 27 00:24:37 router2 crmd: [25786]: info: do_dc_join_finalize: join-1: Syncing the CIB from router2 to the rest of the cluster
Dec 27 00:24:37 router2 crmd: [25786]: info: te_connect_stonith: Attempting connection to fencing daemon...
Dec 27 00:24:37 router2 cib: [25782]: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/18, version=0.135.0): ok (rc=0)
Dec 27 00:24:38 router2 crmd: [25786]: info: te_connect_stonith: Connected
Dec 27 00:24:38 router2 cib: [25782]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/19, version=0.135.0): ok (rc=0)
Dec 27 00:24:38 router2 crmd: [25786]: info: update_attrd: Connecting to attrd...
Dec 27 00:24:38 router2 cib: [25782]: info: cib_process_request: Operation complete: op cib_delete for section //node_sta...@uname='router2']/transient_attributes (origin=local/crmd/20, version=0.135.0): ok (rc=0)
Dec 27 00:24:38 router2 cib: [25782]: info: cib_process_request: Operation complete: op cib_delete for section //node_sta...@uname='router2']/lrm (origin=local/crmd/21, version=0.135.0): ok (rc=0)
Dec 27 00:24:38 router2 attrd: [25784]: info: find_hash_entry: Creating hash entry for terminate
Dec 27 00:24:38 router2 attrd: [25784]: info: find_hash_entry: Creating hash entry for shutdown
Dec 27 00:24:38 router2 crmd: [25786]: info: erase_xpath_callback: Deletion of "//node_sta...@uname='router2']/transient_attributes": ok (rc=0)
Dec 27 00:24:38 router2 crmd: [25786]: info: erase_xpath_callback: Deletion of "//node_sta...@uname='router2']/lrm": ok (rc=0)
Dec 27 00:24:38 router2 crmd: [25786]: info: do_dc_join_ack: join-1: Updating node state to member for router2
Dec 27 00:24:38 router2 cib: [25782]: info: cib_process_request: Operation complete: op cib_delete for section //node_sta...@uname='router2']/lrm (origin=local/crmd/22, version=0.135.0): ok (rc=0)
Dec 27 00:24:39 router2 crmd: [25786]: info: erase_xpath_callback: Deletion of "//node_sta...@uname='router2']/lrm": ok (rc=0)
Dec 27 00:24:39 router2 crmd: [25786]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
Dec 27 00:24:39 router2 crmd: [25786]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Dec 27 00:24:39 router2 cib: [25782]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/24, version=0.135.1): ok (rc=0)
Dec 27 00:24:39 router2 crmd: [25786]: info: do_dc_join_final: Ensuring DC, quorum and node attributes are up-to-date
Dec 27 00:24:39 router2 crmd: [25786]: info: crm_update_quorum: Updating quorum status to false (call=26)
Dec 27 00:24:39 router2 attrd: [25784]: info: attrd_local_callback: Sending full refresh (origin=crmd)
Dec 27 00:24:39 router2 cib: [25782]: info: log_data_element: cib:diff: - <cib admin_epoch="0" epoch="135" num_updates="1" />
Dec 27 00:24:39 router2 crmd: [25786]: info: abort_transition_graph: do_te_invoke:190 - Triggered transition abort (complete=1) : Peer Cancelled
Dec 27 00:24:39 router2 attrd: [25784]: info: attrd_trigger_update: Sending flush op to all hosts for: terminate (<null>)
Dec 27 00:24:39 router2 cib: [25782]: info: log_data_element: cib:diff: + <cib dc-uuid="router2" admin_epoch="0" epoch="136" num_updates="1" />
Dec 27 00:24:39 router2 crmd: [25786]: info: do_pe_invoke: Query 27: Requesting the current CIB: S_POLICY_ENGINE
Dec 27 00:24:39 router2 cib: [25782]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/26, version=0.136.1): ok (rc=0)
Dec 27 00:24:39 router2 crmd: [25786]: info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=1) : Non-status change
Dec 27 00:24:39 router2 attrd: [25784]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (<null>)
Dec 27 00:24:39 router2 crmd: [25786]: info: need_abort: Aborting on change to admin_epoch
Dec 27 00:24:39 router2 crmd: [25786]: info: do_pe_invoke_callback: Invoking the PE: ref=pe_calc-dc-1261891479-7, seq=2796, quorate=0
Dec 27 00:24:39 router2 pengine: [25785]: notice: unpack_config: On loss of CCM Quorum: Ignore
Dec 27 00:24:39 router2 crmd: [25786]: info: do_pe_invoke: Query 28: Requesting the current CIB: S_POLICY_ENGINE
Dec 27 00:24:39 router2 pengine: [25785]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Dec 27 00:24:39 router2 crmd: [25786]: info: do_pe_invoke_callback: Invoking the PE: ref=pe_calc-dc-1261891479-8, seq=2796, quorate=0
Dec 27 00:24:39 router2 cib: [26644]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-33.raw
Dec 27 00:24:39 router2 pengine: [25785]: info: determine_online_status: Node router2 is online
Dec 27 00:24:39 router2 pengine: [25785]: notice: group_print: Resource Group: router_cluster

Dec 27 00:24:39 router2 pengine: [25785]: notice: native_print: extIP (ocf::heartbeat:IPaddr2): Stopped
Dec 27 00:24:39 router2 pengine: [25785]: notice: native_print: intIP (ocf::heartbeat:IPaddr2): Stopped
Dec 27 00:24:39 router2 pengine: [25785]: notice: native_print: resRoute (ocf::heartbeat:Route): Stopped
Dec 27 00:24:39 router2 pengine: [25785]: notice: native_print: firewall (lsb:shorewall): Stopped
Dec 27 00:24:39 router2 pengine: [25785]: notice: LogActions: Start extIP (router2)
Dec 27 00:24:39 router2 pengine: [25785]: notice: LogActions: Start intIP (router2)
Dec 27 00:24:39 router2 pengine: [25785]: notice: LogActions: Start resRoute (router2)
Dec 27 00:24:39 router2 pengine: [25785]: notice: LogActions: Start firewall (router2)

Dec 27 00:24:39 router2 crmd: [25786]: info: handle_response: pe_calc calculation pe_calc-dc-1261891479-7 is obsolete
Dec 27 00:24:39 router2 cib: [26644]: info: write_cib_contents: Wrote version 0.136.0 of the CIB to disk (digest: 9e93522d5931f5831f51236994e76c0f)
Dec 27 00:24:39 router2 pengine: [25785]: info: process_pe_message: Transition 0: PEngine Input stored in: /var/lib/pengine/pe-input-31.bz2
Dec 27 00:24:39 router2 pengine: [25785]: notice: unpack_config: On loss of CCM Quorum: Ignore
Dec 27 00:24:39 router2 pengine: [25785]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Dec 27 00:24:39 router2 pengine: [25785]: info: determine_online_status: Node router2 is online
Dec 27 00:24:39 router2 pengine: [25785]: notice: group_print: Resource Group: router_cluster
Dec 27 00:24:39 router2 pengine: [25785]: notice: native_print: extIP (ocf::heartbeat:IPaddr2): Stopped
Dec 27 00:24:39 router2 pengine: [25785]: notice: native_print: intIP (ocf::heartbeat:IPaddr2): Stopped
Dec 27 00:24:39 router2 pengine: [25785]: notice: native_print: resRoute (ocf::heartbeat:Route): Stopped
Dec 27 00:24:39 router2 pengine: [25785]: notice: native_print: firewall (lsb:shorewall): Stopped
Dec 27 00:24:39 router2 pengine: [25785]: notice: LogActions: Start extIP (router2)
Dec 27 00:24:39 router2 pengine: [25785]: notice: LogActions: Start intIP (router2)
Dec 27 00:24:39 router2 pengine: [25785]: notice: LogActions: Start resRoute (router2)
Dec 27 00:24:39 router2 pengine: [25785]: notice: LogActions: Start firewall (router2)
Dec 27 00:24:39 router2 crmd: [25786]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Dec 27 00:24:39 router2 cib: [26644]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.ciJ5oX (digest: /var/lib/heartbeat/crm/cib.zAvXPr)
Dec 27 00:24:39 router2 crmd: [25786]: info: unpack_graph: Unpacked transition 1: 19 actions in 19 synapses
Dec 27 00:24:39 router2 crmd: [25786]: info: do_te_invoke: Processing graph 1 (ref=pe_calc-dc-1261891479-8) derived from /var/lib/pengine/pe-input-32.bz2
Dec 27 00:24:39 router2 crmd: [25786]: info: te_rsc_command: Initiating action 4: monitor extIP_monitor_0 on router2 (local)
Dec 27 00:24:39 router2 crmd: [25786]: info: do_lrm_rsc_op: Performing key=4:1:7:804c8e68-f81f-4d16-b9ab-6c6a4ea41b57 op=extIP_monitor_0 )
Dec 27 00:24:39 router2 lrmd: [25783]: info: rsc:extIP:2: monitor
Dec 27 00:24:39 router2 crmd: [25786]: info: te_rsc_command: Initiating action 5: monitor intIP_monitor_0 on router2 (local)
Dec 27 00:24:39 router2 crmd: [25786]: info: do_lrm_rsc_op: Performing key=5:1:7:804c8e68-f81f-4d16-b9ab-6c6a4ea41b57 op=intIP_monitor_0 )
Dec 27 00:24:39 router2 lrmd: [25783]: info: rsc:intIP:3: monitor
Dec 27 00:24:39 router2 pengine: [25785]: info: process_pe_message: Transition 1: PEngine Input stored in: /var/lib/pengine/pe-input-32.bz2
Dec 27 00:24:39 router2 crmd: [25786]: info: te_rsc_command: Initiating action 6: monitor resRoute_monitor_0 on router2 (local)
Dec 27 00:24:39 router2 crmd: [25786]: info: do_lrm_rsc_op: Performing key=6:1:7:804c8e68-f81f-4d16-b9ab-6c6a4ea41b57 op=resRoute_monitor_0 )
Dec 27 00:24:39 router2 lrmd: [25783]: info: rsc:resRoute:4: monitor
Dec 27 00:24:39 router2 crmd: [25786]: info: te_rsc_command: Initiating action 7: monitor firewall_monitor_0 on router2 (local)
Dec 27 00:24:39 router2 Route[26705]: ERROR: Gateway address 24.227.124.157 is unreachable.
Dec 27 00:24:39 router2 lrmd: [25783]: notice: lrmd_rsc_new(): No lrm_rprovider field in message
Dec 27 00:24:39 router2 lrmd: [25783]: info: RA output: (resRoute:monitor:stderr) 2009/12/27_00:24:39 ERROR: Gateway address 24.227.124.157 is unreachable.
Dec 27 00:24:39 router2 crmd: [25786]: info: do_lrm_rsc_op: Performing key=7:1:7:804c8e68-f81f-4d16-b9ab-6c6a4ea41b57 op=firewall_monitor_0 )
Dec 27 00:24:39 router2 lrmd: [25783]: info: rsc:firewall:5: monitor

Dec 27 00:24:40 router2 crmd: [25786]: info: process_lrm_event: LRM operation extIP_monitor_0 (call=2, rc=7, cib-update=29, confirmed=true) complete not running
Dec 27 00:24:40 router2 crmd: [25786]: info: process_lrm_event: LRM operation intIP_monitor_0 (call=3, rc=7, cib-update=30, confirmed=true) complete not running
Dec 27 00:24:40 router2 crmd: [25786]: info: process_lrm_event: LRM operation resRoute_monitor_0 (call=4, rc=5, cib-update=31, confirmed=true) complete not installed
Dec 27 00:24:40 router2 crmd: [25786]: info: process_lrm_event: LRM operation firewall_monitor_0 (call=5, rc=7, cib-update=32, confirmed=true) complete not running
Dec 27 00:24:40 router2 crmd: [25786]: info: match_graph_event: Action extIP_monitor_0 (4) confirmed on router2 (rc=0)
Dec 27 00:24:40 router2 crmd: [25786]: info: match_graph_event: Action intIP_monitor_0 (5) confirmed on router2 (rc=0)
Dec 27 00:24:40 router2 crmd: [25786]: WARN: status_from_rc: Action 6 (resRoute_monitor_0) on router2 failed (target: 7 vs. rc: 5): Error
Dec 27 00:24:40 router2 crmd: [25786]: info: abort_transition_graph: match_graph_event:272 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=resRoute_monitor_0, magic=0:5;6:1:7:804c8e68-f81f-4d16-b9ab-6c6a4ea41b57, cib=0.136.4) : Event failed

Dec 27 00:24:40 router2 crmd: [25786]: info: update_abort_priority: Abort priority upgraded from 0 to 1
Dec 27 00:24:40 router2 crmd: [25786]: info: update_abort_priority: Abort action done superceeded by restart
Dec 27 00:24:40 router2 crmd: [25786]: info: match_graph_event: Action resRoute_monitor_0 (6) confirmed on router2 (rc=4)
Dec 27 00:24:40 router2 crmd: [25786]: info: match_graph_event: Action firewall_monitor_0 (7) confirmed on router2 (rc=0)
Dec 27 00:24:40 router2 crmd: [25786]: info: te_rsc_command: Initiating action 3: probe_complete probe_complete on router2 (local) - no waiting
Dec 27 00:24:40 router2 crmd: [25786]: info: run_graph: ====================================================
Dec 27 00:24:40 router2 attrd: [25784]: info: find_hash_entry: Creating hash entry for probe_complete
Dec 27 00:24:40 router2 crmd: [25786]: notice: run_graph: Transition 1 (Complete=5, Pending=0, Fired=0, Skipped=14, Incomplete=0, Source=/var/lib/pengine/pe-input-32.bz2): Stopped
Dec 27 00:24:40 router2 attrd: [25784]: info: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Dec 27 00:24:40 router2 crmd: [25786]: info: te_graph_trigger: Transition 1 is now complete
Dec 27 00:24:40 router2 attrd: [25784]: info: attrd_perform_update: Sent update 8: probe_complete=true
Dec 27 00:24:40 router2 crmd: [25786]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
Dec 27 00:24:40 router2 crmd: [25786]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Dec 27 00:24:40 router2 crmd: [25786]: info: do_pe_invoke: Query 33: Requesting the current CIB: S_POLICY_ENGINE
Dec 27 00:24:40 router2 crmd: [25786]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=1, tag=transient_attributes, id=router2, magic=NA, cib=0.136.6) : Transient attribute: update
Dec 27 00:24:40 router2 crmd: [25786]: info: do_pe_invoke_callback: Invoking the PE: ref=pe_calc-dc-1261891480-14, seq=2796, quorate=0
Dec 27 00:24:40 router2 pengine: [25785]: notice: unpack_config: On loss of CCM Quorum: Ignore
Dec 27 00:24:40 router2 crmd: [25786]: info: do_pe_invoke: Query 34: Requesting the current CIB: S_POLICY_ENGINE
Dec 27 00:24:40 router2 pengine: [25785]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Dec 27 00:24:40 router2 crmd: [25786]: info: do_pe_invoke_callback: Invoking the PE: ref=pe_calc-dc-1261891480-15, seq=2796, quorate=0
Dec 27 00:24:40 router2 pengine: [25785]: info: determine_online_status: Node router2 is online

Dec 27 00:24:40 router2 pengine: [25785]: info: unpack_rsc_op: resRoute_monitor_0 on router2 returned 5 (not installed) instead of the expected value: 7 (not running)
Dec 27 00:24:40 router2 pengine: [25785]: ERROR: unpack_rsc_op: Hard error - resRoute_monitor_0 failed with rc=5: Preventing resRoute from re-starting on router2

Dec 27 00:24:40 router2 pengine: [25785]: notice: group_print: Resource Group: router_cluster
Dec 27 00:24:40 router2 pengine: [25785]: notice: native_print: extIP (ocf::heartbeat:IPaddr2): Stopped
Dec 27 00:24:40 router2 pengine: [25785]: notice: native_print: intIP (ocf::heartbeat:IPaddr2): Stopped
Dec 27 00:24:40 router2 pengine: [25785]: notice: native_print: resRoute (ocf::heartbeat:Route): Stopped
Dec 27 00:24:40 router2 pengine: [25785]: notice: native_print: firewall (lsb:shorewall): Stopped
Dec 27 00:24:40 router2 pengine: [25785]: info: native_merge_weights: extIP: Rolling back scores from resRoute
Dec 27 00:24:40 router2 pengine: [25785]: info: native_merge_weights: intIP: Rolling back scores from resRoute
Dec 27 00:24:40 router2 pengine: [25785]: info: native_merge_weights: resRoute: Rolling back scores from firewall
Dec 27 00:24:40 router2 pengine: [25785]: WARN: native_color: Resource resRoute cannot run anywhere
Dec 27 00:24:40 router2 pengine: [25785]: WARN: native_color: Resource firewall cannot run anywhere
Dec 27 00:24:40 router2 pengine: [25785]: notice: LogActions: Start extIP (router2)
Dec 27 00:24:40 router2 pengine: [25785]: notice: LogActions: Start intIP (router2)
Dec 27 00:24:40 router2 pengine: [25785]: notice: LogActions: Leave resource resRoute (Stopped)
Dec 27 00:24:40 router2 pengine: [25785]: notice: LogActions: Leave resource firewall (Stopped)
Dec 27 00:24:40 router2 crmd: [25786]: info: handle_response: pe_calc calculation pe_calc-dc-1261891480-14 is obsolete
Dec 27 00:24:40 router2 pengine: [25785]: WARN: process_pe_message: Transition 2: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/pengine/pe-warn-27.bz2
Dec 27 00:24:40 router2 pengine: [25785]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
Dec 27 00:24:40 router2 pengine: [25785]: notice: unpack_config: On loss of CCM Quorum: Ignore
Dec 27 00:24:40 router2 pengine: [25785]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Dec 27 00:24:40 router2 pengine: [25785]: info: determine_online_status: Node router2 is online
Dec 27 00:24:40 router2 pengine: [25785]: info: unpack_rsc_op: resRoute_monitor_0 on router2 returned 5 (not installed) instead of the expected value: 7 (not running)
Dec 27 00:24:40 router2 pengine: [25785]: ERROR: unpack_rsc_op: Hard error - resRoute_monitor_0 failed with rc=5: Preventing resRoute from re-starting on router2
Dec 27 00:24:40 router2 pengine: [25785]: notice: group_print: Resource Group: router_cluster
Dec 27 00:24:40 router2 pengine: [25785]: notice: native_print: extIP (ocf::heartbeat:IPaddr2): Stopped
Dec 27 00:24:40 router2 pengine: [25785]: notice: native_print: intIP (ocf::heartbeat:IPaddr2): Stopped
Dec 27 00:24:40 router2 pengine: [25785]: notice: native_print: resRoute (ocf::heartbeat:Route): Stopped
Dec 27 00:24:40 router2 pengine: [25785]: notice: native_print: firewall (lsb:shorewall): Stopped
Dec 27 00:24:40 router2 pengine: [25785]: info: native_merge_weights: extIP: Rolling back scores from resRoute
Dec 27 00:24:40 router2 pengine: [25785]: info: native_merge_weights: intIP: Rolling back scores from resRoute
Dec 27 00:24:40 router2 pengine: [25785]: info: native_merge_weights: resRoute: Rolling back scores from firewall
Dec 27 00:24:40 router2 pengine: [25785]: WARN: native_color: Resource resRoute cannot run anywhere
Dec 27 00:24:40 router2 pengine: [25785]: WARN: native_color: Resource firewall cannot run anywhere
Dec 27 00:24:40 router2 pengine: [25785]: notice: LogActions: Start extIP (router2)
Dec 27 00:24:40 router2 pengine: [25785]: notice: LogActions: Start intIP (router2)
Dec 27 00:24:40 router2 pengine: [25785]: notice: LogActions: Leave resource resRoute (Stopped)
Dec 27 00:24:40 router2 pengine: [25785]: notice: LogActions: Leave resource firewall (Stopped)
Dec 27 00:24:40 router2 crmd: [25786]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Dec 27 00:24:40 router2 crmd: [25786]: info: unpack_graph: Unpacked transition 3: 3 actions in 3 synapses
Dec 27 00:24:40 router2 pengine: [25785]: WARN: process_pe_message: Transition 3: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/pengine/pe-warn-28.bz2
Dec 27 00:24:40 router2 crmd: [25786]: info: do_te_invoke: Processing graph 3 (ref=pe_calc-dc-1261891480-15) derived from /var/lib/pengine/pe-warn-28.bz2
Dec 27 00:24:40 router2 pengine: [25785]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
Dec 27 00:24:40 router2 crmd: [25786]: info: te_pseudo_action: Pseudo action 6 fired and confirmed
Dec 27 00:24:40 router2 crmd: [25786]: info: te_rsc_command: Initiating action 4: start extIP_start_0 on router2 (local)
Dec 27 00:24:40 router2 crmd: [25786]: info: do_lrm_rsc_op: Performing key=4:3:0:804c8e68-f81f-4d16-b9ab-6c6a4ea41b57 op=extIP_start_0 )
Dec 27 00:24:40 router2 lrmd: [25783]: info: rsc:extIP:6: start
Dec 27 00:24:40 router2 IPaddr2[26740]: INFO: ip -f inet addr add 24.227.124.158/30 brd 24.227.124.159 dev net0
Dec 27 00:24:40 router2 lrmd: [25783]: info: RA output: (extIP:start:stderr) 2009/12/27_00:24:40 INFO: ip -f inet addr add 24.227.124.158/30 brd 24.227.124.159 dev net0
Dec 27 00:24:40 router2 IPaddr2[26740]: INFO: ip link set net0 up
Dec 27 00:24:40 router2 lrmd: [25783]: info: RA output: (extIP:start:stderr) 2009/12/27_00:24:40 INFO: ip link set net0 up
Dec 27 00:24:40 router2 IPaddr2[26740]: INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-24.227.124.158 net0 24.227.124.158 auto not_used not_used
Dec 27 00:24:40 router2 kernel: net0: link down
Dec 27 00:24:40 router2 lrmd: [25783]: info: RA output: (extIP:start:stderr) 2009/12/27_00:24:40 INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-24.227.124.158 net0 24.227.124.158 auto not_used not_used

Dec 27 00:24:40 router2 kernel: ADDRCONF(NETDEV_UP): net0: link is not ready

Dec 27 00:24:41 router2 crmd: [25786]: info: process_lrm_event: LRM operation extIP_start_0 (call=6, rc=0, cib-update=35, confirmed=true) complete ok
Dec 27 00:24:41 router2 crmd: [25786]: info: match_graph_event: Action extIP_start_0 (4) confirmed on router2 (rc=0)
Dec 27 00:24:41 router2 crmd: [25786]: info: te_rsc_command: Initiating action 5: start intIP_start_0 on router2 (local)
Dec 27 00:24:41 router2 crmd: [25786]: info: do_lrm_rsc_op: Performing key=5:3:0:804c8e68-f81f-4d16-b9ab-6c6a4ea41b57 op=intIP_start_0 )
Dec 27 00:24:41 router2 lrmd: [25783]: info: rsc:intIP:7: start
Dec 27 00:24:41 router2 IPaddr2[26795]: INFO: ip -f inet addr add 192.168.0.1/16 brd 192.168.255.255 dev lan0
Dec 27 00:24:41 router2 lrmd: [25783]: info: RA output: (intIP:start:stderr) 2009/12/27_00:24:41 INFO: ip -f inet addr add 192.168.0.1/16 brd 192.168.255.255 dev lan0
Dec 27 00:24:41 router2 IPaddr2[26795]: INFO: ip link set lan0 up
Dec 27 00:24:41 router2 lrmd: [25783]: info: RA output: (intIP:start:stderr) 2009/12/27_00:24:41 INFO: ip link set lan0 up
Dec 27 00:24:41 router2 IPaddr2[26795]: INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.0.1 lan0 192.168.0.1 auto not_used not_used
Dec 27 00:24:41 router2 lrmd: [25783]: info: RA output: (intIP:start:stderr) 2009/12/27_00:24:41 INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.0.1 lan0 192.168.0.1 auto not_used not_used
Dec 27 00:24:41 router2 crmd: [25786]: info: process_lrm_event: LRM operation intIP_start_0 (call=7, rc=0, cib-update=36, confirmed=true) complete ok
Dec 27 00:24:41 router2 crmd: [25786]: info: match_graph_event: Action intIP_start_0 (5) confirmed on router2 (rc=0)
Dec 27 00:24:41 router2 crmd: [25786]: info: run_graph: ====================================================
Dec 27 00:24:41 router2 crmd: [25786]: notice: run_graph: Transition 3 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-warn-28.bz2): Complete
Dec 27 00:24:41 router2 crmd: [25786]: info: te_graph_trigger: Transition 3 is now complete
Dec 27 00:24:41 router2 crmd: [25786]: info: notify_crmd: Transition 3 status: done - <null>
Dec 27 00:24:41 router2 crmd: [25786]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Dec 27 00:24:41 router2 crmd: [25786]: info: do_state_transition: Starting PEngine Recheck Timer

I have added blank lines in the logs to separate out the specific event segments. One near the top in particular is what causes resRoute to fail completely:

Dec 27 00:24:40 router2 crmd: [25786]: info: process_lrm_event: LRM operation resRoute_monitor_0 (call=4, rc=5, cib-update=31, confirmed=true) complete not installed
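If I read this right, the probe (monitor_0) runs before extIP is started, so there's no address on net0 yet and the Route agent's gateway check fails with rc=5 (not installed) rather than 7 (not running). I think this reproduces it by hand, though I haven't re-run it on router2 with the IP down:

# with no address on net0, monitor should return 5 (gateway unreachable)
export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_destination="0.0.0.0/0"
export OCF_RESKEY_gateway="24.227.124.157"
/usr/lib/ocf/resource.d/heartbeat/Route monitor; echo $?

# after bringing the address up manually, the same monitor should return 7 (not running)
ip -f inet addr add 24.227.124.158/30 brd 24.227.124.159 dev net0
/usr/lib/ocf/resource.d/heartbeat/Route monitor; echo $?
ip -f inet addr del 24.227.124.158/30 dev net0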

This is on OpenSUSE 11.1 with the ha-cluster repository, using pacemaker 1.0.5, cluster-glue 1.0, heartbeat 3.0.0, openais 0.80.5, and ha-resources 1.0 (which I believe is the heartbeat 3.99.x stuff). So these are fairly current versions now.

I'd been building my setup with SUSE Studio, hand-picking the packages needed.

Any thoughts?

--
Eric Renfro


