Hi Dejan,

On Tue, Jan 24, 2012 at 11:52:20PM +0100, Dejan Muhamedagic wrote:
> Hi,
>
> On Tue, Jan 24, 2012 at 06:31:54PM +0100, Michal Vyoral wrote:
> > Hello,
> > we had a cluster of two nodes, both running Debian 5.0, each with two
> > resources, IPaddr2 and apache, managed by pacemaker 1.0.9.1. After
> > upgrading one node from Debian 5.0 to 6.0 we have a problem starting
> > the apache resource on the upgraded node. Here are the details:
> >
> > Versions of heartbeat and pacemaker before the upgrade:
> > pr-iso1:~# dpkg -l pacemaker heartbeat
> > Desired=Unknown/Install/Remove/Purge/Hold
> > | Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
> > |/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
> > ||/ Name           Version        Description
> > +++-==============-==============-============================================
> > ii  heartbeat      1:3.0.3-2~bpo5 Subsystem for High-Availability Linux
> > ii  pacemaker      1.0.9.1+hg1562 HA cluster resource manager
> >
> > Versions of heartbeat and pacemaker after the upgrade:
> > pr-iso2:~# dpkg -l pacemaker heartbeat
> > Desired=Unknown/Install/Remove/Purge/Hold
> > | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
> > |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> > ||/ Name           Version        Description
> > +++-==============-==============-============================================
> > ii  heartbeat      1:3.0.3-2      Subsystem for High-Availability Linux
> > ii  pacemaker      1.0.9.1+hg1562 HA cluster resource manager
> >
> > Status of the resources on the upgraded node:
> > pr-iso2:~# crm_mon
> > ============
> > Last updated: Tue Jan 24 10:14:12 2012
> > Stack: Heartbeat
> > Current DC: pr-iso2 (511079a9-0f71-4537-bdf9-07714b454441) - partition with quorum
> > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> > 2 Nodes configured, unknown expected votes
> > 2 Resources configured.
> > ============
> >
> > Online: [ pr-iso2 ]
> > OFFLINE: [ pr-iso1 ]
> >
> > ClusterIP (ocf::heartbeat:IPaddr2): Started pr-iso2
> >
> > Failed actions:
> >     RTWeb_start_0 (node=pr-iso2, call=7, rc=1, status=complete): unknown error
> >
> > Status of the resources on the non-upgraded node:
> > pr-iso1:~# crm_mon
> > ============
> > Last updated: Tue Jan 24 17:08:22 2012
> > Stack: Heartbeat
> > Current DC: pr-iso1 (014268aa-f234-4789-b4a1-0053cf4e61b9) - partition with quorum
> > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> > 2 Nodes configured, unknown expected votes
> > 2 Resources configured.
> > ============
> >
> > Online: [ pr-iso1 pr-iso2 ]
> >
> > ClusterIP (ocf::heartbeat:IPaddr2): Started pr-iso1
> > RTWeb (ocf::heartbeat:apache): Started pr-iso1
> >
> > Configuration of the resources:
> > pr-iso1:~# crm configure show
> > node $id="014268aa-f234-4789-b4a1-0053cf4e61b9" pr-iso1
> > node $id="511079a9-0f71-4537-bdf9-07714b454441" pr-iso2
> > primitive ClusterIP ocf:heartbeat:IPaddr2 \
> >     params ip="10.5.75.83" cidr_netmask="24" \
> >     op monitor interval="30s"
> > primitive RTWeb ocf:heartbeat:apache \
> >     params configfile="/etc/apache2/apache2.conf" \
> >     op monitor interval="1min" \
> >     meta target-role="Started" is-managed="true"
> > colocation website-with-ip inf: RTWeb ClusterIP
> > order rtweb_after_clustrip inf: ClusterIP RTWeb
> > property $id="cib-bootstrap-options" \
> >     dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
> >     cluster-infrastructure="Heartbeat" \
> >     stonith-enabled="false" \
> >     last-lrm-refresh="1327399494"
> > rsc_defaults $id="rsc-options" \
> >     resource-stickiness="100"
> >
> > Records in /var/log/ha-log related to the RTWeb resource:
> > pr-iso2:~# grep RTWeb /var/log/ha-log
> > Jan 24 10:04:56 pr-iso2 crmd: [6130]: info: do_lrm_rsc_op: Performing key=7:76:7:41cbad9d-9090-4aba-bd6a-bf171077c74b op=RTWeb_monitor_0 )
> > Jan 24 10:04:56 pr-iso2 lrmd: [6127]: info: rsc:RTWeb:4: probe
> > Jan 24 10:04:56 pr-iso2 crmd: [6130]: info: process_lrm_event: LRM operation RTWeb_monitor_0 (call=4, rc=7, cib-update=13, confirmed=true) not running
> > Jan 24 10:12:48 pr-iso2 crmd: [6130]: info: do_lrm_rsc_op: Performing key=11:77:0:41cbad9d-9090-4aba-bd6a-bf171077c74b op=RTWeb_start_0 )
> > Jan 24 10:12:48 pr-iso2 lrmd: [6127]: info: rsc:RTWeb:7: start
>
> After this message there should be a bit more (look for "apache"
> or "lrmd"). Newer resource agents are also going to log the resource
> name (RTWeb in this case). If you cannot find anything here,
> then the answer must be in the apache logs.
>
> Thanks,
>
> Dejan
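A related way to see the agent's errors directly, alongside grepping the logs, is to run the OCF apache agent by hand with the same parameter the cluster passes to it. This is only a rough sketch: the OCF_ROOT and agent paths assume a stock Debian squeeze layout, and it should be run as root on the node where the start fails.

```shell
# Sketch only; paths are assumptions for a stock Debian install.
export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_configfile=/etc/apache2/apache2.conf
agent=/usr/lib/ocf/resource.d/heartbeat/apache

# Run the same operations the lrmd would, and show their exit codes
# (0 = success, 7 = not running, anything else = an error).
"$agent" validate-all;     echo "validate-all rc=$?"
# The agent's start action can poll "waiting for apache to come up"
# until the cluster kills it, so cap it instead of letting it hang.
timeout 30 "$agent" start; echo "start rc=$?"
"$agent" status;           echo "status rc=$?"
```

Any stderr the agent prints here is the same output that otherwise only shows up as "RA output" lines in ha-log.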
Yes, you are right: there are two more lines after that one:

apache[9454]: 2012/01/24_10:12:49 INFO: apache not running
apache[9454]: 2012/01/24_10:12:49 INFO: waiting for apache /etc/apache2/apache2.conf to come up

There are no records in /var/log/apache2/error.log giving any clue, see:

pr-iso2:/var/log/apache2# cat error.log
[Tue Jan 24 11:12:50 2012] [notice] Apache/2.2.16 (Debian) PHP/5.3.3-7+squeeze3 with Suhosin-Patch mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations
[Tue Jan 24 11:13:08 2012] [notice] caught SIGTERM, shutting down
[Wed Jan 25 13:09:02 2012] [notice] Apache/2.2.16 (Debian) PHP/5.3.3-7+squeeze3 with Suhosin-Patch mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations
[Wed Jan 25 13:09:21 2012] [notice] caught SIGTERM, shutting down

One interesting thing: our nodes should use UTC, but after the upgrade we noticed that the time on the upgraded node is our local time (= UTC + 1). I have set the system time back to UTC, but Apache still uses the local time in its log.

We have tried to start Apache on the upgraded node by itself:
1. we modified /etc/apache2/ports.conf so that Apache listens on the physical address
2. we ran '/etc/init.d/apache2 start'
3. we downloaded an index.html page

Here is the record in the error log:

[Wed Jan 25 13:28:11 2012] [notice] Apache/2.2.16 (Debian) PHP/5.3.3-7+squeeze3 with Suhosin-Patch mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations
[Wed Jan 25 13:28:27 2012] [warn] [client 10.5.77.29] incomplete redirection target of '/rt/' for URI '/' modified to 'http://10.5.75.82/rt/'

So Apache can run on its own. Before the upgrade we made some minor changes to apache2.conf on the active node, but not on the passive node.
We have reverted the changes, but the resource still fails. Here is the tail of the ha-log on the upgraded node:

Jan 25 14:04:11 pr-iso2 heartbeat: [16026]: WARN: node pr-iso1: is dead
Jan 25 14:04:11 pr-iso2 heartbeat: [16026]: info: Link pr-iso1:eth0 dead.
Jan 25 14:04:11 pr-iso2 crmd: [16043]: notice: crmd_ha_status_callback: Status update: Node pr-iso1 now has status [dead] (DC=false)
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: crm_update_peer_proc: pr-iso1.ais is now offline
Jan 25 14:04:11 pr-iso2 ccm: [16038]: info: Break tie for 2 nodes cluster
Jan 25 14:04:11 pr-iso2 cib: [16039]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Jan 25 14:04:11 pr-iso2 cib: [16039]: info: mem_handle_event: no mbr_track info
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Jan 25 14:04:11 pr-iso2 cib: [16039]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: mem_handle_event: no mbr_track info
Jan 25 14:04:11 pr-iso2 cib: [16039]: info: mem_handle_event: instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Jan 25 14:04:11 pr-iso2 cib: [16039]: info: cib_ccm_msg_callback: Processing CCM event=NEW MEMBERSHIP (id=3)
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: mem_handle_event: instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
Jan 25 14:04:11 pr-iso2 cib: [16039]: info: crm_update_peer: Node pr-iso1: id=0 state=lost (new) addr=(null) votes=-1 born=1 seen=2 proc=00000000000000000000000000000202
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=3)
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: ccm_event_detail: NEW MEMBERSHIP: trans=3, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: ccm_event_detail: CURRENT: pr-iso2 [nodeid=1, born=3]
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: ccm_event_detail: LOST: pr-iso1 [nodeid=0, born=1]
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: crm_update_peer: Node pr-iso1: id=0 state=lost (new) addr=(null) votes=-1 born=1 seen=2 proc=00000000000000000000000000000000
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: do_te_control: Registering TE UUID: 6fe179f3-2f76-4b99-b53a-b449b0bf3008
Jan 25 14:04:11 pr-iso2 crmd: [16043]: WARN: cib_client_add_notify_callback: Callback already present
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: set_graph_functions: Setting custom graph functions
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: start_subsystem: Starting sub-system "pengine"
Jan 25 14:04:11 pr-iso2 pengine: [16392]: info: Invoked: /usr/lib/heartbeat/pengine
Jan 25 14:04:11 pr-iso2 pengine: [16392]: info: main: Starting pengine
Jan 25 14:04:14 pr-iso2 crmd: [16043]: info: do_dc_takeover: Taking over DC status for this partition
Jan 25 14:04:14 pr-iso2 cib: [16039]: info: cib_process_readwrite: We are now in R/W mode
Jan 25 14:04:14 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/17, version=0.63.9): ok (rc=0)
Jan 25 14:04:14 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/18, version=0.63.9): ok (rc=0)
Jan 25 14:04:14 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/20, version=0.63.9): ok (rc=0)
Jan 25 14:04:14 pr-iso2 crmd: [16043]: info: join_make_offer: Making join offers based on membership 3
Jan 25 14:04:14 pr-iso2 crmd: [16043]: info: do_dc_join_offer_all: join-1: Waiting on 1 outstanding join acks
Jan 25 14:04:14 pr-iso2 crmd: [16043]: info: te_connect_stonith: Attempting connection to fencing daemon...
Jan 25 14:04:14 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/22, version=0.63.9): ok (rc=0)
Jan 25 14:04:15 pr-iso2 crmd: [16043]: info: te_connect_stonith: Connected
Jan 25 14:04:15 pr-iso2 crmd: [16043]: info: config_query_callback: Checking for expired actions every 900000ms
Jan 25 14:04:15 pr-iso2 crmd: [16043]: info: update_dc: Set DC to pr-iso2 (3.0.1)
Jan 25 14:04:15 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Jan 25 14:04:15 pr-iso2 crmd: [16043]: info: do_state_transition: All 1 cluster nodes responded to the join offer.
Jan 25 14:04:15 pr-iso2 crmd: [16043]: info: do_dc_join_finalize: join-1: Syncing the CIB from pr-iso2 to the rest of the cluster
Jan 25 14:04:15 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/25, version=0.63.9): ok (rc=0)
Jan 25 14:04:15 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/26, version=0.63.9): ok (rc=0)
Jan 25 14:04:17 pr-iso2 crmd: [16043]: info: do_dc_join_ack: join-1: Updating node state to member for pr-iso2
Jan 25 14:04:17 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='pr-iso2']/lrm (origin=local/crmd/27, version=0.63.10): ok (rc=0)
Jan 25 14:04:17 pr-iso2 crmd: [16043]: info: erase_xpath_callback: Deletion of "//node_state[@uname='pr-iso2']/lrm": ok (rc=0)
Jan 25 14:04:17 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
Jan 25 14:04:17 pr-iso2 crmd: [16043]: info: populate_cib_nodes_ha: Requesting the list of configured nodes
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_dc_join_final: Ensuring DC, quorum and node attributes are up-to-date
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: crm_update_quorum: Updating quorum status to true (call=31)
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_local_callback: Sending full refresh (origin=crmd)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: abort_transition_graph: do_te_invoke:191 - Triggered transition abort (complete=1) : Peer Cancelled
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (<null>)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke: Query 32: Requesting the current CIB: S_POLICY_ENGINE
Jan 25 14:04:18 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/29, version=0.63.11): ok (rc=0)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: WARN: match_down_event: No match for shutdown action on 014268aa-f234-4789-b4a1-0053cf4e61b9
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: te_update_diff: Stonith/shutdown of 014268aa-f234-4789-b4a1-0053cf4e61b9 not matched
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: abort_transition_graph: te_update_diff:191 - Triggered transition abort (complete=1, tag=node_state, id=014268aa-f234-4789-b4a1-0053cf4e61b9, magic=NA, cib=0.63.12) : Node failure
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke: Query 33: Requesting the current CIB: S_POLICY_ENGINE
Jan 25 14:04:18 pr-iso2 cib: [16039]: info: log_data_element: cib:diff: - <cib dc-uuid="014268aa-f234-4789-b4a1-0053cf4e61b9" admin_epoch="0" epoch="63" num_updates="12" />
Jan 25 14:04:18 pr-iso2 cib: [16039]: info: log_data_element: cib:diff: + <cib dc-uuid="511079a9-0f71-4537-bdf9-07714b454441" admin_epoch="0" epoch="64" num_updates="1" />
Jan 25 14:04:18 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/31, version=0.64.1): ok (rc=0)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=1) : Non-status change
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: need_abort: Aborting on change to admin_epoch
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke: Query 34: Requesting the current CIB: S_POLICY_ENGINE
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-RTWeb (INFINITY)
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_perform_update: Sent update 22: fail-count-RTWeb=INFINITY
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-RTWeb (1327500241)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke_callback: Invoking the PE: query=34, ref=pe_calc-dc-1327500258-12, seq=3, quorate=1
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_perform_update: Sent update 25: last-failure-RTWeb=1327500241
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: determine_online_status: Node pr-iso2 is online
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_trigger_update: Sending flush op to all hosts for: terminate (<null>)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: WARN: unpack_rsc_op: Processing failed op RTWeb_start_0 on pr-iso2: unknown error (1)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: native_print: ClusterIP (ocf::heartbeat:IPaddr2): Started pr-iso2
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=1, tag=transient_attributes, id=511079a9-0f71-4537-bdf9-07714b454441, magic=NA, cib=0.64.2) : Transient attribute: update
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: native_print: RTWeb (ocf::heartbeat:apache): Started pr-iso2 FAILED
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=1, tag=transient_attributes, id=511079a9-0f71-4537-bdf9-07714b454441, magic=NA, cib=0.64.3) : Transient attribute: update
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: RecurringOp: Start recurring monitor (60s) for RTWeb on pr-iso2
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke: Query 35: Requesting the current CIB: S_POLICY_ENGINE
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: LogActions: Leave resource ClusterIP (Started pr-iso2)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: LogActions: Recover resource RTWeb (Started pr-iso2)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke: Query 36: Requesting the current CIB: S_POLICY_ENGINE
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: handle_response: pe_calc calculation pe_calc-dc-1327500258-12 is obsolete
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke_callback: Invoking the PE: query=36, ref=pe_calc-dc-1327500258-13, seq=3, quorate=1
Jan 25 14:04:18 pr-iso2 cib: [16405]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-90.raw
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: process_pe_message: Transition 0: PEngine Input stored in: /var/lib/pengine/pe-input-6324.bz2
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: determine_online_status: Node pr-iso2 is online
Jan 25 14:04:18 pr-iso2 pengine: [16392]: WARN: unpack_rsc_op: Processing failed op RTWeb_start_0 on pr-iso2: unknown error (1)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: native_print: ClusterIP (ocf::heartbeat:IPaddr2): Started pr-iso2
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: native_print: RTWeb (ocf::heartbeat:apache): Started pr-iso2 FAILED
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: get_failcount: RTWeb has failed INFINITY times on pr-iso2
Jan 25 14:04:18 pr-iso2 pengine: [16392]: WARN: common_apply_stickiness: Forcing RTWeb away from pr-iso2 after 1000000 failures (max=1000000)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: native_merge_weights: ClusterIP: Rolling back scores from RTWeb
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: native_color: Resource RTWeb cannot run anywhere
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: LogActions: Leave resource ClusterIP (Started pr-iso2)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: LogActions: Stop resource RTWeb (pr-iso2)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: unpack_graph: Unpacked transition 1: 2 actions in 2 synapses
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_te_invoke: Processing graph 1 (ref=pe_calc-dc-1327500258-13) derived from /var/lib/pengine/pe-input-6325.bz2
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: te_rsc_command: Initiating action 1: stop RTWeb_stop_0 on pr-iso2 (local)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_lrm_rsc_op: Performing key=1:1:0:6fe179f3-2f76-4b99-b53a-b449b0bf3008 op=RTWeb_stop_0 )
Jan 25 14:04:18 pr-iso2 lrmd: [16040]: info: rsc:RTWeb:8: stop
Jan 25 14:04:18 pr-iso2 cib: [16405]: info: write_cib_contents: Wrote version 0.64.0 of the CIB to disk (digest: b1b8107c826e0e9f9d7e0dc5d7f17d77)
Jan 25 14:04:18 pr-iso2 cib: [16405]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.0nYAMe (digest: /var/lib/heartbeat/crm/cib.2WriED)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: process_pe_message: Transition 1: PEngine Input stored in: /var/lib/pengine/pe-input-6325.bz2
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_ha_callback: flush message from pr-iso2
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_ha_callback: flush message from pr-iso2
Jan 25 14:04:19 pr-iso2 lrmd: [16040]: info: RA output: (RTWeb:stop:stderr) kill: 830: No such process
apache[16406]: 2012/01/25_14:04:19 INFO: Killing apache PID 16356
apache[16406]: 2012/01/25_14:04:19 INFO: apache stopped.
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: process_lrm_event: LRM operation RTWeb_stop_0 (call=8, rc=0, cib-update=37, confirmed=true) ok
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: match_graph_event: Action RTWeb_stop_0 (1) confirmed on pr-iso2 (rc=0)
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: te_pseudo_action: Pseudo action 3 fired and confirmed
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: run_graph: ====================================================
Jan 25 14:04:19 pr-iso2 crmd: [16043]: notice: run_graph: Transition 1 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-6325.bz2): Complete
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: te_graph_trigger: Transition 1 is now complete
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: notify_crmd: Transition 1 status: done - <null>
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: do_state_transition: Starting PEngine Recheck Timer
Jan 25 14:04:59 pr-iso2 heartbeat: [16026]: info: killing /usr/lib/heartbeat/crmd process group 16043 with signal 15
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: crm_shutdown: Requesting shutdown
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown_req: Sending shutdown request to DC: pr-iso2
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: handle_shutdown_request: Creating shutdown request for pr-iso2 (state=S_POLICY_ENGINE)
Jan 25 14:04:59 pr-iso2 attrd: [16042]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1327500299)
Jan 25 14:04:59 pr-iso2 attrd: [16042]: info: attrd_perform_update: Sent update 33: shutdown=1327500299
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=1, tag=transient_attributes, id=511079a9-0f71-4537-bdf9-07714b454441, magic=NA, cib=0.64.6) : Transient attribute: update
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_pe_invoke: Query 39: Requesting the current CIB: S_POLICY_ENGINE
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_pe_invoke_callback: Invoking the PE: query=39, ref=pe_calc-dc-1327500299-16, seq=3, quorate=1
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: determine_online_status: Node pr-iso2 is shutting down
Jan 25 14:04:59 pr-iso2 pengine: [16392]: WARN: unpack_rsc_op: Processing failed op RTWeb_start_0 on pr-iso2: unknown error (1)
Jan 25 14:04:59 pr-iso2 pengine: [16392]: notice: native_print: ClusterIP (ocf::heartbeat:IPaddr2): Started pr-iso2
Jan 25 14:04:59 pr-iso2 pengine: [16392]: notice: native_print: RTWeb (ocf::heartbeat:apache): Stopped
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: get_failcount: RTWeb has failed INFINITY times on pr-iso2
Jan 25 14:04:59 pr-iso2 pengine: [16392]: WARN: common_apply_stickiness: Forcing RTWeb away from pr-iso2 after 1000000 failures (max=1000000)
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: native_merge_weights: ClusterIP: Rolling back scores from RTWeb
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: native_color: Resource ClusterIP cannot run anywhere
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: native_color: Resource RTWeb cannot run anywhere
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: stage6: Scheduling Node pr-iso2 for shutdown
Jan 25 14:04:59 pr-iso2 pengine: [16392]: notice: LogActions: Stop resource ClusterIP (pr-iso2)
Jan 25 14:04:59 pr-iso2 pengine: [16392]: notice: LogActions: Leave resource RTWeb (Stopped)
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: unpack_graph: Unpacked transition 2: 3 actions in 3 synapses
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_te_invoke: Processing graph 2 (ref=pe_calc-dc-1327500299-16) derived from /var/lib/pengine/pe-input-6326.bz2
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: te_rsc_command: Initiating action 5: stop ClusterIP_stop_0 on pr-iso2 (local)
Jan 25 14:04:59 pr-iso2 lrmd: [16040]: info: cancel_op: operation monitor[6] on ocf::IPaddr2::ClusterIP for client 16043, its parameters: CRM_meta_name=[monitor] cidr_netmask=[24] crm_feature_set=[3.0.1] CRM_meta_timeout=[20000] CRM_meta_interval=[30000] ip=[10.5.75.83] cancelled
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_lrm_rsc_op: Performing key=5:2:0:6fe179f3-2f76-4b99-b53a-b449b0bf3008 op=ClusterIP_stop_0 )
Jan 25 14:04:59 pr-iso2 lrmd: [16040]: info: rsc:ClusterIP:9: stop
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: process_lrm_event: LRM operation ClusterIP_monitor_30000 (call=6, status=1, cib-update=0, confirmed=true) Cancelled
IPaddr2[16548]: 2012/01/25_14:04:59 INFO: IP status = ok, IP_CIP=
IPaddr2[16548]: 2012/01/25_14:04:59 INFO: ip -f inet addr delete 10.5.75.83/24 dev eth0
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: process_lrm_event: LRM operation ClusterIP_stop_0 (call=9, rc=0, cib-update=40, confirmed=true) ok
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: match_graph_event: Action ClusterIP_stop_0 (5) confirmed on pr-iso2 (rc=0)
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: te_pseudo_action: Pseudo action 2 fired and confirmed
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: te_crm_command: Executing crm-event (8): do_shutdown on pr-iso2
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: te_crm_command: crm-event (8) is a local shutdown
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: run_graph: ====================================================
Jan 25 14:04:59 pr-iso2 crmd: [16043]: notice: run_graph: Transition 2 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-6326.bz2): Complete
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: te_graph_trigger: Transition 2 is now complete
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_dc_release: DC role released
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: stop_subsystem: Sent -TERM to pengine: [16392]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_te_control: Transitioner is now inactive
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_te_control: Disconnecting STONITH...
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
Jan 25 14:04:59 pr-iso2 crmd: [16043]: notice: Not currently connected.
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: Terminating the pengine
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: stop_subsystem: Sent -TERM to pengine: [16392]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: Waiting for subsystems to exit
Jan 25 14:04:59 pr-iso2 crmd: [16043]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: All subsystems stopped, continuing
Jan 25 14:04:59 pr-iso2 crmd: [16043]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: Terminating the pengine
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: stop_subsystem: Sent -TERM to pengine: [16392]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: Waiting for subsystems to exit
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: All subsystems stopped, continuing
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: process_pe_message: Transition 2: PEngine Input stored in: /var/lib/pengine/pe-input-6326.bz2
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: crmdManagedChildDied: Process pengine:[16392] exited (signal=0, exitcode=0)
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: pe_msg_dispatch: Received HUP from pengine:[16392]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: pe_connection_destroy: Connection to the Policy Engine released
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: All subsystems stopped, continuing
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_lrm_control: Disconnected from the LRM
Jan 25 14:04:59 pr-iso2 ccm: [16038]: info: client (pid=16043) removed from ccm
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_ha_control: Disconnected from Heartbeat
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_cib_control: Disconnecting CIB
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: cib_process_readwrite: We are now in R/O mode
Jan 25 14:04:59 pr-iso2 cib: [16039]: WARN: send_ipc_message: IPC Channel to 16043 is not connected
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_exit: [crmd] stopped (0)
Jan 25 14:04:59 pr-iso2 cib: [16039]: WARN: send_via_callback_channel: Delivery of reply to client 16043/74e07029-b533-4c17-bac2-a0a52149bc7a failed
Jan 25 14:04:59 pr-iso2 cib: [16039]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
Jan 25 14:04:59 pr-iso2 heartbeat: [16026]: info: killing /usr/lib/heartbeat/attrd process group 16042 with signal 15
Jan 25 14:04:59 pr-iso2 attrd: [16042]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jan 25 14:04:59 pr-iso2 attrd: [16042]: info: attrd_shutdown: Exiting
Jan 25 14:04:59 pr-iso2 attrd: [16042]: info: main: Exiting...
Jan 25 14:04:59 pr-iso2 attrd: [16042]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
Jan 25 14:04:59 pr-iso2 heartbeat: [16026]: info: killing /usr/lib/heartbeat/stonithd process group 16041 with signal 15
Jan 25 14:04:59 pr-iso2 stonithd: [16041]: notice: /usr/lib/heartbeat/stonithd normally quit.
Jan 25 14:04:59 pr-iso2 heartbeat: [16026]: info: killing /usr/lib/heartbeat/lrmd -r process group 16040 with signal 15
Jan 25 14:04:59 pr-iso2 lrmd: [16040]: info: lrmd is shutting down
Jan 25 14:04:59 pr-iso2 heartbeat: [16026]: info: killing /usr/lib/heartbeat/cib process group 16039 with signal 15
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: cib_shutdown: Disconnected 0 clients
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: cib_process_disconnect: All clients disconnected...
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: terminate_cib: initiate_exit: Disconnecting heartbeat
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: terminate_cib: Exiting...
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: main: Done
Jan 25 14:04:59 pr-iso2 ccm: [16038]: info: client (pid=16039) removed from ccm
Jan 25 14:04:59 pr-iso2 heartbeat: [16026]: info: killing /usr/lib/heartbeat/ccm process group 16038 with signal 15
Jan 25 14:04:59 pr-iso2 ccm: [16038]: info: received SIGTERM, going to shut down
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: killing HBFIFO process 16031 with signal 15
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: killing HBWRITE process 16032 with signal 15
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: killing HBREAD process 16033 with signal 15
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: killing HBWRITE process 16034 with signal 15
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: killing HBREAD process 16035 with signal 15
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: Core process 16033 exited. 5 remaining
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: Core process 16034 exited. 4 remaining
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: Core process 16031 exited. 3 remaining
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: Core process 16035 exited. 2 remaining
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: Core process 16032 exited. 1 remaining
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: pr-iso2 Heartbeat shutdown complete.

Maybe the solution is to upgrade the active node as well and reconfigure Pacemaker?

Thank you.

Best regards,
Michal Vyoral

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org