Hi Dejan,

On Tue, Jan 24, 2012 at 11:52:20PM +0100, Dejan Muhamedagic wrote:
> Hi,
>
> On Tue, Jan 24, 2012 at 06:31:54PM +0100, Michal Vyoral wrote:
> > Hello,
> > we had a cluster of two nodes, both running Debian 5.0, each with two
> > resources, IPaddr2 and apache, managed by pacemaker 1.0.9.1. After
> > upgrading one node from Debian 5.0 to 6.0 we have a problem starting
> > the apache resource on the upgraded node. Here are the details:
> >
> > Versions of heartbeat and pacemaker before the upgrade:
> > pr-iso1:~# dpkg -l pacemaker heartbeat
> > Desired=Unknown/Install/Remove/Purge/Hold
> > | Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
> > |/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
> > ||/ Name           Version        Description
> > +++-==============-==============-============================================
> > ii  heartbeat      1:3.0.3-2~bpo5 Subsystem for High-Availability Linux
> > ii  pacemaker      1.0.9.1+hg1562 HA cluster resource manager
> >
> > Versions of heartbeat and pacemaker after the upgrade:
> > pr-iso2:~# dpkg -l pacemaker heartbeat
> > Desired=Unknown/Install/Remove/Purge/Hold
> > | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
> > |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> > ||/ Name           Version        Description
> > +++-==============-==============-============================================
> > ii  heartbeat      1:3.0.3-2      Subsystem for High-Availability Linux
> > ii  pacemaker      1.0.9.1+hg1562 HA cluster resource manager
> >
> > Status of the resources on the upgraded node:
> > pr-iso2:~# crm_mon
> > ============
> > Last updated: Tue Jan 24 10:14:12 2012
> > Stack: Heartbeat
> > Current DC: pr-iso2 (511079a9-0f71-4537-bdf9-07714b454441) - partition with quorum
> > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> > 2 Nodes configured, unknown expected votes
> > 2 Resources configured.
> > ============
> >
> > Online: [ pr-iso2 ]
> > OFFLINE: [ pr-iso1 ]
> >
> > ClusterIP (ocf::heartbeat:IPaddr2): Started pr-iso2
> >
> > Failed actions:
> >     RTWeb_start_0 (node=pr-iso2, call=7, rc=1, status=complete): unknown error
> >
> > Status of the resources on the non-upgraded node:
> > pr-iso1:~# crm_mon
> > ============
> > Last updated: Tue Jan 24 17:08:22 2012
> > Stack: Heartbeat
> > Current DC: pr-iso1 (014268aa-f234-4789-b4a1-0053cf4e61b9) - partition with quorum
> > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> > 2 Nodes configured, unknown expected votes
> > 2 Resources configured.
> > ============
> >
> > Online: [ pr-iso1 pr-iso2 ]
> >
> > ClusterIP (ocf::heartbeat:IPaddr2): Started pr-iso1
> > RTWeb (ocf::heartbeat:apache): Started pr-iso1
> >
> > Configuration of the resources:
> > pr-iso1:~# crm configure show
> > node $id="014268aa-f234-4789-b4a1-0053cf4e61b9" pr-iso1
> > node $id="511079a9-0f71-4537-bdf9-07714b454441" pr-iso2
> > primitive ClusterIP ocf:heartbeat:IPaddr2 \
> >     params ip="10.5.75.83" cidr_netmask="24" \
> >     op monitor interval="30s"
> > primitive RTWeb ocf:heartbeat:apache \
> >     params configfile="/etc/apache2/apache2.conf" \
> >     op monitor interval="1min" \
> >     meta target-role="Started" is-managed="true"
> > colocation website-with-ip inf: RTWeb ClusterIP
> > order rtweb_after_clustrip inf: ClusterIP RTWeb
> > property $id="cib-bootstrap-options" \
> >     dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
> >     cluster-infrastructure="Heartbeat" \
> >     stonith-enabled="false" \
> >     last-lrm-refresh="1327399494"
> > rsc_defaults $id="rsc-options" \
> >     resource-stickiness="100"
> >
> > Records in /var/log/ha-log related to the RTWeb resource:
> > pr-iso2:~# grep RTWeb /var/log/ha-log
> > Jan 24 10:04:56 pr-iso2 crmd: [6130]: info: do_lrm_rsc_op: Performing key=7:76:7:41cbad9d-9090-4aba-bd6a-bf171077c74b op=RTWeb_monitor_0 )
> > Jan 24 10:04:56 pr-iso2 lrmd: [6127]: info: rsc:RTWeb:4: probe
> > Jan 24 10:04:56 pr-iso2 crmd: [6130]: info: process_lrm_event: LRM operation RTWeb_monitor_0 (call=4, rc=7, cib-update=13, confirmed=true) not running
> > Jan 24 10:12:48 pr-iso2 crmd: [6130]: info: do_lrm_rsc_op: Performing key=11:77:0:41cbad9d-9090-4aba-bd6a-bf171077c74b op=RTWeb_start_0 )
> > Jan 24 10:12:48 pr-iso2 lrmd: [6127]: info: rsc:RTWeb:7: start
>
> After this message there should be a bit more (look for "apache"
> or "lrmd"). Newer resource agents are also going to log the resource
> name (RTWeb in this case). If you cannot find anything here,
> then the answer must be in the apache logs.
>
> Thanks,
>
> Dejan
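A related way to see the agent's errors directly, alongside grepping the logs, is to run the OCF apache agent by hand with the same parameter the cluster passes to it. This is only a rough sketch: the OCF_ROOT and agent paths assume a stock Debian squeeze layout, and it should be run as root on the node where the start fails.

```shell
# Sketch only; paths are assumptions for a stock Debian install.
export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_configfile=/etc/apache2/apache2.conf
agent=/usr/lib/ocf/resource.d/heartbeat/apache

# Run the same operations the lrmd would, and show their exit codes
# (0 = success, 7 = not running, anything else = an error).
"$agent" validate-all;     echo "validate-all rc=$?"
# The agent's start action can poll "waiting for apache to come up"
# until the cluster kills it, so cap it instead of letting it hang.
timeout 30 "$agent" start; echo "start rc=$?"
"$agent" status;           echo "status rc=$?"
```

Any stderr the agent prints here is the same output that otherwise only shows up as "RA output" lines in ha-log.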
Yes, you are right: there are two more lines after that one:

apache[9454]: 2012/01/24_10:12:49 INFO: apache not running
apache[9454]: 2012/01/24_10:12:49 INFO: waiting for apache /etc/apache2/apache2.conf to come up

There are no records in /var/log/apache2/error.log giving any clue, see:

pr-iso2:/var/log/apache2# cat error.log
[Tue Jan 24 11:12:50 2012] [notice] Apache/2.2.16 (Debian) PHP/5.3.3-7+squeeze3 with Suhosin-Patch mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations
[Tue Jan 24 11:13:08 2012] [notice] caught SIGTERM, shutting down
[Wed Jan 25 13:09:02 2012] [notice] Apache/2.2.16 (Debian) PHP/5.3.3-7+squeeze3 with Suhosin-Patch mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations
[Wed Jan 25 13:09:21 2012] [notice] caught SIGTERM, shutting down

One interesting thing: our nodes should use UTC, but after the upgrade we noticed that the time on the upgraded node is our local time (= UTC + 1). I have set the system time back to UTC, but Apache still uses the local time in its log.

We have tried to start Apache on the upgraded node by itself:
1. we modified /etc/apache2/ports.conf so that Apache listens on the physical address
2. we ran '/etc/init.d/apache2 start'
3. we downloaded an index.html page

Here is the record in the error log:

[Wed Jan 25 13:28:11 2012] [notice] Apache/2.2.16 (Debian) PHP/5.3.3-7+squeeze3 with Suhosin-Patch mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations
[Wed Jan 25 13:28:27 2012] [warn] [client 10.5.77.29] incomplete redirection target of '/rt/' for URI '/' modified to 'http://10.5.75.82/rt/'

So Apache can run on its own. Before the upgrade we made some minor changes to apache2.conf on the active node, but not on the passive node.
We have reverted the changes, but the resource still fails. Here is the tail of the ha-log on the upgraded node:

Jan 25 14:04:11 pr-iso2 heartbeat: [16026]: WARN: node pr-iso1: is dead
Jan 25 14:04:11 pr-iso2 heartbeat: [16026]: info: Link pr-iso1:eth0 dead.
Jan 25 14:04:11 pr-iso2 crmd: [16043]: notice: crmd_ha_status_callback: Status update: Node pr-iso1 now has status [dead] (DC=false)
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: crm_update_peer_proc: pr-iso1.ais is now offline
Jan 25 14:04:11 pr-iso2 ccm: [16038]: info: Break tie for 2 nodes cluster
Jan 25 14:04:11 pr-iso2 cib: [16039]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Jan 25 14:04:11 pr-iso2 cib: [16039]: info: mem_handle_event: no mbr_track info
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Jan 25 14:04:11 pr-iso2 cib: [16039]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: mem_handle_event: no mbr_track info
Jan 25 14:04:11 pr-iso2 cib: [16039]: info: mem_handle_event: instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Jan 25 14:04:11 pr-iso2 cib: [16039]: info: cib_ccm_msg_callback: Processing CCM event=NEW MEMBERSHIP (id=3)
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: mem_handle_event: instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
Jan 25 14:04:11 pr-iso2 cib: [16039]: info: crm_update_peer: Node pr-iso1: id=0 state=lost (new) addr=(null) votes=-1 born=1 seen=2 proc=00000000000000000000000000000202
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=3)
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: ccm_event_detail: NEW MEMBERSHIP: trans=3, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: ccm_event_detail: CURRENT: pr-iso2 [nodeid=1, born=3]
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: ccm_event_detail: LOST: pr-iso1 [nodeid=0, born=1]
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: crm_update_peer: Node pr-iso1: id=0 state=lost (new) addr=(null) votes=-1 born=1 seen=2 proc=00000000000000000000000000000000
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: do_te_control: Registering TE UUID: 6fe179f3-2f76-4b99-b53a-b449b0bf3008
Jan 25 14:04:11 pr-iso2 crmd: [16043]: WARN: cib_client_add_notify_callback: Callback already present
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: set_graph_functions: Setting custom graph functions
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: start_subsystem: Starting sub-system "pengine"
Jan 25 14:04:11 pr-iso2 pengine: [16392]: info: Invoked: /usr/lib/heartbeat/pengine
Jan 25 14:04:11 pr-iso2 pengine: [16392]: info: main: Starting pengine
Jan 25 14:04:14 pr-iso2 crmd: [16043]: info: do_dc_takeover: Taking over DC status for this partition
Jan 25 14:04:14 pr-iso2 cib: [16039]: info: cib_process_readwrite: We are now in R/W mode
Jan 25 14:04:14 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/17, version=0.63.9): ok (rc=0)
Jan 25 14:04:14 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/18, version=0.63.9): ok (rc=0)
Jan 25 14:04:14 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/20, version=0.63.9): ok (rc=0)
Jan 25 14:04:14 pr-iso2 crmd: [16043]: info: join_make_offer: Making join offers based on membership 3
Jan 25 14:04:14 pr-iso2 crmd: [16043]: info: do_dc_join_offer_all: join-1: Waiting on 1 outstanding join acks
Jan 25 14:04:14 pr-iso2 crmd: [16043]: info: te_connect_stonith: Attempting connection to fencing daemon...
Jan 25 14:04:14 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/22, version=0.63.9): ok (rc=0)
Jan 25 14:04:15 pr-iso2 crmd: [16043]: info: te_connect_stonith: Connected
Jan 25 14:04:15 pr-iso2 crmd: [16043]: info: config_query_callback: Checking for expired actions every 900000ms
Jan 25 14:04:15 pr-iso2 crmd: [16043]: info: update_dc: Set DC to pr-iso2 (3.0.1)
Jan 25 14:04:15 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Jan 25 14:04:15 pr-iso2 crmd: [16043]: info: do_state_transition: All 1 cluster nodes responded to the join offer.
Jan 25 14:04:15 pr-iso2 crmd: [16043]: info: do_dc_join_finalize: join-1: Syncing the CIB from pr-iso2 to the rest of the cluster
Jan 25 14:04:15 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/25, version=0.63.9): ok (rc=0)
Jan 25 14:04:15 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/26, version=0.63.9): ok (rc=0)
Jan 25 14:04:17 pr-iso2 crmd: [16043]: info: do_dc_join_ack: join-1: Updating node state to member for pr-iso2
Jan 25 14:04:17 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='pr-iso2']/lrm (origin=local/crmd/27, version=0.63.10): ok (rc=0)
Jan 25 14:04:17 pr-iso2 crmd: [16043]: info: erase_xpath_callback: Deletion of "//node_state[@uname='pr-iso2']/lrm": ok (rc=0)
Jan 25 14:04:17 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
Jan 25 14:04:17 pr-iso2 crmd: [16043]: info: populate_cib_nodes_ha: Requesting the list of configured nodes
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_dc_join_final: Ensuring DC, quorum and node attributes are up-to-date
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: crm_update_quorum: Updating quorum status to true (call=31)
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_local_callback: Sending full refresh (origin=crmd)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: abort_transition_graph: do_te_invoke:191 - Triggered transition abort (complete=1) : Peer Cancelled
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (<null>)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke: Query 32: Requesting the current CIB: S_POLICY_ENGINE
Jan 25 14:04:18 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/29, version=0.63.11): ok (rc=0)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: WARN: match_down_event: No match for shutdown action on 014268aa-f234-4789-b4a1-0053cf4e61b9
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: te_update_diff: Stonith/shutdown of 014268aa-f234-4789-b4a1-0053cf4e61b9 not matched
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: abort_transition_graph: te_update_diff:191 - Triggered transition abort (complete=1, tag=node_state, id=014268aa-f234-4789-b4a1-0053cf4e61b9, magic=NA, cib=0.63.12) : Node failure
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke: Query 33: Requesting the current CIB: S_POLICY_ENGINE
Jan 25 14:04:18 pr-iso2 cib: [16039]: info: log_data_element: cib:diff: - <cib dc-uuid="014268aa-f234-4789-b4a1-0053cf4e61b9" admin_epoch="0" epoch="63" num_updates="12" />
Jan 25 14:04:18 pr-iso2 cib: [16039]: info: log_data_element: cib:diff: + <cib dc-uuid="511079a9-0f71-4537-bdf9-07714b454441" admin_epoch="0" epoch="64" num_updates="1" />
Jan 25 14:04:18 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/31, version=0.64.1): ok (rc=0)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=1) : Non-status change
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: need_abort: Aborting on change to admin_epoch
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke: Query 34: Requesting the current CIB: S_POLICY_ENGINE
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-RTWeb (INFINITY)
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_perform_update: Sent update 22: fail-count-RTWeb=INFINITY
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-RTWeb (1327500241)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke_callback: Invoking the PE: query=34, ref=pe_calc-dc-1327500258-12, seq=3, quorate=1
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_perform_update: Sent update 25: last-failure-RTWeb=1327500241
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: determine_online_status: Node pr-iso2 is online
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_trigger_update: Sending flush op to all hosts for: terminate (<null>)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: WARN: unpack_rsc_op: Processing failed op RTWeb_start_0 on pr-iso2: unknown error (1)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: native_print: ClusterIP (ocf::heartbeat:IPaddr2): Started pr-iso2
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=1, tag=transient_attributes, id=511079a9-0f71-4537-bdf9-07714b454441, magic=NA, cib=0.64.2) : Transient attribute: update
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: native_print: RTWeb (ocf::heartbeat:apache): Started pr-iso2 FAILED
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=1, tag=transient_attributes, id=511079a9-0f71-4537-bdf9-07714b454441, magic=NA, cib=0.64.3) : Transient attribute: update
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: RecurringOp: Start recurring monitor (60s) for RTWeb on pr-iso2
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke: Query 35: Requesting the current CIB: S_POLICY_ENGINE
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: LogActions: Leave resource ClusterIP (Started pr-iso2)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: LogActions: Recover resource RTWeb (Started pr-iso2)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke: Query 36: Requesting the current CIB: S_POLICY_ENGINE
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: handle_response: pe_calc calculation pe_calc-dc-1327500258-12 is obsolete
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke_callback: Invoking the PE: query=36, ref=pe_calc-dc-1327500258-13, seq=3, quorate=1
Jan 25 14:04:18 pr-iso2 cib: [16405]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-90.raw
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: process_pe_message: Transition 0: PEngine Input stored in: /var/lib/pengine/pe-input-6324.bz2
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: determine_online_status: Node pr-iso2 is online
Jan 25 14:04:18 pr-iso2 pengine: [16392]: WARN: unpack_rsc_op: Processing failed op RTWeb_start_0 on pr-iso2: unknown error (1)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: native_print: ClusterIP (ocf::heartbeat:IPaddr2): Started pr-iso2
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: native_print: RTWeb (ocf::heartbeat:apache): Started pr-iso2 FAILED
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: get_failcount: RTWeb has failed INFINITY times on pr-iso2
Jan 25 14:04:18 pr-iso2 pengine: [16392]: WARN: common_apply_stickiness: Forcing RTWeb away from pr-iso2 after 1000000 failures (max=1000000)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: native_merge_weights: ClusterIP: Rolling back scores from RTWeb
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: native_color: Resource RTWeb cannot run anywhere
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: LogActions: Leave resource ClusterIP (Started pr-iso2)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: LogActions: Stop resource RTWeb (pr-iso2)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: unpack_graph: Unpacked transition 1: 2 actions in 2 synapses
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_te_invoke: Processing graph 1 (ref=pe_calc-dc-1327500258-13) derived from /var/lib/pengine/pe-input-6325.bz2
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: te_rsc_command: Initiating action 1: stop RTWeb_stop_0 on pr-iso2 (local)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_lrm_rsc_op: Performing key=1:1:0:6fe179f3-2f76-4b99-b53a-b449b0bf3008 op=RTWeb_stop_0 )
Jan 25 14:04:18 pr-iso2 lrmd: [16040]: info: rsc:RTWeb:8: stop
Jan 25 14:04:18 pr-iso2 cib: [16405]: info: write_cib_contents: Wrote version 0.64.0 of the CIB to disk (digest: b1b8107c826e0e9f9d7e0dc5d7f17d77)
Jan 25 14:04:18 pr-iso2 cib: [16405]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.0nYAMe (digest: /var/lib/heartbeat/crm/cib.2WriED)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: process_pe_message: Transition 1: PEngine Input stored in: /var/lib/pengine/pe-input-6325.bz2
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_ha_callback: flush message from pr-iso2
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_ha_callback: flush message from pr-iso2
Jan 25 14:04:19 pr-iso2 lrmd: [16040]: info: RA output: (RTWeb:stop:stderr) kill: 830: No such process
apache[16406]: 2012/01/25_14:04:19 INFO: Killing apache PID 16356
apache[16406]: 2012/01/25_14:04:19 INFO: apache stopped.
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: process_lrm_event: LRM operation RTWeb_stop_0 (call=8, rc=0, cib-update=37, confirmed=true) ok
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: match_graph_event: Action RTWeb_stop_0 (1) confirmed on pr-iso2 (rc=0)
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: te_pseudo_action: Pseudo action 3 fired and confirmed
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: run_graph: ====================================================
Jan 25 14:04:19 pr-iso2 crmd: [16043]: notice: run_graph: Transition 1 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-6325.bz2): Complete
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: te_graph_trigger: Transition 1 is now complete
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: notify_crmd: Transition 1 status: done - <null>
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: do_state_transition: Starting PEngine Recheck Timer
Jan 25 14:04:59 pr-iso2 heartbeat: [16026]: info: killing /usr/lib/heartbeat/crmd process group 16043 with signal 15
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: crm_shutdown: Requesting shutdown
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown_req: Sending shutdown request to DC: pr-iso2
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: handle_shutdown_request: Creating shutdown request for pr-iso2 (state=S_POLICY_ENGINE)
Jan 25 14:04:59 pr-iso2 attrd: [16042]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1327500299)
Jan 25 14:04:59 pr-iso2 attrd: [16042]: info: attrd_perform_update: Sent update 33: shutdown=1327500299
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=1, tag=transient_attributes, id=511079a9-0f71-4537-bdf9-07714b454441, magic=NA, cib=0.64.6) : Transient attribute: update
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_pe_invoke: Query 39: Requesting the current CIB: S_POLICY_ENGINE
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_pe_invoke_callback: Invoking the PE: query=39, ref=pe_calc-dc-1327500299-16, seq=3, quorate=1
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: determine_online_status: Node pr-iso2 is shutting down
Jan 25 14:04:59 pr-iso2 pengine: [16392]: WARN: unpack_rsc_op: Processing failed op RTWeb_start_0 on pr-iso2: unknown error (1)
Jan 25 14:04:59 pr-iso2 pengine: [16392]: notice: native_print: ClusterIP (ocf::heartbeat:IPaddr2): Started pr-iso2
Jan 25 14:04:59 pr-iso2 pengine: [16392]: notice: native_print: RTWeb (ocf::heartbeat:apache): Stopped
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: get_failcount: RTWeb has failed INFINITY times on pr-iso2
Jan 25 14:04:59 pr-iso2 pengine: [16392]: WARN: common_apply_stickiness: Forcing RTWeb away from pr-iso2 after 1000000 failures (max=1000000)
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: native_merge_weights: ClusterIP: Rolling back scores from RTWeb
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: native_color: Resource ClusterIP cannot run anywhere
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: native_color: Resource RTWeb cannot run anywhere
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: stage6: Scheduling Node pr-iso2 for shutdown
Jan 25 14:04:59 pr-iso2 pengine: [16392]: notice: LogActions: Stop resource ClusterIP (pr-iso2)
Jan 25 14:04:59 pr-iso2 pengine: [16392]: notice: LogActions: Leave resource RTWeb (Stopped)
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: unpack_graph: Unpacked transition 2: 3 actions in 3 synapses
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_te_invoke: Processing graph 2 (ref=pe_calc-dc-1327500299-16) derived from /var/lib/pengine/pe-input-6326.bz2
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: te_rsc_command: Initiating action 5: stop ClusterIP_stop_0 on pr-iso2 (local)
Jan 25 14:04:59 pr-iso2 lrmd: [16040]: info: cancel_op: operation monitor[6] on ocf::IPaddr2::ClusterIP for client 16043, its parameters: CRM_meta_name=[monitor] cidr_netmask=[24] crm_feature_set=[3.0.1] CRM_meta_timeout=[20000] CRM_meta_interval=[30000] ip=[10.5.75.83] cancelled
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_lrm_rsc_op: Performing key=5:2:0:6fe179f3-2f76-4b99-b53a-b449b0bf3008 op=ClusterIP_stop_0 )
Jan 25 14:04:59 pr-iso2 lrmd: [16040]: info: rsc:ClusterIP:9: stop
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: process_lrm_event: LRM operation ClusterIP_monitor_30000 (call=6, status=1, cib-update=0, confirmed=true) Cancelled
IPaddr2[16548]: 2012/01/25_14:04:59 INFO: IP status = ok, IP_CIP=
IPaddr2[16548]: 2012/01/25_14:04:59 INFO: ip -f inet addr delete 10.5.75.83/24 dev eth0
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: process_lrm_event: LRM operation ClusterIP_stop_0 (call=9, rc=0, cib-update=40, confirmed=true) ok
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: match_graph_event: Action ClusterIP_stop_0 (5) confirmed on pr-iso2 (rc=0)
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: te_pseudo_action: Pseudo action 2 fired and confirmed
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: te_crm_command: Executing crm-event (8): do_shutdown on pr-iso2
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: te_crm_command: crm-event (8) is a local shutdown
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: run_graph: ====================================================
Jan 25 14:04:59 pr-iso2 crmd: [16043]: notice: run_graph: Transition 2 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-6326.bz2): Complete
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: te_graph_trigger: Transition 2 is now complete
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_dc_release: DC role released
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: stop_subsystem: Sent -TERM to pengine: [16392]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_te_control: Transitioner is now inactive
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_te_control: Disconnecting STONITH...
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
Jan 25 14:04:59 pr-iso2 crmd: [16043]: notice: Not currently connected.
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: Terminating the pengine
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: stop_subsystem: Sent -TERM to pengine: [16392]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: Waiting for subsystems to exit
Jan 25 14:04:59 pr-iso2 crmd: [16043]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: All subsystems stopped, continuing
Jan 25 14:04:59 pr-iso2 crmd: [16043]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: Terminating the pengine
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: stop_subsystem: Sent -TERM to pengine: [16392]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: Waiting for subsystems to exit
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: All subsystems stopped, continuing
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: process_pe_message: Transition 2: PEngine Input stored in: /var/lib/pengine/pe-input-6326.bz2
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: crmdManagedChildDied: Process pengine:[16392] exited (signal=0, exitcode=0)
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: pe_msg_dispatch: Received HUP from pengine:[16392]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: pe_connection_destroy: Connection to the Policy Engine released
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: All subsystems stopped, continuing
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_lrm_control: Disconnected from the LRM
Jan 25 14:04:59 pr-iso2 ccm: [16038]: info: client (pid=16043) removed from ccm
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_ha_control: Disconnected from Heartbeat
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_cib_control: Disconnecting CIB
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: cib_process_readwrite: We are now in R/O mode
Jan 25 14:04:59 pr-iso2 cib: [16039]: WARN: send_ipc_message: IPC Channel to 16043 is not connected
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_exit: [crmd] stopped (0)
Jan 25 14:04:59 pr-iso2 cib: [16039]: WARN: send_via_callback_channel: Delivery of reply to client 16043/74e07029-b533-4c17-bac2-a0a52149bc7a failed
Jan 25 14:04:59 pr-iso2 cib: [16039]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
Jan 25 14:04:59 pr-iso2 heartbeat: [16026]: info: killing /usr/lib/heartbeat/attrd process group 16042 with signal 15
Jan 25 14:04:59 pr-iso2 attrd: [16042]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jan 25 14:04:59 pr-iso2 attrd: [16042]: info: attrd_shutdown: Exiting
Jan 25 14:04:59 pr-iso2 attrd: [16042]: info: main: Exiting...
Jan 25 14:04:59 pr-iso2 attrd: [16042]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
Jan 25 14:04:59 pr-iso2 heartbeat: [16026]: info: killing /usr/lib/heartbeat/stonithd process group 16041 with signal 15
Jan 25 14:04:59 pr-iso2 stonithd: [16041]: notice: /usr/lib/heartbeat/stonithd normally quit.
Jan 25 14:04:59 pr-iso2 heartbeat: [16026]: info: killing /usr/lib/heartbeat/lrmd -r process group 16040 with signal 15
Jan 25 14:04:59 pr-iso2 lrmd: [16040]: info: lrmd is shutting down
Jan 25 14:04:59 pr-iso2 heartbeat: [16026]: info: killing /usr/lib/heartbeat/cib process group 16039 with signal 15
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: cib_shutdown: Disconnected 0 clients
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: cib_process_disconnect: All clients disconnected...
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: terminate_cib: initiate_exit: Disconnecting heartbeat
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: terminate_cib: Exiting...
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: main: Done
Jan 25 14:04:59 pr-iso2 ccm: [16038]: info: client (pid=16039) removed from ccm
Jan 25 14:04:59 pr-iso2 heartbeat: [16026]: info: killing /usr/lib/heartbeat/ccm process group 16038 with signal 15
Jan 25 14:04:59 pr-iso2 ccm: [16038]: info: received SIGTERM, going to shut down
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: killing HBFIFO process 16031 with signal 15
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: killing HBWRITE process 16032 with signal 15
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: killing HBREAD process 16033 with signal 15
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: killing HBWRITE process 16034 with signal 15
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: killing HBREAD process 16035 with signal 15
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: Core process 16033 exited. 5 remaining
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: Core process 16034 exited. 4 remaining
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: Core process 16031 exited. 3 remaining
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: Core process 16035 exited. 2 remaining
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: Core process 16032 exited. 1 remaining
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: pr-iso2 Heartbeat shutdown complete.

Maybe the solution is to upgrade the active node as well and reconfigure Pacemaker?

Thank you.

Best regards,
Michal Vyoral

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org