[Pacemaker] first monitor action after start of resource fails - ends up in resource restart

Bauer, Stefan (IZLBW Extern) Mon, 16 Jun 2014 23:31:19 -0700

Dear Users/Developers,

I'm running a pacemaker/corosync cluster on Debian 7:


Pacemaker 1.1.7.1
Corosync 1.4.2-3

Everything runs smoothly, except that the first monitor action after the start 
action on my apache2 resource fails, which triggers a restart of the resource.

How can this be avoided?

Log attached.

Thanks

Stefan

Jun 17 08:16:49 node1 crmd: [5544]: info: te_rsc_command: Initiating action 41: 
start p_apache_start_0 on node1 (local)
Jun 17 08:16:49 node1 lrmd: [5541]: info: rsc:p_apache start[32] (pid 19175)
Jun 17 08:16:49 node1 lrmd: [5541]: info: RA output: (p_apache:start:stdout) 
Starting web server: apache2
Jun 17 08:16:49 node1 lrmd: [5541]: info: RA output: (p_apache:start:stdout) .
Jun 17 08:16:49 node1 lrmd: [5541]: info: operation start[32] on p_apache for 
client 5544: pid 19175 exited with return code 0
Jun 17 08:16:49 node1 crmd: [5544]: info: process_lrm_event: LRM operation 
p_apache_start_0 (call=32, rc=0, cib-update=58, confirmed=true) ok
Jun 17 08:16:49 node1 crmd: [5544]: info: te_rsc_command: Initiating action 42: 
monitor p_apache_monitor_10000 on node1 (local)
Jun 17 08:16:49 node1 lrmd: [5541]: info: rsc:p_apache monitor[33] (pid 19224)
Jun 17 08:16:49 node1 lrmd: [5541]: info: operation monitor[33] on p_apache for 
client 5544: pid 19224 exited with return code 7 (mapped from 3)
Jun 17 08:16:49 node1 crmd: [5544]: info: process_lrm_event: LRM operation 
p_apache_monitor_10000 (call=33, rc=7, cib-update=59, confirmed=false) not 
running
Jun 17 08:16:49 node1 crmd: [5544]: WARN: status_from_rc: Action 42 
(p_apache_monitor_10000) on node1 failed (target: 0 vs. rc: 7): Error
Jun 17 08:16:49 node1 crmd: [5544]: WARN: update_failcount: Updating failcount 
for p_apache on node1 after failed monitor: rc=7 (update=value++, 
time=1402985809)
Jun 17 08:16:49 node1 crmd: [5544]: info: abort_transition_graph: 
match_graph_event:277 - Triggered transition abort (complete=0, tag=lrm_rsc_op, 
id=p_apache_last_failure_0, 
magic=0:7;42:0:0:2b25e917-fb1e-45fa-a377-2c08f4a76d26, cib=0.199.104) : Event 
failed
Jun 17 08:16:49 node1 attrd: [5542]: notice: attrd_trigger_update: Sending 
flush op to all hosts for: fail-count-p_apache (1)
Jun 17 08:16:49 node1 attrd: [5542]: notice: attrd_perform_update: Sent update 
27: fail-count-p_apache=1
Jun 17 08:16:49 node1 attrd: [5542]: notice: attrd_trigger_update: Sending 
flush op to all hosts for: last-failure-p_apache (1402985809)
Jun 17 08:16:49 node1 crmd: [5544]: info: abort_transition_graph: 
te_update_diff:176 - Triggered transition abort (complete=0, tag=nvpair, 
id=status-node1-fail-count-p_apache, name=fail-count-p_apache, value=1, 
magic=NA, cib=0.199.105) : Transient attribute: update
Jun 17 08:16:49 node1 attrd: [5542]: notice: attrd_perform_update: Sent update 
30: last-failure-p_apache=1402985809
Jun 17 08:16:49 node1 crmd: [5544]: info: abort_transition_graph: 
te_update_diff:176 - Triggered transition abort (complete=0, tag=nvpair, 
id=status-node1-last-failure-p_apache, name=last-failure-p_apache, 
value=1402985809, magic=NA, cib=0.199.106) : Transient attribute: update
Jun 17 08:16:49 node1 pengine: [5543]: WARN: unpack_rsc_op: Processing failed 
op p_apache_last_failure_0 on node1: not running (7)
Jun 17 08:16:49 node1 pengine: [5543]: notice: common_apply_stickiness: 
p_apache can fail 999999 more times on node2 before being forced off
Jun 17 08:16:49 node1 pengine: [5543]: notice: common_apply_stickiness: 
p_apache can fail 999999 more times on node1 before being forced off
Jun 17 08:16:49 node1 pengine: [5543]: notice: LogActions: Recover 
p_apache#011(Started node1)
Jun 17 08:16:49 node1 crmd: [5544]: info: te_rsc_command: Initiating action 2: 
stop p_apache_stop_0 on node1 (local)
Jun 17 08:16:49 node1 lrmd: [5541]: info: cancel_op: operation monitor[33] on 
p_apache for client 5544, its parameters: crm_feature_set=[3.0.6] depth=[0] 
CRM_meta_name=[monitor] CRM_meta_interval=[10000] CRM_meta_timeout=[20000] 
CRM_meta_depth=[0]  cancelled
Jun 17 08:16:49 node1 lrmd: [5541]: info: rsc:p_apache stop[36] (pid 19258)
Jun 17 08:16:49 node1 crmd: [5544]: info: process_lrm_event: LRM operation 
p_apache_monitor_10000 (call=33, status=1, cib-update=0, confirmed=true) 
Cancelled
Jun 17 08:16:49 node1 lrmd: [5541]: info: RA output: (p_apache:stop:stdout) 
Stopping web server: apache2
Jun 17 08:16:49 node1 lrmd: [5541]: info: RA output: (p_apache:stop:stdout)  
... waiting
Jun 17 08:16:50 node1 lrmd: [5541]: info: RA output: (p_apache:stop:stdout) .
Jun 17 08:16:50 node1 lrmd: [5541]: info: operation stop[36] on p_apache for 
client 5544: pid 19258 exited with return code 0
Jun 17 08:16:50 node1 crmd: [5544]: info: process_lrm_event: LRM operation 
p_apache_stop_0 (call=36, rc=0, cib-update=64, confirmed=true) ok
Jun 17 08:16:50 node1 crmd: [5544]: info: te_rsc_command: Initiating action 46: 
start p_apache_start_0 on node1 (local)
Jun 17 08:16:50 node1 lrmd: [5541]: info: rsc:p_apache start[37] (pid 19282)
Jun 17 08:16:50 node1 lrmd: [5541]: info: RA output: (p_apache:start:stdout) 
Starting web server: apache2
Jun 17 08:16:50 node1 lrmd: [5541]: info: RA output: (p_apache:start:stdout) .
Jun 17 08:16:50 node1 lrmd: [5541]: info: operation start[37] on p_apache for 
client 5544: pid 19282 exited with return code 0
Jun 17 08:16:50 node1 crmd: [5544]: info: process_lrm_event: LRM operation 
p_apache_start_0 (call=37, rc=0, cib-update=65, confirmed=true) ok
Jun 17 08:16:50 node1 crmd: [5544]: info: te_rsc_command: Initiating action 1: 
monitor p_apache_monitor_10000 on node1 (local)
Jun 17 08:16:50 node1 lrmd: [5541]: info: rsc:p_apache monitor[38] (pid 19295)
Jun 17 08:16:50 node1 lrmd: [5541]: info: operation monitor[38] on p_apache for 
client 5544: pid 19295 exited with return code 0
Jun 17 08:16:50 node1 crmd: [5544]: info: process_lrm_event: LRM operation 
p_apache_monitor_10000 (call=38, rc=0, cib-update=66, confirmed=false) ok
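
The sequence in the log above (start returns 0, the first monitor returns 7, then stop/start, and the second monitor returns 0) can be picked out mechanically. The following is only a minimal sketch in Python, assuming the crmd "LRM operation" lines keep the format shown in this log and may be wrapped across lines as they are in this mail. The names PATTERN, lrm_operations and failed_first_monitors are made up for illustration and are not part of Pacemaker.

```python
import re

# Excerpt of the crmd "LRM operation" lines from the log above,
# wrapped across lines the same way the mail wraps them.
LOG = """\
Jun 17 08:16:49 node1 crmd: [5544]: info: process_lrm_event: LRM operation
p_apache_start_0 (call=32, rc=0, cib-update=58, confirmed=true) ok
Jun 17 08:16:49 node1 crmd: [5544]: info: process_lrm_event: LRM operation
p_apache_monitor_10000 (call=33, rc=7, cib-update=59, confirmed=false) not
running
Jun 17 08:16:50 node1 crmd: [5544]: info: process_lrm_event: LRM operation
p_apache_stop_0 (call=36, rc=0, cib-update=64, confirmed=true) ok
Jun 17 08:16:50 node1 crmd: [5544]: info: process_lrm_event: LRM operation
p_apache_start_0 (call=37, rc=0, cib-update=65, confirmed=true) ok
Jun 17 08:16:50 node1 crmd: [5544]: info: process_lrm_event: LRM operation
p_apache_monitor_10000 (call=38, rc=0, cib-update=66, confirmed=false) ok
"""

# Assumed line format: "LRM operation <resource>_<op>_<interval> (call=N, rc=N, ..."
# Entries without an rc= field (for example the "Cancelled" one) are skipped.
PATTERN = re.compile(
    r"LRM operation (\w+?)_(start|stop|monitor)_(\d+) \(call=(\d+), rc=(\d+)"
)


def lrm_operations(log_text):
    """Return (resource, operation, call_id, rc) tuples in log order."""
    # Collapse all whitespace so that wrapped log lines become matchable.
    flat = " ".join(log_text.split())
    return [
        (m.group(1), m.group(2), int(m.group(4)), int(m.group(5)))
        for m in PATTERN.finditer(flat)
    ]


def failed_first_monitors(ops):
    """Return monitors with a nonzero rc that directly follow a successful start."""
    failures = []
    for prev, cur in zip(ops, ops[1:]):
        same_resource = prev[0] == cur[0]
        start_ok = prev[1] == "start" and prev[3] == 0
        monitor_failed = cur[1] == "monitor" and cur[3] != 0
        if same_resource and start_ok and monitor_failed:
            failures.append(cur)
    return failures


if __name__ == "__main__":
    for resource, op, call_id, rc in failed_first_monitors(lrm_operations(LOG)):
        print(f"{resource}: {op} call={call_id} failed with rc={rc} right after start")
```

Run against the excerpt, this flags only monitor call 33 (rc=7). The monitor after the second start (call 38) returns 0, so the failure appears only in the monitor issued immediately after the first start.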

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
