Dear Users/Developers,
I'm running a Pacemaker/Corosync cluster on Debian 7 with:
Pacemaker 1.1.7.1
Corosync 1.4.2-3
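Everything runs smoothly, except that the first monitor action after the start action on my
apache2 resource fails, so the cluster ends up restarting the resource. In the log, that
monitor runs in the same second in which the start returned 0, and it reports "not running"
(rc=7, mapped from 3). After the recovery (stop, then start), the next monitor succeeds.
How can this be avoided?
The log is below.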
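For reference: in the log below, the failing monitor is reported as "return code 7 (mapped from 3)". Judging from the "Starting web server: apache2" output, p_apache appears to be driven by the LSB init script. LSB defines status exit code 3 as "program is not running", and Pacemaker reports that as OCF_NOT_RUNNING (7). Because the expected monitor result is 0 ("target: 0 vs. rc: 7"), the monitor counts as failed. The Python sketch below only illustrates that translation for the codes seen in this log. The function names are hypothetical, and falling back to a generic error for any other code is an assumption, not necessarily what lrmd does.

```python
# LSB "status" exit codes (LSB init-script conventions).
LSB_STATUS_RUNNING = 0
LSB_STATUS_NOT_RUNNING = 3

# OCF return codes used by Pacemaker.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def lsb_status_to_ocf(lsb_rc: int) -> int:
    """Translate an LSB status exit code into an OCF monitor result.

    Only the two codes that appear in the log are mapped explicitly.
    Returning OCF_ERR_GENERIC for anything else is an assumption of
    this sketch.
    """
    mapping = {
        LSB_STATUS_RUNNING: OCF_SUCCESS,
        LSB_STATUS_NOT_RUNNING: OCF_NOT_RUNNING,
    }
    return mapping.get(lsb_rc, OCF_ERR_GENERIC)


def monitor_failed(rc: int, expected: int = OCF_SUCCESS) -> bool:
    """A recurring monitor fails whenever rc differs from the expected result."""
    return rc != expected


if __name__ == "__main__":
    # monitor[33] returned LSB 3, monitor[38] returned LSB 0.
    for lsb_rc in (3, 0):
        ocf_rc = lsb_status_to_ocf(lsb_rc)
        print(lsb_rc, "->", ocf_rc, "failed" if monitor_failed(ocf_rc) else "ok")
```

A failed monitor is what bumps fail-count-p_apache and makes the policy engine schedule "Recover", which is the stop/start pair seen in the log.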
Thanks
Stefan
Jun 17 08:16:49 node1 crmd: [5544]: info: te_rsc_command: Initiating action 41: start p_apache_start_0 on node1 (local)
Jun 17 08:16:49 node1 lrmd: [5541]: info: rsc:p_apache start[32] (pid 19175)
Jun 17 08:16:49 node1 lrmd: [5541]: info: RA output: (p_apache:start:stdout) Starting web server: apache2
Jun 17 08:16:49 node1 lrmd: [5541]: info: RA output: (p_apache:start:stdout) .
Jun 17 08:16:49 node1 lrmd: [5541]: info: operation start[32] on p_apache for client 5544: pid 19175 exited with return code 0
Jun 17 08:16:49 node1 crmd: [5544]: info: process_lrm_event: LRM operation p_apache_start_0 (call=32, rc=0, cib-update=58, confirmed=true) ok
Jun 17 08:16:49 node1 crmd: [5544]: info: te_rsc_command: Initiating action 42: monitor p_apache_monitor_10000 on node1 (local)
Jun 17 08:16:49 node1 lrmd: [5541]: info: rsc:p_apache monitor[33] (pid 19224)
Jun 17 08:16:49 node1 lrmd: [5541]: info: operation monitor[33] on p_apache for client 5544: pid 19224 exited with return code 7 (mapped from 3)
Jun 17 08:16:49 node1 crmd: [5544]: info: process_lrm_event: LRM operation p_apache_monitor_10000 (call=33, rc=7, cib-update=59, confirmed=false) not running
Jun 17 08:16:49 node1 crmd: [5544]: WARN: status_from_rc: Action 42 (p_apache_monitor_10000) on node1 failed (target: 0 vs. rc: 7): Error
Jun 17 08:16:49 node1 crmd: [5544]: WARN: update_failcount: Updating failcount for p_apache on node1 after failed monitor: rc=7 (update=value++, time=1402985809)
Jun 17 08:16:49 node1 crmd: [5544]: info: abort_transition_graph: match_graph_event:277 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=p_apache_last_failure_0, magic=0:7;42:0:0:2b25e917-fb1e-45fa-a377-2c08f4a76d26, cib=0.199.104) : Event failed
Jun 17 08:16:49 node1 attrd: [5542]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-p_apache (1)
Jun 17 08:16:49 node1 attrd: [5542]: notice: attrd_perform_update: Sent update 27: fail-count-p_apache=1
Jun 17 08:16:49 node1 attrd: [5542]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-p_apache (1402985809)
Jun 17 08:16:49 node1 crmd: [5544]: info: abort_transition_graph: te_update_diff:176 - Triggered transition abort (complete=0, tag=nvpair, id=status-node1-fail-count-p_apache, name=fail-count-p_apache, value=1, magic=NA, cib=0.199.105) : Transient attribute: update
Jun 17 08:16:49 node1 attrd: [5542]: notice: attrd_perform_update: Sent update 30: last-failure-p_apache=1402985809
Jun 17 08:16:49 node1 crmd: [5544]: info: abort_transition_graph: te_update_diff:176 - Triggered transition abort (complete=0, tag=nvpair, id=status-node1-last-failure-p_apache, name=last-failure-p_apache, value=1402985809, magic=NA, cib=0.199.106) : Transient attribute: update
Jun 17 08:16:49 node1 pengine: [5543]: WARN: unpack_rsc_op: Processing failed op p_apache_last_failure_0 on node1: not running (7)
Jun 17 08:16:49 node1 pengine: [5543]: notice: common_apply_stickiness: p_apache can fail 999999 more times on node2 before being forced off
Jun 17 08:16:49 node1 pengine: [5543]: notice: common_apply_stickiness: p_apache can fail 999999 more times on node1 before being forced off
Jun 17 08:16:49 node1 pengine: [5543]: notice: LogActions: Recover p_apache (Started node1)
Jun 17 08:16:49 node1 crmd: [5544]: info: te_rsc_command: Initiating action 2: stop p_apache_stop_0 on node1 (local)
Jun 17 08:16:49 node1 lrmd: [5541]: info: cancel_op: operation monitor[33] on p_apache for client 5544, its parameters: crm_feature_set=[3.0.6] depth=[0] CRM_meta_name=[monitor] CRM_meta_interval=[10000] CRM_meta_timeout=[20000] CRM_meta_depth=[0] cancelled
Jun 17 08:16:49 node1 lrmd: [5541]: info: rsc:p_apache stop[36] (pid 19258)
Jun 17 08:16:49 node1 crmd: [5544]: info: process_lrm_event: LRM operation p_apache_monitor_10000 (call=33, status=1, cib-update=0, confirmed=true) Cancelled
Jun 17 08:16:49 node1 lrmd: [5541]: info: RA output: (p_apache:stop:stdout) Stopping web server: apache2
Jun 17 08:16:49 node1 lrmd: [5541]: info: RA output: (p_apache:stop:stdout) ... waiting
Jun 17 08:16:50 node1 lrmd: [5541]: info: RA output: (p_apache:stop:stdout) .
Jun 17 08:16:50 node1 lrmd: [5541]: info: operation stop[36] on p_apache for client 5544: pid 19258 exited with return code 0
Jun 17 08:16:50 node1 crmd: [5544]: info: process_lrm_event: LRM operation p_apache_stop_0 (call=36, rc=0, cib-update=64, confirmed=true) ok
Jun 17 08:16:50 node1 crmd: [5544]: info: te_rsc_command: Initiating action 46: start p_apache_start_0 on node1 (local)
Jun 17 08:16:50 node1 lrmd: [5541]: info: rsc:p_apache start[37] (pid 19282)
Jun 17 08:16:50 node1 lrmd: [5541]: info: RA output: (p_apache:start:stdout) Starting web server: apache2
Jun 17 08:16:50 node1 lrmd: [5541]: info: RA output: (p_apache:start:stdout) .
Jun 17 08:16:50 node1 lrmd: [5541]: info: operation start[37] on p_apache for client 5544: pid 19282 exited with return code 0
Jun 17 08:16:50 node1 crmd: [5544]: info: process_lrm_event: LRM operation p_apache_start_0 (call=37, rc=0, cib-update=65, confirmed=true) ok
Jun 17 08:16:50 node1 crmd: [5544]: info: te_rsc_command: Initiating action 1: monitor p_apache_monitor_10000 on node1 (local)
Jun 17 08:16:50 node1 lrmd: [5541]: info: rsc:p_apache monitor[38] (pid 19295)
Jun 17 08:16:50 node1 lrmd: [5541]: info: operation monitor[38] on p_apache for client 5544: pid 19295 exited with return code 0
Jun 17 08:16:50 node1 crmd: [5544]: info: process_lrm_event: LRM operation p_apache_monitor_10000 (call=38, rc=0, cib-update=66, confirmed=false) ok
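To check whether the failure is limited to the first monitor after a start, the completed operations can be pulled out of the crmd "process_lrm_event" entries. This is a minimal Python sketch that assumes the log wording shown above; the helper names are hypothetical. The regular expression also tolerates entries that a mail client wrapped across lines.

```python
import re

# crmd reports each completed resource operation as
# "LRM operation <op> (call=<n>, rc=<rc>, ...)". \s+ also accepts entries
# wrapped across lines. Cancelled operations carry "status=" instead of
# "rc=", so the pattern skips them.
LRM_EVENT = re.compile(
    r"LRM operation\s+(?P<op>\S+)\s+\(call=(?P<call>\d+),\s*rc=(?P<rc>\d+),"
)

# Excerpt of the process_lrm_event entries from the log above.
SAMPLE = """\
Jun 17 08:16:49 node1 crmd: [5544]: info: process_lrm_event: LRM operation p_apache_start_0 (call=32, rc=0, cib-update=58, confirmed=true) ok
Jun 17 08:16:49 node1 crmd: [5544]: info: process_lrm_event: LRM operation p_apache_monitor_10000 (call=33, rc=7, cib-update=59, confirmed=false) not running
Jun 17 08:16:49 node1 crmd: [5544]: info: process_lrm_event: LRM operation p_apache_monitor_10000 (call=33, status=1, cib-update=0, confirmed=true) Cancelled
Jun 17 08:16:50 node1 crmd: [5544]: info: process_lrm_event: LRM operation p_apache_stop_0 (call=36, rc=0, cib-update=64, confirmed=true) ok
Jun 17 08:16:50 node1 crmd: [5544]: info: process_lrm_event: LRM operation p_apache_start_0 (call=37, rc=0, cib-update=65, confirmed=true) ok
Jun 17 08:16:50 node1 crmd: [5544]: info: process_lrm_event: LRM operation p_apache_monitor_10000 (call=38, rc=0, cib-update=66, confirmed=false) ok
"""


def lrm_results(log_text):
    """Return (operation, call id, rc) for every completed LRM operation."""
    return [
        (m["op"], int(m["call"]), int(m["rc"]))
        for m in LRM_EVENT.finditer(log_text)
    ]


def failed_monitors(results):
    """Keep only monitor operations whose rc is not 0 (OCF_SUCCESS)."""
    return [r for r in results if "_monitor_" in r[0] and r[2] != 0]


if __name__ == "__main__":
    results = lrm_results(SAMPLE)
    for op, call, rc in results:
        print(f"call={call:<3} rc={rc} {op}")
    print("failed monitors:", failed_monitors(results))
```

On this log it lists calls 32, 33, 36, 37 and 38 and flags only call 33, the monitor started immediately after start[32], as failed.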
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker