Hello all.
Another problem.
I just found out that one of my clone resources is not working, but Pacemaker does not
notice this - it says that all clones are started. If I run the status command
from the console, everything looks fine.
I still can't understand how to fix it.
I attached the log from the DC, which shows some really strange problems.
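
(For context, this is roughly how the "status" exit codes of an LSB init script can be
checked from the shell, since the monitor failures in the attached log are
"not running (7)" results on the lsb:onlineconf clone. This is only a sketch;
/etc/init.d/onlineconf is an assumed path based on the lsb:onlineconf primitive below:

    # While the service is running, "status" must exit 0:
    /etc/init.d/onlineconf status; echo "running -> rc=$?"   # expect 0
    # After a clean stop it must exit 3 (not 0 and not 1):
    /etc/init.d/onlineconf stop
    /etc/init.d/onlineconf status; echo "stopped -> rc=$?"   # expect 3
    /etc/init.d/onlineconf start

If "status" reports anything other than 0/3, Pacemaker's monitor result can disagree
with what the console shows.)
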
My config:
node mysender31.example.com
node mysender38.example.com
node mysender39.example.com
node mysender6.example.com
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="10.6.1.214" cidr_netmask="32" nic="eth0:0" \
op monitor interval="15" timeout="30" on-fail="restart"
primitive cleardb_delete_history_old.init lsb:cleardb_delete_history_old \
op monitor interval="15" timeout="30" on-fail="restart" \
meta target-role="Started"
primitive gettopupdated.init lsb:gettopupdate-my \
op monitor interval="15" timeout="30" on-fail="restart"
primitive onlineconf.init lsb:onlineconf \
op monitor interval="15"
primitive qm_manager.init lsb:qm_manager \
op monitor interval="15" timeout="30" on-fail="restart" \
meta target-role="Started"
primitive qm_master.init lsb:qm_master \
op monitor interval="15" timeout="30" on-fail="restart"
primitive silverbox-stat.1.init lsb:silverbox-stat.1 \
op monitor interval="15" timeout="30" on-fail="restart" \
meta target-role="Started"
clone gettopupdated.clone gettopupdated.init
clone onlineconf.clone onlineconf.init
clone qm_master.clone qm_master.init \
meta clone-max="2"
location CLEARDB_RUNS_ONLY_ON_MS6 cleardb_delete_history_old.init \
rule $id="CLEARDB_RUNS_ONLY_ON_MS6-rule" -inf: #uname ne
mysender6.example.com
location QM-PREFER-MS39 qm_manager.init 100: mysender39.example.com
location QM_MASTER_DENY_MS38 qm_master.clone -inf: mysender38.example.com
location QM_MASTER_DENY_MS39 qm_master.clone -inf: mysender39.example.com
location SILVERBOX-STAT_RUNS_ONLY_ON_MS38 silverbox-stat.1.init \
rule $id="SILVERBOX-STAT_RUNS_ONLY_ON_MS38-rule" -inf: #uname
ne mysender38.example.com
colocation QM-IP inf: ClusterIP qm_manager.init
order IP-Before-Qm inf: ClusterIP qm_manager.init
property $id="cib-bootstrap-options" \
dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
cluster-infrastructure="openais" \
expected-quorum-votes="4" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
last-lrm-refresh="1308909119"
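
(For reference, a sketch of how the accumulated fail counts could be inspected and
cleared from the console once the underlying problem is understood; the commands assume
the stock crm shell and tools shipped with pacemaker 1.0:

    crm_mon -1 -f                                          # one-shot status including fail counts
    crm resource failcount onlineconf.clone show mysender6.example.com
    crm resource cleanup onlineconf.clone                  # clear failed ops for the clone
)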
--
Best regards,
Proskurin Kirill
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: determine_online_status: Node mysender38.example.com is online
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: determine_online_status: Node mysender31.example.com is online
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: determine_online_status: Node mysender39.example.com is online
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: determine_online_status: Node mysender6.example.com is online
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: WARN: unpack_rsc_op: Processing failed op onlineconf.init:2_monitor_5000 on mysender38.example.com: not running (7)
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: unpack_rsc_op: Operation ClusterIP_monitor_0 found resource ClusterIP active on mysender38.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: unpack_rsc_op: Operation gettopupdated.init:3_monitor_0 found resource gettopupdated.init:3 active on mysender38.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: unpack_rsc_op: Operation silverbox-stat.1.init_monitor_0 found resource silverbox-stat.1.init active on mysender38.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: unpack_rsc_op: Operation qm_master.init:0_monitor_0 found resource qm_master.init:0 active on mysender38.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: unpack_rsc_op: Operation cleardb_delete_history_old.init_monitor_0 found resource cleardb_delete_history_old.init active on mysender38.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: unpack_rsc_op: Operation qm_master.init:1_monitor_0 found resource qm_master.init:1 active on mysender31.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: unpack_rsc_op: Operation cleardb_delete_history_old.init_monitor_0 found resource cleardb_delete_history_old.init active on mysender31.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: unpack_rsc_op: Operation onlineconf.init:1_monitor_0 found resource onlineconf.init:1 active on mysender31.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: WARN: unpack_rsc_op: Processing failed op onlineconf.init:1_monitor_5000 on mysender31.example.com: not running (7)
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: unpack_rsc_op: Operation qm_manager.init_monitor_0 found resource qm_manager.init active on mysender39.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: unpack_rsc_op: Operation cleardb_delete_history_old.init_monitor_0 found resource cleardb_delete_history_old.init active on mysender39.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: unpack_rsc_op: Operation gettopupdated.init:1_monitor_0 found resource gettopupdated.init:1 active on mysender39.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: find_clone: Internally renamed onlineconf.init:2 on mysender39.example.com to onlineconf.init:3
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: unpack_rsc_op: Operation onlineconf.init:2_monitor_0 found resource onlineconf.init:3 active on mysender39.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: unpack_rsc_op: Operation ClusterIP_monitor_0 found resource ClusterIP active on mysender39.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: WARN: unpack_rsc_op: Processing failed op onlineconf.init:3_monitor_5000 on mysender39.example.com: not running (7)
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: WARN: unpack_rsc_op: Processing failed op onlineconf.init:3_stop_0 on mysender39.example.com: unknown error (1)
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: native_add_running: resource onlineconf.init:3 isnt managed
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: WARN: unpack_rsc_op: Processing failed op onlineconf.init:0_monitor_5000 on mysender6.example.com: not running (7)
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: unpack_rsc_op: Operation cleardb_delete_history_old.init_monitor_0 found resource cleardb_delete_history_old.init active on mysender6.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: clone_print: Clone Set: onlineconf.clone
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: native_print: onlineconf.init:2 (lsb:onlineconf): Started mysender38.example.com FAILED
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: native_print: onlineconf.init:3 (lsb:onlineconf): Started mysender39.example.com (unmanaged) FAILED
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: short_print: Stopped: [ onlineconf.init:0 onlineconf.init:1 ]
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: clone_print: Clone Set: gettopupdated.clone
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: short_print: Started: [ mysender6.example.com mysender39.example.com mysender31.example.com mysender38.example.com ]
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: native_print: ClusterIP (ocf::heartbeat:IPaddr2): Started mysender39.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: native_print: qm_manager.init (lsb:qm_manager): Started mysender39.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: clone_print: Clone Set: qm_master.clone
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: short_print: Started: [ mysender6.example.com mysender31.example.com ]
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: native_print: silverbox-stat.1.init (lsb:silverbox-stat.1): Started mysender38.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: native_print: cleardb_delete_history_old.init (lsb:cleardb_delete_history_old): Started mysender6.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: onlineconf.clone has failed 30026 times on mysender6.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: onlineconf.clone can fail 969974 more times on mysender6.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: onlineconf.clone has failed 30026 times on mysender6.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: onlineconf.clone can fail 969974 more times on mysender6.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: onlineconf.clone has failed 30026 times on mysender6.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: onlineconf.clone can fail 969974 more times on mysender6.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: onlineconf.clone has failed 30026 times on mysender6.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: onlineconf.clone can fail 969974 more times on mysender6.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: onlineconf.clone has failed 44825 times on mysender31.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: onlineconf.clone can fail 955175 more times on mysender31.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: onlineconf.clone has failed 44825 times on mysender31.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: onlineconf.clone can fail 955175 more times on mysender31.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: onlineconf.clone has failed 44825 times on mysender31.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: onlineconf.clone can fail 955175 more times on mysender31.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: onlineconf.clone has failed 44825 times on mysender31.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: onlineconf.clone can fail 955175 more times on mysender31.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: gettopupdated.clone has failed 6 times on mysender31.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: gettopupdated.clone can fail 999994 more times on mysender31.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: gettopupdated.clone has failed 6 times on mysender31.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: gettopupdated.clone can fail 999994 more times on mysender31.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: gettopupdated.clone has failed 6 times on mysender31.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: gettopupdated.clone can fail 999994 more times on mysender31.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: gettopupdated.clone has failed 6 times on mysender31.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: gettopupdated.clone can fail 999994 more times on mysender31.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: qm_master.clone has failed 8 times on mysender31.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: qm_master.clone can fail 999992 more times on mysender31.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: qm_master.clone has failed 8 times on mysender31.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: qm_master.clone can fail 999992 more times on mysender31.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: onlineconf.clone has failed 47892 times on mysender38.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: onlineconf.clone can fail 952108 more times on mysender38.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: onlineconf.clone has failed 47892 times on mysender38.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: onlineconf.clone can fail 952108 more times on mysender38.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: onlineconf.clone has failed 47892 times on mysender38.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: onlineconf.clone can fail 952108 more times on mysender38.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: onlineconf.clone has failed 47892 times on mysender38.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: onlineconf.clone can fail 952108 more times on mysender38.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: silverbox-stat.1.init has failed 8 times on mysender38.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: silverbox-stat.1.init can fail 999992 more times on mysender38.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: onlineconf.clone has failed 44 times on mysender39.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: onlineconf.clone can fail 999956 more times on mysender39.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: onlineconf.clone has failed 44 times on mysender39.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: onlineconf.clone can fail 999956 more times on mysender39.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: onlineconf.clone has failed 44 times on mysender39.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: onlineconf.clone can fail 999956 more times on mysender39.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: get_failcount: onlineconf.clone has failed 44 times on mysender39.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: common_apply_stickiness: onlineconf.clone can fail 999956 more times on mysender39.example.com before being forced off
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: info: native_color: Unmanaged resource onlineconf.init:3 allocated to 'nowhere': failed
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: RecurringOp: Start recurring monitor (5s) for onlineconf.init:0 on mysender6.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: RecurringOp: Start recurring monitor (5s) for onlineconf.init:1 on mysender31.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: RecurringOp: Start recurring monitor (5s) for onlineconf.init:2 on mysender38.example.com
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: LogActions: Start onlineconf.init:0 (mysender6.example.com)
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: LogActions: Start onlineconf.init:1 (mysender31.example.com)
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: LogActions: Recover resource onlineconf.init:2 (Started mysender38.example.com)
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: LogActions: Leave resource onlineconf.init:3 (Started unmanaged)
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: LogActions: Leave resource gettopupdated.init:0 (Started mysender6.example.com)
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: LogActions: Leave resource gettopupdated.init:1 (Started mysender39.example.com)
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: LogActions: Leave resource gettopupdated.init:2 (Started mysender31.example.com)
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: LogActions: Leave resource gettopupdated.init:3 (Started mysender38.example.com)
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: LogActions: Leave resource ClusterIP (Started mysender39.example.com)
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: LogActions: Leave resource qm_manager.init (Started mysender39.example.com)
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: LogActions: Leave resource qm_master.init:0 (Started mysender6.example.com)
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: LogActions: Leave resource qm_master.init:1 (Started mysender31.example.com)
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: LogActions: Leave resource silverbox-stat.1.init (Started mysender38.example.com)
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: notice: LogActions: Leave resource cleardb_delete_history_old.init (Started mysender6.example.com)
Jun 24 11:27:40 mysender6.example.com pengine: [23744]: WARN: should_dump_input: Ignoring requirement that onlineconf.init:3_stop_0 comeplete before onlineconf.clone_stopped_0: unmanaged failed resources cannot prevent clone shutdown
Jun 24 11:27:40 mysender6.example.com crmd: [23745]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jun 24 11:27:40
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker