Hi,

I have 4 primitive resources that I was in the process of migrating from one node (lotus) to the other (aston). I was using the 'crm resource migrate' command (4 times). Normally this process works just fine, but in this instance, one of the resources (copan_shelf_07) got 'stuck', in that it didn't get started on the target node for about 15 minutes after the others (04, 05, 06), and then it seems it was kicked off by crm_timer_popped. I'm hoping someone can tell me why copan_shelf_07 took so long to be started. I have the complete hb_report if that is required. Here are the basics:

Corosync Cluster Engine, version '1.2.6' SVN revision 'exported'
Copyright (c) 2006-2009 Red Hat, Inc.
cluster-glue: 1.0.6 (ba146a145a3ede967af48e8936ac414984aa1e5f)
resource-agents: # Build version: d206bc8f13039b332e76a93a86e8e550b67781da
CRM Version: 1.1.2 (ecb1e2ea172ba2551f0bd763e557fccde68c849b)
pacemaker 1.1.2-0.7.1 - SUSE Linux Enterprise 11 x86_64
libpacemaker3 1.1.2-0.7.1 - SUSE Linux Enterprise 11 x86_64
openais 1.1.3-0.2.3 - SUSE Linux Enterprise 11 x86_64
corosync 1.2.6-0.2.2 - SUSE Linux Enterprise 11 x86_64
libcorosync4 1.2.6-0.2.2 - SUSE Linux Enterprise 11 x86_64
resource-agents 1.0.3-0.4.8 - SUSE Linux Enterprise 11 x86_64
cluster-glue 1.0.6-0.3.7 - SUSE Linux Enterprise 11 x86_64
libglue2 1.0.6-0.3.7 - SUSE Linux Enterprise 11 x86_64
ldirectord 1.0.3-0.4.8 - SUSE Linux Enterprise 11 x86_64

Here are the copan_shelf_07 syslog messages:

Mar 22 07:45:07 aston pengine: [10961]: notice: LogActions: Leave resource copan_shelf_07 (Started lotus) Mar 22 07:45:08 aston cib: [10958]: info: log_data_element: cib:diff: + <rsc_location id="cli-prefer-copan_shelf_07" rsc="copan_shelf_07" __crm_diff_marker__="added:top" > Mar 22 07:45:08 aston cib: [10958]: info: log_data_element: cib:diff: + <rule id="cli-prefer-rule-copan_shelf_07" score="INFINITY" boolean-op="and" > Mar 22 07:45:08 aston cib: [10958]: info: log_data_element: cib:diff: + <expression id="cli-prefer-expr-copan_shelf_07" attribute="#uname" operation="eq" value="aston" type="string" /> Mar 22 07:45:10 aston pengine: [10961]: notice: native_print: copan_shelf_07 (ocf::sgi:copan_ov_client): Started lotus Mar 22 07:45:10 aston pengine: [10961]: notice: RecurringOp: Start recurring monitor (30s) for copan_shelf_07 on aston Mar 22 07:45:10 aston pengine: [10961]: notice: LogActions: Move resource copan_shelf_07 (Started lotus -> aston) Mar 22 07:45:10 aston pengine: [10961]: notice: native_print: copan_shelf_07 (ocf::sgi:copan_ov_client): Started lotus Mar 22 07:45:10 aston pengine: [10961]: notice: RecurringOp: Start recurring monitor (30s) for copan_shelf_07 on aston Mar 22 07:45:10 aston pengine: [10961]: notice: LogActions: Move resource copan_shelf_07 (Started lotus -> aston) Mar 22 07:45:10 aston crmd: [10962]: info: te_rsc_command: Initiating action 17: stop copan_shelf_07_stop_0 on lotus Mar 22 07:45:12 aston crmd: [10962]: WARN: status_from_rc: Action 17 (copan_shelf_07_stop_0) on lotus failed (target: 0 vs. rc: 1): Error Mar 22 07:45:12 aston crmd: [10962]: WARN: update_failcount: Updating failcount for copan_shelf_07 on lotus after failed stop: rc=1 (update=INFINITY, time=1300797912) Mar 22 07:45:12 aston crmd: [10962]: info: abort_transition_graph: match_graph_event:276 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=copan_shelf_07_stop_0, magic=0:1;17:509:0:cebe491c-0397-4ee8-bdd0-ae6ccd1e5c7d, cib=0.920.5) : Event failed Mar 22 07:45:12 aston crmd: [10962]: info: match_graph_event: Action copan_shelf_07_stop_0 (17) confirmed on lotus (rc=4) Mar 22 07:45:17 aston pengine: [10961]: WARN: unpack_rsc_op: Processing failed op copan_shelf_07_stop_0 on lotus: unknown error (1) Mar 22 07:45:17 aston pengine: [10961]: ERROR: unpack_rsc_op: Making sure copan_shelf_07 doesn't come up again Mar 22 07:45:17 aston pengine: [10961]: notice: native_print: copan_shelf_07 (ocf::sgi:copan_ov_client): Started lotus FAILED Mar 22 07:45:17 aston pengine: [10961]: info: get_failcount: copan_shelf_07 has failed INFINITY times on lotus Mar 22 07:45:17 aston pengine: [10961]: WARN: common_apply_stickiness: Forcing copan_shelf_07 away from lotus after 1000000 failures (max=1) Mar 22 07:45:17 aston pengine: [10961]: info: rsc_merge_weights: cxfs-client-clone: Rolling back scores from copan_shelf_07 Mar 22 07:45:17 aston pengine: [10961]: info: native_color: Resource copan_shelf_07 cannot run anywhere Mar 22 07:45:17 aston pengine: [10961]: notice: LogActions: Stop resource copan_shelf_07 (lotus) Mar 22 07:45:17 aston pengine: [10961]: WARN: unpack_rsc_op: Processing failed op copan_shelf_07_stop_0 on lotus: unknown error (1) Mar 22 07:45:17 aston pengine: [10961]: ERROR: unpack_rsc_op: Making sure copan_shelf_07 doesn't come up again Mar 22 07:45:17 aston pengine: [10961]: notice: native_print: copan_shelf_07 (ocf::sgi:copan_ov_client): Started lotus FAILED Mar 22 07:45:17 aston pengine: [10961]: info: get_failcount: copan_shelf_07 has failed INFINITY times on lotus Mar 22 07:45:17 aston pengine: [10961]: WARN: common_apply_stickiness: Forcing copan_shelf_07 away from lotus after 1000000 failures (max=1) Mar 22 07:45:17 aston pengine: [10961]: info: rsc_merge_weights: cxfs-client-clone: Rolling back scores from copan_shelf_07 Mar 22 07:45:17 aston pengine: [10961]: info: native_color: Resource copan_shelf_07 cannot run anywhere Mar 22 07:45:17 aston pengine: [10961]: notice: LogActions: Stop resource copan_shelf_07 (lotus) Mar 22 07:45:18 aston crmd: [10962]: info: te_rsc_command: Initiating action 11: stop copan_shelf_07_stop_0 on lotus Mar 22 07:45:19 aston crmd: [10962]: WARN: status_from_rc: Action 11 (copan_shelf_07_stop_0) on lotus failed (target: 0 vs. rc: 1): Error Mar 22 07:45:19 aston crmd: [10962]: WARN: update_failcount: Updating failcount for copan_shelf_07 on lotus after failed stop: rc=1 (update=INFINITY, time=1300797919) Mar 22 07:45:19 aston crmd: [10962]: info: abort_transition_graph: match_graph_event:276 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=copan_shelf_07_stop_0, magic=0:1;11:511:0:cebe491c-0397-4ee8-bdd0-ae6ccd1e5c7d, cib=0.920.16) : Event failed Mar 22 07:45:19 aston crmd: [10962]: info: match_graph_event: Action copan_shelf_07_stop_0 (11) confirmed on lotus (rc=4) Mar 22 07:45:22 aston pengine: [10961]: WARN: unpack_rsc_op: Processing failed op copan_shelf_07_stop_0 on lotus: unknown error (1) Mar 22 07:45:22 aston pengine: [10961]: ERROR: unpack_rsc_op: Making sure copan_shelf_07 doesn't come up again Mar 22 07:45:22 aston pengine: [10961]: notice: native_print: copan_shelf_07 (ocf::sgi:copan_ov_client): Started lotus FAILED Mar 22 07:45:22 aston pengine: [10961]: info: get_failcount: copan_shelf_07 has failed INFINITY times on lotus Mar 22 07:45:22 aston pengine: [10961]: WARN: common_apply_stickiness: Forcing copan_shelf_07 away from lotus after 1000000 failures (max=1) Mar 22 07:45:22 aston pengine: [10961]: info: rsc_merge_weights: cxfs-client-clone: Rolling back scores from copan_shelf_07 Mar 22 07:45:22 aston pengine: [10961]: info: native_color: Resource copan_shelf_07 cannot run anywhere Mar 22 07:45:22 aston pengine: [10961]: notice: LogActions: Stop resource copan_shelf_07 (lotus) Mar 22 07:45:23 aston crmd: [10962]: info: te_rsc_command: Initiating action 14: stop copan_shelf_07_stop_0 on lotus Mar 22 07:45:24 aston crmd: [10962]: info: match_graph_event: Action copan_shelf_07_stop_0 (14) confirmed on lotus (rc=0) Mar 22 08:00:25 aston pengine: [10961]: notice: native_print: copan_shelf_07 (ocf::sgi:copan_ov_client): Stopped Mar 22 08:00:25 aston pengine: [10961]: info: get_failcount: copan_shelf_07 has failed INFINITY times on lotus Mar 22 08:00:25 aston pengine: [10961]: WARN: common_apply_stickiness: Forcing copan_shelf_07 away from lotus after 1000000 failures (max=1) Mar 22 08:00:25 aston pengine: [10961]: notice: RecurringOp: Start recurring monitor (30s) for copan_shelf_07 on aston Mar 22 08:00:25 aston pengine: [10961]: notice: LogActions: Start copan_shelf_07 (aston) Mar 22 08:00:25 aston crmd: [10962]: info: te_rsc_command: Initiating action 16: start copan_shelf_07_start_0 on aston (local) Mar 22 08:00:25 aston crmd: [10962]: info: do_lrm_rsc_op: Performing key=16:513:0:cebe491c-0397-4ee8-bdd0-ae6ccd1e5c7d op=copan_shelf_07_start_0 ) Mar 22 08:00:25 aston lrmd: [10959]: debug: on_msg_perform_op:2359: copying parameters for rsc copan_shelf_07 Mar 22 08:00:25 aston lrmd: [10959]: debug: on_msg_perform_op: add an operation operation start[142] on ocf::copan_ov_client::copan_shelf_07 for client 10962, its parameters: CRM_meta_name=[start] crm_feature_set=[3.0.2] CRM_meta_on_fail=[restart] CRM_meta_timeout=[60000] shelf_name=[c07] to the operation list.
Mar 22 08:00:25 aston lrmd: [10959]: info: rsc:copan_shelf_07:142: start


Thanks,

Phil

--
        Phil Armstrong       p...@sgi.com
        Phone: 651-683-5561  VNET 233-5561


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

Reply via email to