hi, how to cleanup cib from node after unexpected system halt? failed node still thinks of running VirtualDomain resource, which is already running on other node in cluster(sucessful takeover:
executing "pcs cluster start" - .... Apr 8 13:41:10 lnx0083a daemon:info lnx0083a VirtualDomain(lnx0106a)[20360]: INFO: Virtual domain lnx0106a currently has no state, retrying. Apr 8 13:41:12 lnx0083a daemon:err|error lnx0083a VirtualDomain(lnx0106a)[20360]: ERROR: Virtual domain lnx0106a has no state during stop operation, bailing out. Apr 8 13:41:12 lnx0083a daemon:info lnx0083a VirtualDomain(lnx0106a)[20360]: INFO: Issuing forced shutdown (destroy) request for domain lnx0106a. Apr 8 13:41:12 lnx0083a daemon:err|error lnx0083a VirtualDomain(lnx0106a)[20360]: ERROR: forced stop failed Apr 8 13:41:12 lnx0083a daemon:notice lnx0083a lrmd[14230]: notice: operation_finished: lnx0106a_stop_0:20360:stderr [ error: failed to connect to the hypervisor error: end of file while reading data: : input/output error ] Apr 8 13:41:12 lnx0083a daemon:notice lnx0083a lrmd[14230]: notice: operation_finished: lnx0106a_stop_0:20360:stderr [ ocf-exit-reason:forced stop failed ] Apr 8 13:41:12 lnx0083a daemon:notice lnx0083a crmd[14233]: notice: process_lrm_event: Operation lnx0106a_stop_0: unknown error (node=lnx0083a, call=131, rc=1, cib-update=43, confirmed=true) Apr 8 13:41:12 lnx0083a daemon:notice lnx0083a crmd[14233]: notice: process_lrm_event: lnx0083a-lnx0106a_stop_0:131 [ error: failed to connect to the hypervisor error: end of file while reading data: : input/output error\nocf-exit-reason:forced stop failed\n ] Apr 8 13:41:12 lnx0083b daemon:warn|warning lnx0083b crmd[18244]: warning: status_from_rc: Action 105 (lnx0106a_stop_0) on lnx0083a failed (target: 0 vs. rc: 1): Error Apr 8 13:41:12 lnx0083b daemon:warn|warning lnx0083b crmd[18244]: warning: update_failcount: Updating failcount for lnx0106a on lnx0083a after failed stop: rc=1 (update=INFINITY, time=1428493272) Apr 8 13:41:12 lnx0083b daemon:notice lnx0083b crmd[18244]: notice: abort_transition_graph: Transition aborted by lnx0106a_stop_0 'modify' on lnx0083a: Event failed (magic=0:1;105:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc, cib=1.499.624, source=match_graph_event:350, 0) Apr 8 13:41:12 lnx0083b daemon:warn|warning lnx0083b crmd[18244]: warning: update_failcount: Updating failcount for lnx0106a on lnx0083a after failed stop: rc=1 (update=INFINITY, time=1428493272) Apr 8 13:41:12 lnx0083b daemon:warn|warning lnx0083b crmd[18244]: warning: status_from_rc: Action 105 (lnx0106a_stop_0) on lnx0083a failed (target: 0 vs. rc: 1): Error Apr 8 13:41:12 lnx0083b daemon:warn|warning lnx0083b crmd[18244]: warning: update_failcount: Updating failcount for lnx0106a on lnx0083a after failed stop: rc=1 (update=INFINITY, time=1428493272) Apr 8 13:41:12 lnx0083b daemon:warn|warning lnx0083b crmd[18244]: warning: update_failcount: Updating failcount for lnx0106a on lnx0083a after failed stop: rc=1 (update=INFINITY, time=1428493272) Apr 8 13:41:17 lnx0083b daemon:warn|warning lnx0083b pengine[18243]: warning: unpack_rsc_op_failure: Processing failed op stop for lnx0106a on lnx0083a: unknown error (1) Apr 8 13:41:17 lnx0083b daemon:warn|warning lnx0083b pengine[18243]: warning: unpack_rsc_op_failure: Processing failed op stop for lnx0106a on lnx0083a: unknown error (1) Apr 8 13:41:17 lnx0083b daemon:warn|warning lnx0083b pengine[18243]: warning: pe_fence_node: Node lnx0083a will be fenced because of resource failure(s) Apr 8 13:41:17 lnx0083b daemon:warn|warning lnx0083b pengine[18243]: warning: common_apply_stickiness: Forcing lnx0106a away from lnx0083a after 1000000 failures (max=3) Apr 8 13:41:17 lnx0083b daemon:warn|warning lnx0083b pengine[18243]: warning: stage6: Scheduling Node lnx0083a for STONITH Apr 8 13:41:17 lnx0083b daemon:notice lnx0083b pengine[18243]: notice: native_stop_constraints: Stop of failed resource lnx0106a is implicit after lnx0083a is fenced .... Node is fenced.. log from corosync.log: ... Apr 08 13:41:00 [14226] lnx0083a pacemakerd: notice: mcp_read_config: Configured corosync to accept connections from group 2035: OK (1) Apr 08 13:41:00 [14226] lnx0083a pacemakerd: notice: main: Starting Pacemaker 1.1.12 (Build: 4ed91da): agent-manpages ascii-docs ncurses libqb-logging libqb-ip c lha-fencing upstart nagios corosync-native atomic-attrd libesmtp acls .... Apr 08 13:16:04 [23690] lnx0083a cib: info: cib_perform_op: + /cib/status/node_state[@id='4']/lrm[@id='4']/lrm_resources/lrm_resource[@id='lnx0106a']/lrm_rsc_op[@id='lnx0106a_last_0']: @operation_key=lnx0106a_stop_0, @operation=stop, @transition-key=106:10167:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc, @transition-magic=0:0;106:10167:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc, @call-id=538, @last-run=1428491757, @last-rc-change=1428491757, @exec-time=7686 Apr 08 13:41:04 [14231] lnx0083a attrd: info: write_attribute: Sent update 40 with 3 changes for fail-count-vm-lnx0106a, id=<n/a>, set=(null) Apr 08 13:41:04 [14231] lnx0083a attrd: info: write_attribute: Sent update 45 with 3 changes for fail-count-lnx0106a, id=<n/a>, set=(null) Apr 08 13:41:04 [14228] lnx0083a cib: info: cib_perform_op: ++ <lrm_resource id="lnx0106a" type="VirtualDomain" class="ocf" provider="heartbeat"> Apr 08 13:41:04 [14228] lnx0083a cib: info: cib_perform_op: ++ <lrm_rsc_op id="lnx0106a_last_0" operation_key="lnx0106a_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.9" transition-key="7:8297:7:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" transition-magic="0:7;7:8297:7:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" on_node="lnx0083b" call-id="660" rc-code="7" op-status="0" interval="0" last-run="1427965815" last-rc-change="1427965815" exec-time="8 Apr 08 13:41:04 [14228] lnx0083a cib: info: cib_perform_op: ++ <lrm_resource id="lnx0106a" type="VirtualDomain" class="ocf" provider="heartbeat"> Apr 08 13:41:04 [14228] lnx0083a cib: info: cib_perform_op: ++ <lrm_rsc_op id="lnx0106a_last_failure_0" operation_key="lnx0106a_migrate_to_0" operation="migrate_to" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="112:8364:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" transition-magic="0:1;112:8364:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" on_node="lnx0129a" call-id="444" rc-code="1" op-status="0" interval="0" last-run="1427973596" last-rc-change="1427 Apr 08 13:41:04 [14228] lnx0083a cib: info: cib_perform_op: ++ <lrm_rsc_op id="lnx0106a_last_0" operation_key="lnx0106a_stop_0" operation="stop" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="113:9846:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" transition-magic="0:0;113:9846:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" on_node="lnx0129a" call-id="546" rc-code="0" op-status="0" interval="0" last-run="1428403880" last-rc-change="1428403880" exec-time="2 Apr 08 13:41:04 [14228] lnx0083a cib: info: cib_perform_op: ++ <lrm_rsc_op id="lnx0106a_monitor_30000" operation_key="lnx0106a_monitor_30000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="47:8337:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" transition-magic="0:0;47:8337:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" on_node="lnx0129a" call-id="436" rc-code="0" op-status="0" interval="30000" last-rc-change="1427965985" exec-time="1312 Apr 08 13:41:04 [14228] lnx0083a cib: info: cib_perform_op: ++ <lrm_resource id="lnx0106a" type="VirtualDomain" class="ocf" provider="heartbeat"> Apr 08 13:41:04 [14228] lnx0083a cib: info: cib_perform_op: ++ <lrm_rsc_op id="lnx0106a_last_0" operation_key="lnx0106a_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="110:10168:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" transition-magic="0:0;110:10168:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" on_node="lnx0129b" call-id="539" rc-code="0" op-status="0" interval="0" last-run="1428491780" last-rc-change="1428491780" exec-tim Apr 08 13:41:04 [14228] lnx0083a cib: info: cib_perform_op: ++ <lrm_rsc_op id="lnx0106a_monitor_30000" operation_key="lnx0106a_monitor_30000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="89:10170:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" transition-magic="0:0;89:10170:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" on_node="lnx0129b" call-id="540" rc-code="0" op-status="0" interval="30000" last-rc-change="1428491810" exec-time="12 Apr 08 13:41:04 [14231] lnx0083a attrd: info: attrd_cib_callback: Update 40 for fail-count-vm-lnx0106a: OK (0) Apr 08 13:41:04 [14231] lnx0083a attrd: info: attrd_cib_callback: Update 40 for fail-count-vm-lnx0106a[lnx0129a]=(null): OK (0) Apr 08 13:41:04 [14231] lnx0083a attrd: info: attrd_cib_callback: Update 40 for fail-count-vm-lnx0106a[lnx0129b]=(null): OK (0) Apr 08 13:41:04 [14231] lnx0083a attrd: info: attrd_cib_callback: Update 40 for fail-count-vm-lnx0106a[lnx0083b]=(null): OK (0) Apr 08 13:41:04 [14231] lnx0083a attrd: info: attrd_cib_callback: Update 45 for fail-count-lnx0106a: OK (0) Apr 08 13:41:04 [14231] lnx0083a attrd: info: attrd_cib_callback: Update 45 for fail-count-lnx0106a[lnx0129a]=(null): OK (0) Apr 08 13:41:04 [14231] lnx0083a attrd: info: attrd_cib_callback: Update 45 for fail-count-lnx0106a[lnx0129b]=(null): OK (0) Apr 08 13:41:04 [14231] lnx0083a attrd: info: attrd_cib_callback: Update 45 for fail-count-lnx0106a[lnx0083b]=(null): OK (0) Apr 08 13:41:05 [14228] lnx0083a cib: info: cib_perform_op: ++ <lrm_resource id="lnx0106a" type="VirtualDomain" class="ocf" provider="heartbeat"> Apr 08 13:41:05 [14228] lnx0083a cib: info: cib_perform_op: ++ <lrm_rsc_op id="lnx0106a_last_0" operation_key="lnx0106a_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.9" transition-key="7:8297:7:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" transition-magic="0:7;7:8297:7:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" on_node="lnx0083b" call-id="660" rc-code="7" op-status="0" interval="0" last-run="1427965815" last-rc-change="142796 Apr 08 13:41:07 [14230] lnx0083a lrmd: info: process_lrmd_get_rsc_info: Resource 'lnx0106a' not found (27 active resources) Apr 08 13:41:07 [14230] lnx0083a lrmd: info: process_lrmd_rsc_register: Added 'lnx0106a' to the rsc list (28 active resources) Apr 08 13:41:07 [14233] lnx0083a crmd: info: do_lrm_rsc_op: Performing key=65:10177:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc op=lnx0106a_monitor_0 Apr 08 13:41:08 [14233] lnx0083a crmd: notice: process_lrm_event: Operation lnx0106a_monitor_0: not running (node=lnx0083a, call=114, rc=7, cib-update=34, confirmed=true) Apr 08 13:41:08 [14228] lnx0083a cib: info: cib_perform_op: ++ /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources: <lrm_resource id="lnx0106a" type="VirtualDomain" class="ocf" provider="heartbeat"/> Apr 08 13:41:08 [14228] lnx0083a cib: info: cib_perform_op: ++ <lrm_rsc_op id="lnx0106a_last_failure_0" operation_key="lnx0106a_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="65:10177:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" transition-magic="0:7;65:10177:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" on_node="lnx0083a" call-id="114" rc-code="7" op-status="0" interval="0" last-ru Apr 08 13:41:08 [14228] lnx0083a cib: info: cib_perform_op: ++ <lrm_rsc_op id="lnx0106a_last_0" operation_key="lnx0106a_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="65:10177:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" transition-magic="0:7;65:10177:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" on_node="lnx0083a" call-id="114" rc-code="7" op-status="0" interval="0" last-run="14284 Apr 08 13:41:09 [14233] lnx0083a crmd: info: do_lrm_rsc_op: Performing key=105:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc op=lnx0106a_stop_0 Apr 08 13:41:09 [14230] lnx0083a lrmd: info: log_execute: executing - rsc:lnx0106a action:stop call_id:131 VirtualDomain(lnx0106a)[20360]: 2015/04/08_13:41:09 INFO: Virtual domain lnx0106a currently has no state, retrying. VirtualDomain(lnx0106a)[20360]: 2015/04/08_13:41:10 INFO: Virtual domain lnx0106a currently has no state, retrying. VirtualDomain(lnx0106a)[20360]: 2015/04/08_13:41:12 ERROR: Virtual domain lnx0106a has no state during stop operation, bailing out. VirtualDomain(lnx0106a)[20360]: 2015/04/08_13:41:12 INFO: Issuing forced shutdown (destroy) request for domain lnx0106a. VirtualDomain(lnx0106a)[20360]: 2015/04/08_13:41:12 ERROR: forced stop failed Apr 08 13:41:12 [14230] lnx0083a lrmd: notice: operation_finished: lnx0106a_stop_0:20360:stderr [ error: failed to connect to the hypervisor error: end of file while reading data: : input/output error ] Apr 08 13:41:12 [14230] lnx0083a lrmd: notice: operation_finished: lnx0106a_stop_0:20360:stderr [ ocf-exit-reason:forced stop failed ] Apr 08 13:41:12 [14230] lnx0083a lrmd: info: log_finished: finished - rsc:lnx0106a action:stop call_id:131 pid:20360 exit-code:1 exec-time:2609ms queue-time:0ms Apr 08 13:41:12 [14233] lnx0083a crmd: notice: process_lrm_event: Operation lnx0106a_stop_0: unknown error (node=lnx0083a, call=131, rc=1, cib-update=43, confirmed=true) Apr 08 13:41:12 [14233] lnx0083a crmd: notice: process_lrm_event: lnx0083a-lnx0106a_stop_0:131 [ error: failed to connect to the hypervisor error: end of file while reading data: : input/output error\nocf-exit-reason:forced stop failed\n ] Apr 08 13:41:12 [14228] lnx0083a cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='lnx0106a']/lrm_rsc_op[@id='lnx0106a_last_failure_0']: @operation_key=lnx0106a_stop_0, @operation=stop, @transition-key=105:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc, @transition-magic=0:1;105:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc, @call-id=131, @rc-code=1, @last-run=1428493269, @last-rc-change=1428493269, @exec-time=2609, @exit-reason=forced stop Apr 08 13:41:12 [14228] lnx0083a cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='lnx0106a']/lrm_rsc_op[@id='lnx0106a_last_0']: @operation_key=lnx0106a_stop_0, @operation=stop, @transition-key=105:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc, @transition-magic=0:1;105:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc, @call-id=131, @rc-code=1, @last-run=1428493269, @last-rc-change=1428493269, @exec-time=2609, @exit-reason=forced stop failed Apr 08 13:41:12 [14231] lnx0083a attrd: info: attrd_peer_update: Setting fail-count-lnx0106a[lnx0083a]: (null) -> INFINITY from lnx0083b Apr 08 13:41:12 [14231] lnx0083a attrd: info: attrd_peer_update: Setting last-failure-lnx0106a[lnx0083a]: (null) -> 1428493272 from lnx0083b Apr 08 13:41:12 [14228] lnx0083a cib: info: cib_perform_op: ++ /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']: <nvpair id="status-1-fail-count-lnx0106a" name="fail-count-lnx0106a" value="INFINITY"/> Apr 08 13:41:12 [14228] lnx0083a cib: info: cib_perform_op: ++ /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']: <nvpair id="status-1-last-failure-lnx0106a" name="last-failure-lnx0106a" value="1428493272"/> Apr 08 13:41:17 [14228] lnx0083a cib: info: cib_perform_op: + /cib/status/node_state[@id='4']/lrm[@id='4']/lrm_resources/lrm_resource[@id='lnx0106a']/lrm_rsc_op[@id='lnx0106a_last_0']: @operation_key=lnx0106a_stop_0, @operation=stop, @transition-key=106:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc, @transition-magic=0:0;106:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc, @call-id=542, @last-run=1428493269, @last-rc-change=1428493269, @exec-time=7645 ... any ideas? thank you! Philipp
_______________________________________________ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org