hi,

how to cleanup cib from node after unexpected system halt?
failed node still thinks of running VirtualDomain resource, which is 
already running on other node in cluster(sucessful takeover:

executing "pcs cluster start" - 
....
Apr  8 13:41:10 lnx0083a daemon:info lnx0083a 
VirtualDomain(lnx0106a)[20360]: INFO: Virtual domain lnx0106a currently 
has no state, retrying.
Apr  8 13:41:12 lnx0083a daemon:err|error lnx0083a 
VirtualDomain(lnx0106a)[20360]: ERROR: Virtual domain lnx0106a has no 
state during stop operation, bailing out.
Apr  8 13:41:12 lnx0083a daemon:info lnx0083a 
VirtualDomain(lnx0106a)[20360]: INFO: Issuing forced shutdown (destroy) 
request for domain lnx0106a.
Apr  8 13:41:12 lnx0083a daemon:err|error lnx0083a 
VirtualDomain(lnx0106a)[20360]: ERROR: forced stop failed
Apr  8 13:41:12 lnx0083a daemon:notice lnx0083a lrmd[14230]:   notice: 
operation_finished: lnx0106a_stop_0:20360:stderr [ error: failed to 
connect to the hypervisor error: end of file while reading data: : 
input/output error ]
Apr  8 13:41:12 lnx0083a daemon:notice lnx0083a lrmd[14230]:   notice: 
operation_finished: lnx0106a_stop_0:20360:stderr [ ocf-exit-reason:forced 
stop failed ]
Apr  8 13:41:12 lnx0083a daemon:notice lnx0083a crmd[14233]:   notice: 
process_lrm_event: Operation lnx0106a_stop_0: unknown error 
(node=lnx0083a, call=131, rc=1, cib-update=43, confirmed=true)
Apr  8 13:41:12 lnx0083a daemon:notice lnx0083a crmd[14233]:   notice: 
process_lrm_event: lnx0083a-lnx0106a_stop_0:131 [ error: failed to connect 
to the hypervisor error: end of file while reading data: : input/output 
error\nocf-exit-reason:forced stop failed\n ]
Apr  8 13:41:12 lnx0083b daemon:warn|warning lnx0083b crmd[18244]: 
warning: status_from_rc: Action 105 (lnx0106a_stop_0) on lnx0083a failed 
(target: 0 vs. rc: 1): Error
Apr  8 13:41:12 lnx0083b daemon:warn|warning lnx0083b crmd[18244]: 
warning: update_failcount: Updating failcount for lnx0106a on lnx0083a 
after failed stop: rc=1 (update=INFINITY, time=1428493272)
Apr  8 13:41:12 lnx0083b daemon:notice lnx0083b crmd[18244]:   notice: 
abort_transition_graph: Transition aborted by lnx0106a_stop_0 'modify' on 
lnx0083a: Event failed 
(magic=0:1;105:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc, 
cib=1.499.624, source=match_graph_event:350, 0)
Apr  8 13:41:12 lnx0083b daemon:warn|warning lnx0083b crmd[18244]: 
warning: update_failcount: Updating failcount for lnx0106a on lnx0083a 
after failed stop: rc=1 (update=INFINITY, time=1428493272)
Apr  8 13:41:12 lnx0083b daemon:warn|warning lnx0083b crmd[18244]: 
warning: status_from_rc: Action 105 (lnx0106a_stop_0) on lnx0083a failed 
(target: 0 vs. rc: 1): Error
Apr  8 13:41:12 lnx0083b daemon:warn|warning lnx0083b crmd[18244]: 
warning: update_failcount: Updating failcount for lnx0106a on lnx0083a 
after failed stop: rc=1 (update=INFINITY, time=1428493272)
Apr  8 13:41:12 lnx0083b daemon:warn|warning lnx0083b crmd[18244]: 
warning: update_failcount: Updating failcount for lnx0106a on lnx0083a 
after failed stop: rc=1 (update=INFINITY, time=1428493272)
Apr  8 13:41:17 lnx0083b daemon:warn|warning lnx0083b pengine[18243]: 
warning: unpack_rsc_op_failure: Processing failed op stop for lnx0106a on 
lnx0083a: unknown error (1)
Apr  8 13:41:17 lnx0083b daemon:warn|warning lnx0083b pengine[18243]: 
warning: unpack_rsc_op_failure: Processing failed op stop for lnx0106a on 
lnx0083a: unknown error (1)
Apr  8 13:41:17 lnx0083b daemon:warn|warning lnx0083b pengine[18243]: 
warning: pe_fence_node: Node lnx0083a will be fenced because of resource 
failure(s)
Apr  8 13:41:17 lnx0083b daemon:warn|warning lnx0083b pengine[18243]: 
warning: common_apply_stickiness: Forcing lnx0106a away from lnx0083a 
after 1000000 failures (max=3)
Apr  8 13:41:17 lnx0083b daemon:warn|warning lnx0083b pengine[18243]: 
warning: stage6: Scheduling Node lnx0083a for STONITH
Apr  8 13:41:17 lnx0083b daemon:notice lnx0083b pengine[18243]:   notice: 
native_stop_constraints: Stop of failed resource lnx0106a is implicit 
after lnx0083a is fenced
....

Node is fenced..

log from corosync.log:
...
Apr 08 13:41:00 [14226] lnx0083a pacemakerd:   notice: mcp_read_config:   
Configured corosync to accept connections from group 2035: OK (1)
Apr 08 13:41:00 [14226] lnx0083a pacemakerd:   notice: main:    Starting 
Pacemaker 1.1.12 (Build: 4ed91da):  agent-manpages ascii-docs ncurses 
libqb-logging libqb-ip
c lha-fencing upstart nagios  corosync-native atomic-attrd libesmtp acls
....
Apr 08 13:16:04 [23690] lnx0083a        cib:     info: cib_perform_op:  + 
/cib/status/node_state[@id='4']/lrm[@id='4']/lrm_resources/lrm_resource[@id='lnx0106a']/lrm_rsc_op[@id='lnx0106a_last_0']:
 
 @operation_key=lnx0106a_stop_0, @operation=stop, 
@transition-key=106:10167:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc, 
@transition-magic=0:0;106:10167:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc, 
@call-id=538, @last-run=1428491757, @last-rc-change=1428491757, 
@exec-time=7686
Apr 08 13:41:04 [14231] lnx0083a      attrd:     info: write_attribute: 
Sent update 40 with 3 changes for fail-count-vm-lnx0106a, id=<n/a>, 
set=(null)
Apr 08 13:41:04 [14231] lnx0083a      attrd:     info: write_attribute: 
Sent update 45 with 3 changes for fail-count-lnx0106a, id=<n/a>, 
set=(null)
Apr 08 13:41:04 [14228] lnx0083a        cib:     info: cib_perform_op:  ++ 
                    <lrm_resource id="lnx0106a" type="VirtualDomain" 
class="ocf" provider="heartbeat">
Apr 08 13:41:04 [14228] lnx0083a        cib:     info: cib_perform_op:  ++ 
                      <lrm_rsc_op id="lnx0106a_last_0" 
operation_key="lnx0106a_monitor_0" operation="monitor" 
crm-debug-origin="build_active_RAs" crm_feature_set="3.0.9" 
transition-key="7:8297:7:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" 
transition-magic="0:7;7:8297:7:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" 
on_node="lnx0083b" call-id="660" rc-code="7" op-status="0" interval="0" 
last-run="1427965815" last-rc-change="1427965815" exec-time="8
Apr 08 13:41:04 [14228] lnx0083a        cib:     info: cib_perform_op:  ++ 
                    <lrm_resource id="lnx0106a" type="VirtualDomain" 
class="ocf" provider="heartbeat">
Apr 08 13:41:04 [14228] lnx0083a        cib:     info: cib_perform_op:  ++ 
                      <lrm_rsc_op id="lnx0106a_last_failure_0" 
operation_key="lnx0106a_migrate_to_0" operation="migrate_to" 
crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" 
transition-key="112:8364:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" 
transition-magic="0:1;112:8364:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" 
on_node="lnx0129a" call-id="444" rc-code="1" op-status="0" interval="0" 
last-run="1427973596" last-rc-change="1427
Apr 08 13:41:04 [14228] lnx0083a        cib:     info: cib_perform_op:  ++ 
                      <lrm_rsc_op id="lnx0106a_last_0" 
operation_key="lnx0106a_stop_0" operation="stop" 
crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" 
transition-key="113:9846:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" 
transition-magic="0:0;113:9846:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" 
on_node="lnx0129a" call-id="546" rc-code="0" op-status="0" interval="0" 
last-run="1428403880" last-rc-change="1428403880" exec-time="2
Apr 08 13:41:04 [14228] lnx0083a        cib:     info: cib_perform_op:  ++ 
                      <lrm_rsc_op id="lnx0106a_monitor_30000" 
operation_key="lnx0106a_monitor_30000" operation="monitor" 
crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" 
transition-key="47:8337:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" 
transition-magic="0:0;47:8337:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" 
on_node="lnx0129a" call-id="436" rc-code="0" op-status="0" 
interval="30000" last-rc-change="1427965985" exec-time="1312
Apr 08 13:41:04 [14228] lnx0083a        cib:     info: cib_perform_op:  ++ 
                    <lrm_resource id="lnx0106a" type="VirtualDomain" 
class="ocf" provider="heartbeat">
Apr 08 13:41:04 [14228] lnx0083a        cib:     info: cib_perform_op:  ++ 
                      <lrm_rsc_op id="lnx0106a_last_0" 
operation_key="lnx0106a_start_0" operation="start" 
crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" 
transition-key="110:10168:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" 
transition-magic="0:0;110:10168:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" 
on_node="lnx0129b" call-id="539" rc-code="0" op-status="0" interval="0" 
last-run="1428491780" last-rc-change="1428491780" exec-tim
Apr 08 13:41:04 [14228] lnx0083a        cib:     info: cib_perform_op:  ++ 
                      <lrm_rsc_op id="lnx0106a_monitor_30000" 
operation_key="lnx0106a_monitor_30000" operation="monitor" 
crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" 
transition-key="89:10170:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" 
transition-magic="0:0;89:10170:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" 
on_node="lnx0129b" call-id="540" rc-code="0" op-status="0" 
interval="30000" last-rc-change="1428491810" exec-time="12
Apr 08 13:41:04 [14231] lnx0083a      attrd:     info: attrd_cib_callback: 
        Update 40 for fail-count-vm-lnx0106a: OK (0)
Apr 08 13:41:04 [14231] lnx0083a      attrd:     info: attrd_cib_callback: 
        Update 40 for fail-count-vm-lnx0106a[lnx0129a]=(null): OK (0)
Apr 08 13:41:04 [14231] lnx0083a      attrd:     info: attrd_cib_callback: 
        Update 40 for fail-count-vm-lnx0106a[lnx0129b]=(null): OK (0)
Apr 08 13:41:04 [14231] lnx0083a      attrd:     info: attrd_cib_callback: 
        Update 40 for fail-count-vm-lnx0106a[lnx0083b]=(null): OK (0)
Apr 08 13:41:04 [14231] lnx0083a      attrd:     info: attrd_cib_callback: 
        Update 45 for fail-count-lnx0106a: OK (0)
Apr 08 13:41:04 [14231] lnx0083a      attrd:     info: attrd_cib_callback: 
        Update 45 for fail-count-lnx0106a[lnx0129a]=(null): OK (0)
Apr 08 13:41:04 [14231] lnx0083a      attrd:     info: attrd_cib_callback: 
        Update 45 for fail-count-lnx0106a[lnx0129b]=(null): OK (0)
Apr 08 13:41:04 [14231] lnx0083a      attrd:     info: attrd_cib_callback: 
        Update 45 for fail-count-lnx0106a[lnx0083b]=(null): OK (0)
Apr 08 13:41:05 [14228] lnx0083a        cib:     info: cib_perform_op:  ++ 
                                      <lrm_resource id="lnx0106a" 
type="VirtualDomain" class="ocf" provider="heartbeat">
Apr 08 13:41:05 [14228] lnx0083a        cib:     info: cib_perform_op:  ++ 
                                        <lrm_rsc_op id="lnx0106a_last_0" 
operation_key="lnx0106a_monitor_0" operation="monitor" 
crm-debug-origin="build_active_RAs" crm_feature_set="3.0.9" 
transition-key="7:8297:7:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" 
transition-magic="0:7;7:8297:7:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" 
on_node="lnx0083b" call-id="660" rc-code="7" op-status="0" interval="0" 
last-run="1427965815" last-rc-change="142796
Apr 08 13:41:07 [14230] lnx0083a       lrmd:     info: 
process_lrmd_get_rsc_info:      Resource 'lnx0106a' not found (27 active 
resources)
Apr 08 13:41:07 [14230] lnx0083a       lrmd:     info: 
process_lrmd_rsc_register:      Added 'lnx0106a' to the rsc list (28 
active resources)
Apr 08 13:41:07 [14233] lnx0083a       crmd:     info: do_lrm_rsc_op: 
Performing key=65:10177:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc 
op=lnx0106a_monitor_0
Apr 08 13:41:08 [14233] lnx0083a       crmd:   notice: process_lrm_event: 
Operation lnx0106a_monitor_0: not running (node=lnx0083a, call=114, rc=7, 
cib-update=34, confirmed=true)
Apr 08 13:41:08 [14228] lnx0083a        cib:     info: cib_perform_op:  ++ 
/cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources:  <lrm_resource 
id="lnx0106a" type="VirtualDomain" class="ocf" provider="heartbeat"/>
Apr 08 13:41:08 [14228] lnx0083a        cib:     info: cib_perform_op:  ++ 
                                                               <lrm_rsc_op 
id="lnx0106a_last_failure_0" operation_key="lnx0106a_monitor_0" 
operation="monitor" crm-debug-origin="do_update_resource" 
crm_feature_set="3.0.9" 
transition-key="65:10177:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" 
transition-magic="0:7;65:10177:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" 
on_node="lnx0083a" call-id="114" rc-code="7" op-status="0" interval="0" 
last-ru
Apr 08 13:41:08 [14228] lnx0083a        cib:     info: cib_perform_op:  ++ 
                                                               <lrm_rsc_op 
id="lnx0106a_last_0" operation_key="lnx0106a_monitor_0" 
operation="monitor" crm-debug-origin="do_update_resource" 
crm_feature_set="3.0.9" 
transition-key="65:10177:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" 
transition-magic="0:7;65:10177:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc" 
on_node="lnx0083a" call-id="114" rc-code="7" op-status="0" interval="0" 
last-run="14284
Apr 08 13:41:09 [14233] lnx0083a       crmd:     info: do_lrm_rsc_op: 
Performing key=105:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc 
op=lnx0106a_stop_0
Apr 08 13:41:09 [14230] lnx0083a       lrmd:     info: log_execute: 
executing - rsc:lnx0106a action:stop call_id:131
VirtualDomain(lnx0106a)[20360]: 2015/04/08_13:41:09 INFO: Virtual domain 
lnx0106a currently has no state, retrying.
VirtualDomain(lnx0106a)[20360]: 2015/04/08_13:41:10 INFO: Virtual domain 
lnx0106a currently has no state, retrying.
VirtualDomain(lnx0106a)[20360]: 2015/04/08_13:41:12 ERROR: Virtual domain 
lnx0106a has no state during stop operation, bailing out.
VirtualDomain(lnx0106a)[20360]: 2015/04/08_13:41:12 INFO: Issuing forced 
shutdown (destroy) request for domain lnx0106a.
VirtualDomain(lnx0106a)[20360]: 2015/04/08_13:41:12 ERROR: forced stop 
failed
Apr 08 13:41:12 [14230] lnx0083a       lrmd:   notice: operation_finished: 
        lnx0106a_stop_0:20360:stderr [ error: failed to connect to the 
hypervisor error: end of file while reading data: : input/output error ]
Apr 08 13:41:12 [14230] lnx0083a       lrmd:   notice: operation_finished: 
        lnx0106a_stop_0:20360:stderr [ ocf-exit-reason:forced stop failed 
]
Apr 08 13:41:12 [14230] lnx0083a       lrmd:     info: log_finished: 
finished - rsc:lnx0106a action:stop call_id:131 pid:20360 exit-code:1 
exec-time:2609ms queue-time:0ms
Apr 08 13:41:12 [14233] lnx0083a       crmd:   notice: process_lrm_event: 
Operation lnx0106a_stop_0: unknown error (node=lnx0083a, call=131, rc=1, 
cib-update=43, confirmed=true)
Apr 08 13:41:12 [14233] lnx0083a       crmd:   notice: process_lrm_event: 
lnx0083a-lnx0106a_stop_0:131 [ error: failed to connect to the hypervisor 
error: end of file while reading data: : input/output 
error\nocf-exit-reason:forced stop failed\n ]
Apr 08 13:41:12 [14228] lnx0083a        cib:     info: cib_perform_op:  + 
/cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='lnx0106a']/lrm_rsc_op[@id='lnx0106a_last_failure_0']:
 
 @operation_key=lnx0106a_stop_0, @operation=stop, 
@transition-key=105:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc, 
@transition-magic=0:1;105:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc, 
@call-id=131, @rc-code=1, @last-run=1428493269, 
@last-rc-change=1428493269, @exec-time=2609, @exit-reason=forced stop
Apr 08 13:41:12 [14228] lnx0083a        cib:     info: cib_perform_op:  + 
/cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='lnx0106a']/lrm_rsc_op[@id='lnx0106a_last_0']:
 
 @operation_key=lnx0106a_stop_0, @operation=stop, 
@transition-key=105:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc, 
@transition-magic=0:1;105:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc, 
@call-id=131, @rc-code=1, @last-run=1428493269, 
@last-rc-change=1428493269, @exec-time=2609, @exit-reason=forced stop 
failed
Apr 08 13:41:12 [14231] lnx0083a      attrd:     info: attrd_peer_update: 
Setting fail-count-lnx0106a[lnx0083a]: (null) -> INFINITY from lnx0083b
Apr 08 13:41:12 [14231] lnx0083a      attrd:     info: attrd_peer_update: 
Setting last-failure-lnx0106a[lnx0083a]: (null) -> 1428493272 from 
lnx0083b
Apr 08 13:41:12 [14228] lnx0083a        cib:     info: cib_perform_op:  ++ 
/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']:
 
 <nvpair id="status-1-fail-count-lnx0106a" name="fail-count-lnx0106a" 
value="INFINITY"/>
Apr 08 13:41:12 [14228] lnx0083a        cib:     info: cib_perform_op:  ++ 
/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']:
 
 <nvpair id="status-1-last-failure-lnx0106a" name="last-failure-lnx0106a" 
value="1428493272"/>
Apr 08 13:41:17 [14228] lnx0083a        cib:     info: cib_perform_op:  + 
/cib/status/node_state[@id='4']/lrm[@id='4']/lrm_resources/lrm_resource[@id='lnx0106a']/lrm_rsc_op[@id='lnx0106a_last_0']:
 
 @operation_key=lnx0106a_stop_0, @operation=stop, 
@transition-key=106:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc, 
@transition-magic=0:0;106:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc, 
@call-id=542, @last-run=1428493269, @last-rc-change=1428493269, 
@exec-time=7645
...

any ideas?

thank you!
Philipp
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

Reply via email to