Hi, I have a cluster of KVM VMs on libvirt + iSCSI. Everything works fine, except when I restart a node: as soon as the corosync service comes up on it, the cluster tries to move the VM resources running on the surviving node before the resources they depend on (iSCSI and libvirt) are ready.

These rules:

colocation domainonservice inf: file ServiceClone
order servicebeforedomain inf: ServiceClone:start file

together with resource-stickiness="1000", should prevent this, right? The only way to recover is a cleanup of the resources (the exact commands I use are shown after my signature). Is this a timeout problem or a wrong configuration? My crm config and the log are attached below. Thanks to all in advance, Umberto
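(For clarity, the cleanup I mention above is roughly this; a sketch of my usual session, assuming the failed probe ended up on host3, the node I restarted:

    crm resource cleanup file host3     # clear the failed probe of the VM resource
    crm_mon -1                          # check that "file" is again reported Started on host2 only
)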
node host2 \
        attributes standby="off"
node host3 \
        attributes standby="off"
primitive Iscsi lsb:open-iscsi \
        operations $id="iscsi-operation" \
        op start interval="0" timeout="15s" \
        op stop interval="0" timeout="15s" \
        op monitor interval="30s" timeout="60s"
primitive PingSan ocf:pacemaker:ping \
        params name="pingd-san" host_list="192.168.1.3" multiplier="100" \
        op monitor interval="10s" timeout="60s" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="60s"
primitive Virsh lsb:libvirt-bin \
        operations $id="virsh-operation" \
        op start interval="0" timeout="15s" \
        op stop interval="0" timeout="15s" \
        op monitor interval="30s" timeout="60s"
primitive file ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/file.xml" hypervisor="qemu:///system" migration_transport="ssh" \
        operations $id="file-operations" \
        op start interval="0" timeout="180" \
        op stop interval="0" timeout="90" \
        op migrate_from interval="0" timeout="90" \
        op migrate_to interval="0" timeout="120" \
        op monitor interval="10" timeout="30" start-delay="0" \
        meta target-role="started" allow-migrate="true" failure-timeout="10min"
group Service Iscsi Virsh
clone PingSanClone PingSan \
        meta globally-unique="false" interleave="true" target-role="started"
clone ServiceClone Service \
        meta globally-unique="false" interleave="true" target-role="started"
location serviceclonelocation ServiceClone \
        rule $id="servicecloneonconnectedsan" -inf: not_defined pingd-san or pingd-san lte 0
location cli-prefer-file file \
        rule $id="cli-prefer-rule-file" inf: #uname eq host3
colocation domainonservice inf: _rsc_set_ ( file ) ServiceClone
order servicebeforedomain inf: _rsc_set_ ServiceClone:start ( file )
property $id="cib-bootstrap-options" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        last-lrm-refresh="1314979997"
rsc_defaults $id="rsc-options" \
        resource-stickiness="1000"
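One note on the configuration above: if I understand the crm shell correctly, the cli-prefer-file location rule is just what an earlier "crm resource migrate file host3" left behind, and it could be dropped with the command below; I am not sure whether it is related to the problem.

    crm resource unmigrate file     # remove the leftover cli-prefer-file constraint

The log from host2 (the current DC) while host3 rejoins the cluster follows.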
Sep 2 18:13:28 host2 kernel: [33298.193160] vnet0: no IPv6 routers present
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 252: memb=1, new=0, lost=0
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] info: pcmk_peer_update: memb: host2 1711319232
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 252: memb=2, new=1, lost=0
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] info: update_member: Node 1728096448/host3 is now: member
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] info: pcmk_peer_update: NEW: host3 1728096448
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] info: pcmk_peer_update: MEMB: host2 1711319232
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] info: pcmk_peer_update: MEMB: host3 1728096448
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] info: send_member_notification: Sending membership update 252 to 2 children
Sep 2 18:13:41 host2 corosync[1707]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 2 18:13:41 host2 crmd: [1736]: notice: ais_dispatch_message: Membership 252: quorum acquired
Sep 2 18:13:41 host2 crmd: [1736]: info: ais_status_callback: status: host3 is now member (was lost)
Sep 2 18:13:41 host2 crmd: [1736]: info: crm_update_peer: Node host3: id=1728096448 state=member (new) addr=r(0) ip(192.168.0.103) votes=1 born=244 seen=252 proc=00000000000000000000000000000002
Sep 2 18:13:41 host2 cib: [1732]: notice: ais_dispatch_message: Membership 252: quorum acquired
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] info: update_member: 0xf24590 Node 1728096448 (host3) born on: 252
Sep 2 18:13:41 host2 cib: [1732]: info: crm_update_peer: Node host3: id=1728096448 state=member (new) addr=r(0) ip(192.168.0.103) votes=1 born=244 seen=252 proc=00000000000000000000000000000002
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] info: update_member: Node host3 now has process list: 00000000000000000000000000111312 (1118994)
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] info: send_member_notification: Sending membership update 252 to 2 children
Sep 2 18:13:41 host2 cib: [1732]: info: ais_dispatch_message: Membership 252: quorum retained
Sep 2 18:13:41 host2 cib: [1732]: info: crm_update_peer: Node host3: id=1728096448 state=member addr=r(0) ip(192.168.0.103) votes=1 born=252 seen=252 proc=00000000000000000000000000111312 (new)
Sep 2 18:13:41 host2 crmd: [1736]: info: crm_update_quorum: Updating quorum status to true (call=960)
Sep 2 18:13:41 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='host3']/lrm (origin=local/crmd/956, version=0.192.6): ok (rc=0)
Sep 2 18:13:41 host2 corosync[1707]: [CPG ] chosen downlist: sender r(0) ip(192.168.0.102) ; members(old:1 left:0)
Sep 2 18:13:41 host2 corosync[1707]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 2 18:13:41 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='host3']/transient_attributes (origin=local/crmd/957, version=0.192.7): ok (rc=0)
Sep 2 18:13:41 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/958, version=0.192.8): ok (rc=0)
Sep 2 18:13:41 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/960, version=0.192.10): ok (rc=0)
Sep 2 18:13:41 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_sync_one for section 'all' (origin=host3/host3/(null), version=0.192.10): ok (rc=0)
Sep 2 18:13:41 host2 crmd: [1736]: info: crmd_ais_dispatch: Setting expected votes to 3
Sep 2 18:13:41 host2 crmd: [1736]: info: abort_transition_graph: te_update_diff:276 - Triggered transition abort (complete=1, tag=lrm_rsc_op, id=PingSan:0_monitor_0, magic=0:7;10:154:7:4fd7da89-8f3a-4287-8e06-c274bb66aaa8, cib=0.192.6) : Resource op removal
Sep 2 18:13:41 host2 crmd: [1736]: info: erase_xpath_callback: Deletion of "//node_state[@uname='host3']/lrm": ok (rc=0)
Sep 2 18:13:41 host2 crmd: [1736]: info: abort_transition_graph: te_update_diff:163 - Triggered transition abort (complete=1, tag=transient_attributes, id=host3, magic=NA, cib=0.192.7) : Transient attribute: removal
Sep 2 18:13:41 host2 crmd: [1736]: info: erase_xpath_callback: Deletion of "//node_state[@uname='host3']/transient_attributes": ok (rc=0)
Sep 2 18:13:41 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/962, version=0.192.11): ok (rc=0)
Sep 2 18:13:41 host2 crmd: [1736]: info: ais_dispatch_message: Membership 252: quorum retained
Sep 2 18:13:41 host2 crmd: [1736]: notice: crmd_peer_update: Status update: Client host3/crmd now has status [online] (DC=true)
Sep 2 18:13:41 host2 crmd: [1736]: info: crm_update_peer: Node host3: id=1728096448 state=member addr=r(0) ip(192.168.0.103) votes=1 born=252 seen=252 proc=00000000000000000000000000111312 (new)
Sep 2 18:13:41 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/964, version=0.192.13): ok (rc=0)
Sep 2 18:13:41 host2 crmd: [1736]: info: crmd_ais_dispatch: Setting expected votes to 3
Sep 2 18:13:41 host2 crmd: [1736]: info: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=crmd_peer_update ]
Sep 2 18:13:41 host2 crmd: [1736]: info: abort_transition_graph: do_te_invoke:181 - Triggered transition abort (complete=1) : Peer Halt
Sep 2 18:13:41 host2 crmd: [1736]: info: update_dc: Unset DC host2
Sep 2 18:13:41 host2 crmd: [1736]: info: join_make_offer: Making join offers based on membership 252
Sep 2 18:13:41 host2 crmd: [1736]: info: do_dc_join_offer_all: join-21: Waiting on 2 outstanding join acks
Sep 2 18:13:41 host2 crmd: [1736]: info: update_dc: Set DC to host2 (3.0.5)
Sep 2 18:13:41 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/967, version=0.192.15): ok (rc=0)
Sep 2 18:13:43 host2 crmd: [1736]: info: update_dc: Unset DC host2
Sep 2 18:13:43 host2 crmd: [1736]: info: do_dc_join_offer_all: A new node joined the cluster
Sep 2 18:13:43 host2 crmd: [1736]: info: do_dc_join_offer_all: join-22: Waiting on 2 outstanding join acks
Sep 2 18:13:43 host2 crmd: [1736]: info: update_dc: Set DC to host2 (3.0.5)
Sep 2 18:13:44 host2 crmd: [1736]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Sep 2 18:13:44 host2 crmd: [1736]: info: do_state_transition: All 2 cluster nodes responded to the join offer.
Sep 2 18:13:44 host2 crmd: [1736]: info: do_dc_join_finalize: join-22: Syncing the CIB from host2 to the rest of the cluster
Sep 2 18:13:44 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/970, version=0.192.15): ok (rc=0)
Sep 2 18:13:44 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/971, version=0.192.16): ok (rc=0)
Sep 2 18:13:44 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/972, version=0.192.17): ok (rc=0)
Sep 2 18:13:44 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='host3']/transient_attributes (origin=host3/crmd/6, version=0.192.18): ok (rc=0)
Sep 2 18:13:44 host2 crmd: [1736]: info: do_dc_join_ack: join-22: Updating node state to member for host3
Sep 2 18:13:44 host2 crmd: [1736]: info: do_dc_join_ack: join-22: Updating node state to member for host2
Sep 2 18:13:44 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='host3']/lrm (origin=local/crmd/973, version=0.192.19): ok (rc=0)
Sep 2 18:13:44 host2 crmd: [1736]: info: erase_xpath_callback: Deletion of "//node_state[@uname='host3']/lrm": ok (rc=0)
Sep 2 18:13:44 host2 crmd: [1736]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
Sep 2 18:13:44 host2 crmd: [1736]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
Sep 2 18:13:44 host2 crmd: [1736]: info: do_dc_join_final: Ensuring DC, quorum and node attributes are up-to-date
Sep 2 18:13:44 host2 crmd: [1736]: info: crm_update_quorum: Updating quorum status to true (call=979)
Sep 2 18:13:44 host2 crmd: [1736]: info: abort_transition_graph: do_te_invoke:173 - Triggered transition abort (complete=1) : Peer Cancelled
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_local_callback: Sending full refresh (origin=crmd)
Sep 2 18:13:44 host2 crmd: [1736]: info: do_pe_invoke: Query 980: Requesting the current CIB: S_POLICY_ENGINE
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: terminate (<null>)
Sep 2 18:13:44 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='host2']/lrm (origin=local/crmd/975, version=0.192.21): ok (rc=0)
Sep 2 18:13:44 host2 crmd: [1736]: info: abort_transition_graph: te_update_diff:276 - Triggered transition abort (complete=1, tag=lrm_rsc_op, id=mail_monitor_0, magic=0:7;8:128:7:4fd7da89-8f3a-4287-8e06-c274bb66aaa8, cib=0.192.21) : Resource op removal
Sep 2 18:13:44 host2 crmd: [1736]: info: erase_xpath_callback: Deletion of "//node_state[@uname='host2']/lrm": ok (rc=0)
Sep 2 18:13:44 host2 crmd: [1736]: info: do_pe_invoke: Query 981: Requesting the current CIB: S_POLICY_ENGINE
Sep 2 18:13:44 host2 crmd: [1736]: info: te_update_diff: Detected LRM refresh - 10 resources updated: Skipping all resource events
Sep 2 18:13:44 host2 crmd: [1736]: info: abort_transition_graph: te_update_diff:236 - Triggered transition abort (complete=1, tag=diff, id=(null), magic=NA, cib=0.192.22) : LRM Refresh
Sep 2 18:13:44 host2 crmd: [1736]: info: do_pe_invoke: Query 982: Requesting the current CIB: S_POLICY_ENGINE
Sep 2 18:13:44 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/977, version=0.192.23): ok (rc=0)
Sep 2 18:13:44 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/979, version=0.192.25): ok (rc=0)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-mail (<null>)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-file (1314947908)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-win (<null>)
Sep 2 18:13:44 host2 crmd: [1736]: info: do_pe_invoke_callback: Invoking the PE: query=982, ref=pe_calc-dc-1314980024-941, seq=252, quorate=1
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: pingd-san (100)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-file (<null>)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-win (1314947908)
Sep 2 18:13:44 host2 pengine: [1735]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-mon (<null>)
Sep 2 18:13:44 host2 pengine: [1735]: notice: unpack_rsc_op: Operation Virsh:1_monitor_0 found resource Virsh:1 active on host2
Sep 2 18:13:44 host2 pengine: [1735]: notice: unpack_rsc_op: Operation Iscsi:1_monitor_0 found resource Iscsi:1 active on host2
Sep 2 18:13:44 host2 pengine: [1735]: notice: clone_print: Clone Set: ServiceClone [Service]
Sep 2 18:13:44 host2 pengine: [1735]: notice: short_print: Started: [ host2 ]
Sep 2 18:13:44 host2 pengine: [1735]: notice: short_print: Stopped: [ Service:1 Service:2 ]
Sep 2 18:13:44 host2 pengine: [1735]: notice: clone_print: Clone Set: PingSanClone [PingSan]
Sep 2 18:13:44 host2 pengine: [1735]: notice: short_print: Started: [ host2 ]
Sep 2 18:13:44 host2 pengine: [1735]: notice: short_print: Stopped: [ PingSan:0 PingSan:2 ]
Sep 2 18:13:44 host2 pengine: [1735]: notice: native_print: file#011(ocf::heartbeat:VirtualDomain):#011Started host2
Sep 2 18:13:44 host2 pengine: [1735]: notice: RecurringOp: Start recurring monitor (10s) for PingSan:0 on host3
Sep 2 18:13:44 host2 pengine: [1735]: notice: LogActions: Leave Iscsi:0#011(Started host2)
Sep 2 18:13:44 host2 pengine: [1735]: notice: LogActions: Leave Virsh:0#011(Started host2)
Sep 2 18:13:44 host2 pengine: [1735]: notice: LogActions: Leave Iscsi:1#011(Stopped)
Sep 2 18:13:44 host2 pengine: [1735]: notice: LogActions: Leave Virsh:1#011(Stopped)
Sep 2 18:13:44 host2 pengine: [1735]: notice: LogActions: Leave Iscsi:2#011(Stopped)
Sep 2 18:13:44 host2 pengine: [1735]: notice: LogActions: Leave Virsh:2#011(Stopped)
Sep 2 18:13:44 host2 pengine: [1735]: notice: LogActions: Start PingSan:0#011(host3)
Sep 2 18:13:44 host2 pengine: [1735]: notice: LogActions: Leave PingSan:1#011(Started host2)
Sep 2 18:13:44 host2 pengine: [1735]: notice: LogActions: Leave PingSan:2#011(Stopped)
Sep 2 18:13:44 host2 pengine: [1735]: notice: LogActions: Leave file#011(Started host2)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-mon (1314947908)
Sep 2 18:13:44 host2 crmd: [1736]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Sep 2 18:13:44 host2 crmd: [1736]: info: unpack_graph: Unpacked transition 161: 10 actions in 10 synapses
Sep 2 18:13:44 host2 crmd: [1736]: info: do_te_invoke: Processing graph 161 (ref=pe_calc-dc-1314980024-941) derived from /var/lib/pengine/pe-input-847.bz2
Sep 2 18:13:44 host2 crmd: [1736]: info: te_rsc_command: Initiating action 9: monitor Iscsi:0_monitor_0 on host3
Sep 2 18:13:44 host2 crmd: [1736]: info: te_rsc_command: Initiating action 10: monitor Virsh:0_monitor_0 on host3
Sep 2 18:13:44 host2 crmd: [1736]: info: te_rsc_command: Initiating action 11: monitor PingSan:0_monitor_0 on host3
Sep 2 18:13:44 host2 crmd: [1736]: info: te_pseudo_action: Pseudo action 37 fired and confirmed
Sep 2 18:13:44 host2 crmd: [1736]: info: te_rsc_command: Initiating action 12: monitor file_monitor_0 on host3
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (<null>)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-Virsh:0 (<null>)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-Virsh:0 (<null>)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-Virsh:1 (<null>)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-Virsh:2 (<null>)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-Iscsi:0 (<null>)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-Iscsi:1 (<null>)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-Iscsi:2 (<null>)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-mail (1314947908)
Sep 2 18:13:45 host2 crmd: [1736]: info: match_graph_event: Action Virsh:0_monitor_0 (10) confirmed on host3 (rc=0)
Sep 2 18:13:45 host2 crmd: [1736]: info: match_graph_event: Action PingSan:0_monitor_0 (11) confirmed on host3 (rc=0)
Sep 2 18:13:45 host2 crmd: [1736]: info: match_graph_event: Action Iscsi:0_monitor_0 (9) confirmed on host3 (rc=0)
Sep 2 18:14:05 host2 crmd: [1736]: WARN: status_from_rc: Action 12 (file_monitor_0) on host3 failed (target: 7 vs. rc: -2): Error
Sep 2 18:14:05 host2 crmd: [1736]: info: abort_transition_graph: match_graph_event:265 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=file_monitor_0, magic=2:-2;12:161:7:4fd7da89-8f3a-4287-8e06-c274bb66aaa8, cib=0.192.41) : Event failed
Sep 2 18:14:05 host2 crmd: [1736]: info: update_abort_priority: Abort priority upgraded from 0 to 1
Sep 2 18:14:05 host2 crmd: [1736]: info: update_abort_priority: Abort action done superceeded by restart
Sep 2 18:14:05 host2 crmd: [1736]: info: match_graph_event: Action file_monitor_0 (12) confirmed on host3 (rc=4)
Sep 2 18:14:05 host2 crmd: [1736]: info: te_rsc_command: Initiating action 8: probe_complete probe_complete on host3 - no waiting
Sep 2 18:14:05 host2 crmd: [1736]: info: run_graph: ====================================================
Sep 2 18:14:05 host2 crmd: [1736]: notice: run_graph: Transition 161 (Complete=6, Pending=0, Fired=0, Skipped=3, Incomplete=1, Source=/var/lib/pengine/pe-input-847.bz2): Stopped
Sep 2 18:14:05 host2 crmd: [1736]: info: te_graph_trigger: Transition 161 is now complete
Sep 2 18:14:05 host2 crmd: [1736]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
Sep 2 18:14:05 host2 crmd: [1736]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
Sep 2 18:14:05 host2 crmd: [1736]: info: do_pe_invoke: Query 983: Requesting the current CIB: S_POLICY_ENGINE
Sep 2 18:14:05 host2 crmd: [1736]: info: do_pe_invoke_callback: Invoking the PE: query=983, ref=pe_calc-dc-1314980045-947, seq=252, quorate=1
Sep 2 18:14:05 host2 pengine: [1735]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 2 18:14:05 host2 pengine: [1735]: notice: unpack_rsc_op: Operation Virsh:1_monitor_0 found resource Virsh:1 active on host2
Sep 2 18:14:05 host2 pengine: [1735]: notice: unpack_rsc_op: Operation Iscsi:1_monitor_0 found resource Iscsi:1 active on host2
Sep 2 18:14:05 host2 pengine: [1735]: WARN: unpack_rsc_op: Processing failed op file_monitor_0 on host3: unknown exec error (-2)
Sep 2 18:14:05 host2 pengine: [1735]: notice: clone_print: Clone Set: ServiceClone [Service]
Sep 2 18:14:05 host2 pengine: [1735]: notice: short_print: Started: [ host2 ]
Sep 2 18:14:05 host2 pengine: [1735]: notice: short_print: Stopped: [ Service:1 Service:2 ]
Sep 2 18:14:05 host2 pengine: [1735]: notice: clone_print: Clone Set: PingSanClone [PingSan]
Sep 2 18:14:05 host2 pengine: [1735]: notice: short_print: Started: [ host2 ]
Sep 2 18:14:05 host2 pengine: [1735]: notice: short_print: Stopped: [ PingSan:0 PingSan:2 ]
Sep 2 18:14:05 host2 pengine: [1735]: notice: native_print: file#011(ocf::heartbeat:VirtualDomain) Started FAILED
Sep 2 18:14:05 host2 pengine: [1735]: notice: native_print: #0111 : host2
Sep 2 18:14:05 host2 pengine: [1735]: notice: native_print: #0112 : host3
Sep 2 18:14:05 host2 pengine: [1735]: notice: RecurringOp: Start recurring monitor (10s) for PingSan:0 on host3
Sep 2 18:14:05 host2 pengine: [1735]: ERROR: native_create_actions: Resource file (ocf::VirtualDomain) is active on 2 nodes attempting recovery
Sep 2 18:14:05 host2 pengine: [1735]: WARN: See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
Sep 2 18:14:05 host2 pengine: [1735]: notice: RecurringOp: Start recurring monitor (10s) for file on host2
Sep 2 18:14:05 host2 pengine: [1735]: notice: LogActions: Leave Iscsi:0#011(Started host2)
Sep 2 18:14:05 host2 pengine: [1735]: notice: LogActions: Leave Virsh:0#011(Started host2)
Sep 2 18:14:05 host2 pengine: [1735]: notice: LogActions: Leave Iscsi:1#011(Stopped)
Sep 2 18:14:05 host2 pengine: [1735]: notice: LogActions: Leave Virsh:1#011(Stopped)
Sep 2 18:14:05 host2 pengine: [1735]: notice: LogActions: Leave Iscsi:2#011(Stopped)
Sep 2 18:14:05 host2 pengine: [1735]: notice: LogActions: Leave Virsh:2#011(Stopped)
Sep 2 18:14:05 host2 pengine: [1735]: notice: LogActions: Start PingSan:0#011(host3)
Sep 2 18:14:05 host2 pengine: [1735]: notice: LogActions: Leave PingSan:1#011(Started host2)
Sep 2 18:14:05 host2 pengine: [1735]: notice: LogActions: Leave PingSan:2#011(Stopped)
Sep 2 18:14:05 host2 pengine: [1735]: notice: LogActions: Recover file#011(Started host2)
Sep 2 18:14:05 host2 crmd: [1736]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Sep 2 18:14:05 host2 crmd: [1736]: info: unpack_graph: Unpacked transition 162: 10 actions in 10 synapses
Sep 2 18:14:05 host2 crmd: [1736]: info: do_te_invoke: Processing graph 162 (ref=pe_calc-dc-1314980045-947) derived from /var/lib/pengine/pe-error-111.bz2
Sep 2 18:14:05 host2 crmd: [1736]: info: te_pseudo_action: Pseudo action 34 fired and confirmed
Sep 2 18:14:05 host2 crmd: [1736]: info: te_rsc_command: Initiating action 38: stop file_stop_0 on host2 (local)
Sep 2 18:14:05 host2 lrmd: [1733]: info: cancel_op: operation monitor[168] on file for client 1736, its parameters: CRM_meta_start_delay=[0] CRM_meta_timeout=[30000] CRM_meta_name=[monitor] crm_feature_set=[3.0.5] config=[/etc/libvirt/qemu/file.xml] CRM_meta_interval=[10000] hypervisor=[qemu:///system] migration_transport=[ssh] cancelled
Sep 2 18:14:05 host2 crmd: [1736]: info: do_lrm_rsc_op: Performing key=38:162:0:4fd7da89-8f3a-4287-8e06-c274bb66aaa8 op=file_stop_0 )
Sep 2 18:14:05 host2 lrmd: [1733]: info: rsc:file stop[169] (pid 16205)
Sep 2 18:14:05 host2 crmd: [1736]: info: te_rsc_command: Initiating action 5: stop file_stop_0 on host3
Sep 2 18:14:05 host2 crmd: [1736]: info: te_rsc_command: Initiating action 9: probe_complete probe_complete on host3 - no waiting
Sep 2 18:14:05 host2 crmd: [1736]: info: process_lrm_event: LRM operation file_monitor_10000 (call=168, status=1, cib-update=0, confirmed=true) Cancelled
Sep 2 18:14:05 host2 crmd: [1736]: info: te_rsc_command: Initiating action 30: start PingSan:0_start_0 on host3
Sep 2 18:14:05 host2 VirtualDomain[16205]: INFO: Issuing graceful shutdown request for domain file.
Sep 2 18:14:05 host2 lrmd: [1733]: info: RA output: (file:stop:stdout) Domain file is being shutdown
Sep 2 18:14:05 host2 lrmd: [1733]: info: RA output: (file:stop:stdout) Domain file is being shutdown
Sep 2 18:14:05 host2 pengine: [1735]: ERROR: process_pe_message: Transition 162: ERRORs found during PE processing. PEngine Input stored in: /var/lib/pengine/pe-error-111.bz2
Sep 2 18:14:06 host2 lrmd: [1733]: info: RA output: (file:stop:stdout) Domain file is being shutdown
Sep 2 18:14:07 host2 lrmd: [1733]: info: RA output: (file:stop:stdout) Domain file is being shutdown
Sep 2 18:14:08 host2 lrmd: [1733]: info: RA output: (file:stop:stdout) Domain file is being shutdown
Sep 2 18:14:09 host2 crmd: [1736]: info: match_graph_event: Action PingSan:0_start_0 (30) confirmed on host3 (rc=0)
Sep 2 18:14:09 host2 crmd: [1736]: info: te_rsc_command: Initiating action 31: monitor PingSan:0_monitor_10000 on host3
Sep 2 18:14:09 host2 crmd: [1736]: info: te_pseudo_action: Pseudo action 35 fired and confirmed
Sep 2 18:14:09 host2 lrmd: [1733]: info: RA output: (file:stop:stdout) Domain file is being shutdown
Sep 2 18:14:10 host2 lrmd: [1733]: info: RA output: (file:stop:stdout) Domain file is being shutdown
Sep 2 18:14:11 host2 avahi-daemon[1756]: Interface vnet0.IPv6 no longer relevant for mDNS.
Sep 2 18:14:11 host2 avahi-daemon[1756]: Leaving mDNS multicast group on interface vnet0.IPv6 with address fe80::fc54:ff:fedc:bd82.
Sep 2 18:14:11 host2 avahi-daemon[1756]: Withdrawing address record for fe80::fc54:ff:fedc:bd82 on vnet0.
Sep 2 18:14:11 host2 kernel: [33341.324835] br0: port 2(vnet0) entering forwarding state
Sep 2 18:14:11 host2 avahi-daemon[1756]: Withdrawing workstation service for vnet0.
Sep 2 18:14:11 host2 kernel: [33341.327651] br0: port 2(vnet0) entering disabled state
Sep 2 18:14:11 host2 kernel: [33341.327778] device vnet0 left promiscuous mode
Sep 2 18:14:11 host2 kernel: [33341.327781] br0: port 2(vnet0) entering disabled state
Sep 2 18:14:12 host2 lrmd: [1733]: info: RA output: (file:stop:stderr) error: Failed to shutdown domain file
Sep 2 18:14:12 host2 lrmd: [1733]: info: RA output: (file:stop:stderr) error: operation failed: system shutdown operation failed
Sep 2 18:14:12 host2 lrmd: [1733]: info: operation stop[169] on file for client 1736: pid 16205 exited with return code 0
Sep 2 18:14:12 host2 crmd: [1736]: info: process_lrm_event: LRM operation file_stop_0 (call=169, rc=0, cib-update=984, confirmed=true) ok
Sep 2 18:14:12 host2 crmd: [1736]: info: match_graph_event: Action file_stop_0 (38) confirmed on host2 (rc=0)
Sep 2 18:14:13 host2 crmd: [1736]: info: match_graph_event: Action PingSan:0_monitor_10000 (31) confirmed on host3 (rc=0)
Sep 2 18:14:18 host2 crmd: [1736]: info: abort_transition_graph: te_update_diff:149 - Triggered transition abort (complete=0, tag=nvpair, id=status-host3-pingd-san, magic=NA, cib=0.192.50) : Transient attribute: update
Sep 2 18:14:18 host2 crmd: [1736]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
Sep 2 18:14:18 host2 crmd: [1736]: info: update_abort_priority: Abort action done superceeded by restart
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker