Hi, I have a cluster of KVM VMs on libvirt + iSCSI. Everything works fine, except when I restart a node: as soon as the corosync service comes up on it, the cluster tries to move the VM resources running on the surviving node before the resources they depend on (iSCSI and libvirt) are ready.

These rules:

colocation domainonservice inf: file ServiceClone
order servicebeforedomain inf: ServiceClone:start file

together with resource-stickiness="1000", should prevent this, right? The only way to recover is a cleanup of the resources (the exact commands I use are shown after my signature). Is this a timeout problem or a wrong configuration? My crm config and the log are attached below. Thanks to all in advance, Umberto
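(For clarity, the cleanup I mention above is roughly this; a sketch of my usual session, assuming the failed probe ended up on host3, the node I restarted:

    crm resource cleanup file host3     # clear the failed probe of the VM resource
    crm_mon -1                          # check that "file" is again reported Started on host2 only
)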
node host2 \
        attributes standby="off"
node host3 \
        attributes standby="off"
primitive Iscsi lsb:open-iscsi \
        operations $id="iscsi-operation" \
        op start interval="0" timeout="15s" \
        op stop interval="0" timeout="15s" \
        op monitor interval="30s" timeout="60s"
primitive PingSan ocf:pacemaker:ping \
        params name="pingd-san" host_list="192.168.1.3" multiplier="100" \
        op monitor interval="10s" timeout="60s" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="60s"
primitive Virsh lsb:libvirt-bin \
        operations $id="virsh-operation" \
        op start interval="0" timeout="15s" \
        op stop interval="0" timeout="15s" \
        op monitor interval="30s" timeout="60s"
primitive file ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/file.xml" hypervisor="qemu:///system" migration_transport="ssh" \
        operations $id="file-operations" \
        op start interval="0" timeout="180" \
        op stop interval="0" timeout="90" \
        op migrate_from interval="0" timeout="90" \
        op migrate_to interval="0" timeout="120" \
        op monitor interval="10" timeout="30" start-delay="0" \
        meta target-role="started" allow-migrate="true" failure-timeout="10min"
group Service Iscsi Virsh
clone PingSanClone PingSan \
        meta globally-unique="false" interleave="true" target-role="started"
clone ServiceClone Service \
        meta globally-unique="false" interleave="true" target-role="started"
location serviceclonelocation ServiceClone \
        rule $id="servicecloneonconnectedsan" -inf: not_defined pingd-san or pingd-san lte 0
location cli-prefer-file file \
        rule $id="cli-prefer-rule-file" inf: #uname eq host3
colocation domainonservice inf: _rsc_set_ ( file ) ServiceClone
order servicebeforedomain inf: _rsc_set_ ServiceClone:start ( file )
property $id="cib-bootstrap-options" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        last-lrm-refresh="1314979997"
rsc_defaults $id="rsc-options" \
        resource-stickiness="1000"
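One note on the configuration above: if I understand the crm shell correctly, the cli-prefer-file location rule is just what an earlier "crm resource migrate file host3" left behind, and it could be dropped with the command below; I am not sure whether it is related to the problem.

    crm resource unmigrate file     # remove the leftover cli-prefer-file constraint

The log from host2 (the current DC) while host3 rejoins the cluster follows.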
Sep 2 18:13:28 host2 kernel: [33298.193160] vnet0: no IPv6 routers present
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 252: memb=1, new=0, lost=0
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] info: pcmk_peer_update: memb: host2 1711319232
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 252: memb=2, new=1, lost=0
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] info: update_member: Node 1728096448/host3 is now: member
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] info: pcmk_peer_update: NEW: host3 1728096448
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] info: pcmk_peer_update: MEMB: host2 1711319232
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] info: pcmk_peer_update: MEMB: host3 1728096448
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] info: send_member_notification: Sending membership update 252 to 2 children
Sep 2 18:13:41 host2 corosync[1707]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 2 18:13:41 host2 crmd: [1736]: notice: ais_dispatch_message: Membership 252: quorum acquired
Sep 2 18:13:41 host2 crmd: [1736]: info: ais_status_callback: status: host3 is now member (was lost)
Sep 2 18:13:41 host2 crmd: [1736]: info: crm_update_peer: Node host3: id=1728096448 state=member (new) addr=r(0) ip(192.168.0.103) votes=1 born=244 seen=252 proc=00000000000000000000000000000002
Sep 2 18:13:41 host2 cib: [1732]: notice: ais_dispatch_message: Membership 252: quorum acquired
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] info: update_member: 0xf24590 Node 1728096448 (host3) born on: 252
Sep 2 18:13:41 host2 cib: [1732]: info: crm_update_peer: Node host3: id=1728096448 state=member (new) addr=r(0) ip(192.168.0.103) votes=1 born=244 seen=252 proc=00000000000000000000000000000002
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] info: update_member: Node host3 now has process list: 00000000000000000000000000111312 (1118994)
Sep 2 18:13:41 host2 corosync[1707]: [pcmk ] info: send_member_notification: Sending membership update 252 to 2 children
Sep 2 18:13:41 host2 cib: [1732]: info: ais_dispatch_message: Membership 252: quorum retained
Sep 2 18:13:41 host2 cib: [1732]: info: crm_update_peer: Node host3: id=1728096448 state=member addr=r(0) ip(192.168.0.103) votes=1 born=252 seen=252 proc=00000000000000000000000000111312 (new)
Sep 2 18:13:41 host2 crmd: [1736]: info: crm_update_quorum: Updating quorum status to true (call=960)
Sep 2 18:13:41 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='host3']/lrm (origin=local/crmd/956, version=0.192.6): ok (rc=0)
Sep 2 18:13:41 host2 corosync[1707]: [CPG ] chosen downlist: sender r(0) ip(192.168.0.102) ; members(old:1 left:0)
Sep 2 18:13:41 host2 corosync[1707]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 2 18:13:41 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='host3']/transient_attributes (origin=local/crmd/957, version=0.192.7): ok (rc=0)
Sep 2 18:13:41 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/958, version=0.192.8): ok (rc=0)
Sep 2 18:13:41 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/960, version=0.192.10): ok (rc=0)
Sep 2 18:13:41 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_sync_one for section 'all' (origin=host3/host3/(null), version=0.192.10): ok (rc=0)
Sep 2 18:13:41 host2 crmd: [1736]: info: crmd_ais_dispatch: Setting expected votes to 3
Sep 2 18:13:41 host2 crmd: [1736]: info: abort_transition_graph: te_update_diff:276 - Triggered transition abort (complete=1, tag=lrm_rsc_op, id=PingSan:0_monitor_0, magic=0:7;10:154:7:4fd7da89-8f3a-4287-8e06-c274bb66aaa8, cib=0.192.6) : Resource op removal
Sep 2 18:13:41 host2 crmd: [1736]: info: erase_xpath_callback: Deletion of "//node_state[@uname='host3']/lrm": ok (rc=0)
Sep 2 18:13:41 host2 crmd: [1736]: info: abort_transition_graph: te_update_diff:163 - Triggered transition abort (complete=1, tag=transient_attributes, id=host3, magic=NA, cib=0.192.7) : Transient attribute: removal
Sep 2 18:13:41 host2 crmd: [1736]: info: erase_xpath_callback: Deletion of "//node_state[@uname='host3']/transient_attributes": ok (rc=0)
Sep 2 18:13:41 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/962, version=0.192.11): ok (rc=0)
Sep 2 18:13:41 host2 crmd: [1736]: info: ais_dispatch_message: Membership 252: quorum retained
Sep 2 18:13:41 host2 crmd: [1736]: notice: crmd_peer_update: Status update: Client host3/crmd now has status [online] (DC=true)
Sep 2 18:13:41 host2 crmd: [1736]: info: crm_update_peer: Node host3: id=1728096448 state=member addr=r(0) ip(192.168.0.103) votes=1 born=252 seen=252 proc=00000000000000000000000000111312 (new)
Sep 2 18:13:41 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/964, version=0.192.13): ok (rc=0)
Sep 2 18:13:41 host2 crmd: [1736]: info: crmd_ais_dispatch: Setting expected votes to 3
Sep 2 18:13:41 host2 crmd: [1736]: info: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=crmd_peer_update ]
Sep 2 18:13:41 host2 crmd: [1736]: info: abort_transition_graph: do_te_invoke:181 - Triggered transition abort (complete=1) : Peer Halt
Sep 2 18:13:41 host2 crmd: [1736]: info: update_dc: Unset DC host2
Sep 2 18:13:41 host2 crmd: [1736]: info: join_make_offer: Making join offers based on membership 252
Sep 2 18:13:41 host2 crmd: [1736]: info: do_dc_join_offer_all: join-21: Waiting on 2 outstanding join acks
Sep 2 18:13:41 host2 crmd: [1736]: info: update_dc: Set DC to host2 (3.0.5)
Sep 2 18:13:41 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/967, version=0.192.15): ok (rc=0)
Sep 2 18:13:43 host2 crmd: [1736]: info: update_dc: Unset DC host2
Sep 2 18:13:43 host2 crmd: [1736]: info: do_dc_join_offer_all: A new node joined the cluster
Sep 2 18:13:43 host2 crmd: [1736]: info: do_dc_join_offer_all: join-22: Waiting on 2 outstanding join acks
Sep 2 18:13:43 host2 crmd: [1736]: info: update_dc: Set DC to host2 (3.0.5)
Sep 2 18:13:44 host2 crmd: [1736]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Sep 2 18:13:44 host2 crmd: [1736]: info: do_state_transition: All 2 cluster nodes responded to the join offer.
Sep 2 18:13:44 host2 crmd: [1736]: info: do_dc_join_finalize: join-22: Syncing the CIB from host2 to the rest of the cluster
Sep 2 18:13:44 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/970, version=0.192.15): ok (rc=0)
Sep 2 18:13:44 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/971, version=0.192.16): ok (rc=0)
Sep 2 18:13:44 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/972, version=0.192.17): ok (rc=0)
Sep 2 18:13:44 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='host3']/transient_attributes (origin=host3/crmd/6, version=0.192.18): ok (rc=0)
Sep 2 18:13:44 host2 crmd: [1736]: info: do_dc_join_ack: join-22: Updating node state to member for host3
Sep 2 18:13:44 host2 crmd: [1736]: info: do_dc_join_ack: join-22: Updating node state to member for host2
Sep 2 18:13:44 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='host3']/lrm (origin=local/crmd/973, version=0.192.19): ok (rc=0)
Sep 2 18:13:44 host2 crmd: [1736]: info: erase_xpath_callback: Deletion of "//node_state[@uname='host3']/lrm": ok (rc=0)
Sep 2 18:13:44 host2 crmd: [1736]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
Sep 2 18:13:44 host2 crmd: [1736]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
Sep 2 18:13:44 host2 crmd: [1736]: info: do_dc_join_final: Ensuring DC, quorum and node attributes are up-to-date
Sep 2 18:13:44 host2 crmd: [1736]: info: crm_update_quorum: Updating quorum status to true (call=979)
Sep 2 18:13:44 host2 crmd: [1736]: info: abort_transition_graph: do_te_invoke:173 - Triggered transition abort (complete=1) : Peer Cancelled
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_local_callback: Sending full refresh (origin=crmd)
Sep 2 18:13:44 host2 crmd: [1736]: info: do_pe_invoke: Query 980: Requesting the current CIB: S_POLICY_ENGINE
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: terminate (<null>)
Sep 2 18:13:44 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='host2']/lrm (origin=local/crmd/975, version=0.192.21): ok (rc=0)
Sep 2 18:13:44 host2 crmd: [1736]: info: abort_transition_graph: te_update_diff:276 - Triggered transition abort (complete=1, tag=lrm_rsc_op, id=mail_monitor_0, magic=0:7;8:128:7:4fd7da89-8f3a-4287-8e06-c274bb66aaa8, cib=0.192.21) : Resource op removal
Sep 2 18:13:44 host2 crmd: [1736]: info: erase_xpath_callback: Deletion of "//node_state[@uname='host2']/lrm": ok (rc=0)
Sep 2 18:13:44 host2 crmd: [1736]: info: do_pe_invoke: Query 981: Requesting the current CIB: S_POLICY_ENGINE
Sep 2 18:13:44 host2 crmd: [1736]: info: te_update_diff: Detected LRM refresh - 10 resources updated: Skipping all resource events
Sep 2 18:13:44 host2 crmd: [1736]: info: abort_transition_graph: te_update_diff:236 - Triggered transition abort (complete=1, tag=diff, id=(null), magic=NA, cib=0.192.22) : LRM Refresh
Sep 2 18:13:44 host2 crmd: [1736]: info: do_pe_invoke: Query 982: Requesting the current CIB: S_POLICY_ENGINE
Sep 2 18:13:44 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/977, version=0.192.23): ok (rc=0)
Sep 2 18:13:44 host2 cib: [1732]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/979, version=0.192.25): ok (rc=0)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-mail (<null>)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-file (1314947908)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-win (<null>)
Sep 2 18:13:44 host2 crmd: [1736]: info: do_pe_invoke_callback: Invoking the PE: query=982, ref=pe_calc-dc-1314980024-941, seq=252, quorate=1
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: pingd-san (100)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-file (<null>)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-win (1314947908)
Sep 2 18:13:44 host2 pengine: [1735]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-mon (<null>)
Sep 2 18:13:44 host2 pengine: [1735]: notice: unpack_rsc_op: Operation Virsh:1_monitor_0 found resource Virsh:1 active on host2
Sep 2 18:13:44 host2 pengine: [1735]: notice: unpack_rsc_op: Operation Iscsi:1_monitor_0 found resource Iscsi:1 active on host2
Sep 2 18:13:44 host2 pengine: [1735]: notice: clone_print: Clone Set: ServiceClone [Service]
Sep 2 18:13:44 host2 pengine: [1735]: notice: short_print: Started: [ host2 ]
Sep 2 18:13:44 host2 pengine: [1735]: notice: short_print: Stopped: [ Service:1 Service:2 ]
Sep 2 18:13:44 host2 pengine: [1735]: notice: clone_print: Clone Set: PingSanClone [PingSan]
Sep 2 18:13:44 host2 pengine: [1735]: notice: short_print: Started: [ host2 ]
Sep 2 18:13:44 host2 pengine: [1735]: notice: short_print: Stopped: [ PingSan:0 PingSan:2 ]
Sep 2 18:13:44 host2 pengine: [1735]: notice: native_print: file#011(ocf::heartbeat:VirtualDomain):#011Started host2
Sep 2 18:13:44 host2 pengine: [1735]: notice: RecurringOp: Start recurring monitor (10s) for PingSan:0 on host3
Sep 2 18:13:44 host2 pengine: [1735]: notice: LogActions: Leave Iscsi:0#011(Started host2)
Sep 2 18:13:44 host2 pengine: [1735]: notice: LogActions: Leave Virsh:0#011(Started host2)
Sep 2 18:13:44 host2 pengine: [1735]: notice: LogActions: Leave Iscsi:1#011(Stopped)
Sep 2 18:13:44 host2 pengine: [1735]: notice: LogActions: Leave Virsh:1#011(Stopped)
Sep 2 18:13:44 host2 pengine: [1735]: notice: LogActions: Leave Iscsi:2#011(Stopped)
Sep 2 18:13:44 host2 pengine: [1735]: notice: LogActions: Leave Virsh:2#011(Stopped)
Sep 2 18:13:44 host2 pengine: [1735]: notice: LogActions: Start PingSan:0#011(host3)
Sep 2 18:13:44 host2 pengine: [1735]: notice: LogActions: Leave PingSan:1#011(Started host2)
Sep 2 18:13:44 host2 pengine: [1735]: notice: LogActions: Leave PingSan:2#011(Stopped)
Sep 2 18:13:44 host2 pengine: [1735]: notice: LogActions: Leave file#011(Started host2)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-mon (1314947908)
Sep 2 18:13:44 host2 crmd: [1736]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Sep 2 18:13:44 host2 crmd: [1736]: info: unpack_graph: Unpacked transition 161: 10 actions in 10 synapses
Sep 2 18:13:44 host2 crmd: [1736]: info: do_te_invoke: Processing graph 161 (ref=pe_calc-dc-1314980024-941) derived from /var/lib/pengine/pe-input-847.bz2
Sep 2 18:13:44 host2 crmd: [1736]: info: te_rsc_command: Initiating action 9: monitor Iscsi:0_monitor_0 on host3
Sep 2 18:13:44 host2 crmd: [1736]: info: te_rsc_command: Initiating action 10: monitor Virsh:0_monitor_0 on host3
Sep 2 18:13:44 host2 crmd: [1736]: info: te_rsc_command: Initiating action 11: monitor PingSan:0_monitor_0 on host3
Sep 2 18:13:44 host2 crmd: [1736]: info: te_pseudo_action: Pseudo action 37 fired and confirmed
Sep 2 18:13:44 host2 crmd: [1736]: info: te_rsc_command: Initiating action 12: monitor file_monitor_0 on host3
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (<null>)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-Virsh:0 (<null>)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-Virsh:0 (<null>)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-Virsh:1 (<null>)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-Virsh:2 (<null>)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-Iscsi:0 (<null>)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-Iscsi:1 (<null>)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-Iscsi:2 (<null>)
Sep 2 18:13:44 host2 attrd: [1734]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-mail (1314947908)
Sep 2 18:13:45 host2 crmd: [1736]: info: match_graph_event: Action Virsh:0_monitor_0 (10) confirmed on host3 (rc=0)
Sep 2 18:13:45 host2 crmd: [1736]: info: match_graph_event: Action PingSan:0_monitor_0 (11) confirmed on host3 (rc=0)
Sep 2 18:13:45 host2 crmd: [1736]: info: match_graph_event: Action Iscsi:0_monitor_0 (9) confirmed on host3 (rc=0)
Sep 2 18:14:05 host2 crmd: [1736]: WARN: status_from_rc: Action 12 (file_monitor_0) on host3 failed (target: 7 vs. rc: -2): Error
Sep 2 18:14:05 host2 crmd: [1736]: info: abort_transition_graph: match_graph_event:265 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=file_monitor_0, magic=2:-2;12:161:7:4fd7da89-8f3a-4287-8e06-c274bb66aaa8, cib=0.192.41) : Event failed
Sep 2 18:14:05 host2 crmd: [1736]: info: update_abort_priority: Abort priority upgraded from 0 to 1
Sep 2 18:14:05 host2 crmd: [1736]: info: update_abort_priority: Abort action done superceeded by restart
Sep 2 18:14:05 host2 crmd: [1736]: info: match_graph_event: Action file_monitor_0 (12) confirmed on host3 (rc=4)
Sep 2 18:14:05 host2 crmd: [1736]: info: te_rsc_command: Initiating action 8: probe_complete probe_complete on host3 - no waiting
Sep 2 18:14:05 host2 crmd: [1736]: info: run_graph: ====================================================
Sep 2 18:14:05 host2 crmd: [1736]: notice: run_graph: Transition 161 (Complete=6, Pending=0, Fired=0, Skipped=3, Incomplete=1, Source=/var/lib/pengine/pe-input-847.bz2): Stopped
Sep 2 18:14:05 host2 crmd: [1736]: info: te_graph_trigger: Transition 161 is now complete
Sep 2 18:14:05 host2 crmd: [1736]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
Sep 2 18:14:05 host2 crmd: [1736]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
Sep 2 18:14:05 host2 crmd: [1736]: info: do_pe_invoke: Query 983: Requesting the current CIB: S_POLICY_ENGINE
Sep 2 18:14:05 host2 crmd: [1736]: info: do_pe_invoke_callback: Invoking the PE: query=983, ref=pe_calc-dc-1314980045-947, seq=252, quorate=1
Sep 2 18:14:05 host2 pengine: [1735]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 2 18:14:05 host2 pengine: [1735]: notice: unpack_rsc_op: Operation Virsh:1_monitor_0 found resource Virsh:1 active on host2
Sep 2 18:14:05 host2 pengine: [1735]: notice: unpack_rsc_op: Operation Iscsi:1_monitor_0 found resource Iscsi:1 active on host2
Sep 2 18:14:05 host2 pengine: [1735]: WARN: unpack_rsc_op: Processing failed op file_monitor_0 on host3: unknown exec error (-2)
Sep 2 18:14:05 host2 pengine: [1735]: notice: clone_print: Clone Set: ServiceClone [Service]
Sep 2 18:14:05 host2 pengine: [1735]: notice: short_print: Started: [ host2 ]
Sep 2 18:14:05 host2 pengine: [1735]: notice: short_print: Stopped: [ Service:1 Service:2 ]
Sep 2 18:14:05 host2 pengine: [1735]: notice: clone_print: Clone Set: PingSanClone [PingSan]
Sep 2 18:14:05 host2 pengine: [1735]: notice: short_print: Started: [ host2 ]
Sep 2 18:14:05 host2 pengine: [1735]: notice: short_print: Stopped: [ PingSan:0 PingSan:2 ]
Sep 2 18:14:05 host2 pengine: [1735]: notice: native_print: file#011(ocf::heartbeat:VirtualDomain) Started FAILED
Sep 2 18:14:05 host2 pengine: [1735]: notice: native_print: #0111 : host2
Sep 2 18:14:05 host2 pengine: [1735]: notice: native_print: #0112 : host3
Sep 2 18:14:05 host2 pengine: [1735]: notice: RecurringOp: Start recurring monitor (10s) for PingSan:0 on host3
Sep 2 18:14:05 host2 pengine: [1735]: ERROR: native_create_actions: Resource file (ocf::VirtualDomain) is active on 2 nodes attempting recovery
Sep 2 18:14:05 host2 pengine: [1735]: WARN: See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
Sep 2 18:14:05 host2 pengine: [1735]: notice: RecurringOp: Start recurring monitor (10s) for file on host2
Sep 2 18:14:05 host2 pengine: [1735]: notice: LogActions: Leave Iscsi:0#011(Started host2)
Sep 2 18:14:05 host2 pengine: [1735]: notice: LogActions: Leave Virsh:0#011(Started host2)
Sep 2 18:14:05 host2 pengine: [1735]: notice: LogActions: Leave Iscsi:1#011(Stopped)
Sep 2 18:14:05 host2 pengine: [1735]: notice: LogActions: Leave Virsh:1#011(Stopped)
Sep 2 18:14:05 host2 pengine: [1735]: notice: LogActions: Leave Iscsi:2#011(Stopped)
Sep 2 18:14:05 host2 pengine: [1735]: notice: LogActions: Leave Virsh:2#011(Stopped)
Sep 2 18:14:05 host2 pengine: [1735]: notice: LogActions: Start PingSan:0#011(host3)
Sep 2 18:14:05 host2 pengine: [1735]: notice: LogActions: Leave PingSan:1#011(Started host2)
Sep 2 18:14:05 host2 pengine: [1735]: notice: LogActions: Leave PingSan:2#011(Stopped)
Sep 2 18:14:05 host2 pengine: [1735]: notice: LogActions: Recover file#011(Started host2)
Sep 2 18:14:05 host2 crmd: [1736]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Sep 2 18:14:05 host2 crmd: [1736]: info: unpack_graph: Unpacked transition 162: 10 actions in 10 synapses
Sep 2 18:14:05 host2 crmd: [1736]: info: do_te_invoke: Processing graph 162 (ref=pe_calc-dc-1314980045-947) derived from /var/lib/pengine/pe-error-111.bz2
Sep 2 18:14:05 host2 crmd: [1736]: info: te_pseudo_action: Pseudo action 34 fired and confirmed
Sep 2 18:14:05 host2 crmd: [1736]: info: te_rsc_command: Initiating action 38: stop file_stop_0 on host2 (local)
Sep 2 18:14:05 host2 lrmd: [1733]: info: cancel_op: operation monitor[168] on file for client 1736, its parameters: CRM_meta_start_delay=[0] CRM_meta_timeout=[30000] CRM_meta_name=[monitor] crm_feature_set=[3.0.5] config=[/etc/libvirt/qemu/file.xml] CRM_meta_interval=[10000] hypervisor=[qemu:///system] migration_transport=[ssh] cancelled
Sep 2 18:14:05 host2 crmd: [1736]: info: do_lrm_rsc_op: Performing key=38:162:0:4fd7da89-8f3a-4287-8e06-c274bb66aaa8 op=file_stop_0 )
Sep 2 18:14:05 host2 lrmd: [1733]: info: rsc:file stop[169] (pid 16205)
Sep 2 18:14:05 host2 crmd: [1736]: info: te_rsc_command: Initiating action 5: stop file_stop_0 on host3
Sep 2 18:14:05 host2 crmd: [1736]: info: te_rsc_command: Initiating action 9: probe_complete probe_complete on host3 - no waiting
Sep 2 18:14:05 host2 crmd: [1736]: info: process_lrm_event: LRM operation file_monitor_10000 (call=168, status=1, cib-update=0, confirmed=true) Cancelled
Sep 2 18:14:05 host2 crmd: [1736]: info: te_rsc_command: Initiating action 30: start PingSan:0_start_0 on host3
Sep 2 18:14:05 host2 VirtualDomain[16205]: INFO: Issuing graceful shutdown request for domain file.
Sep 2 18:14:05 host2 lrmd: [1733]: info: RA output: (file:stop:stdout) Domain file is being shutdown
Sep 2 18:14:05 host2 lrmd: [1733]: info: RA output: (file:stop:stdout) Domain file is being shutdown
Sep 2 18:14:05 host2 pengine: [1735]: ERROR: process_pe_message: Transition 162: ERRORs found during PE processing. PEngine Input stored in: /var/lib/pengine/pe-error-111.bz2
Sep 2 18:14:06 host2 lrmd: [1733]: info: RA output: (file:stop:stdout) Domain file is being shutdown
Sep 2 18:14:07 host2 lrmd: [1733]: info: RA output: (file:stop:stdout) Domain file is being shutdown
Sep 2 18:14:08 host2 lrmd: [1733]: info: RA output: (file:stop:stdout) Domain file is being shutdown
Sep 2 18:14:09 host2 crmd: [1736]: info: match_graph_event: Action PingSan:0_start_0 (30) confirmed on host3 (rc=0)
Sep 2 18:14:09 host2 crmd: [1736]: info: te_rsc_command: Initiating action 31: monitor PingSan:0_monitor_10000 on host3
Sep 2 18:14:09 host2 crmd: [1736]: info: te_pseudo_action: Pseudo action 35 fired and confirmed
Sep 2 18:14:09 host2 lrmd: [1733]: info: RA output: (file:stop:stdout) Domain file is being shutdown
Sep 2 18:14:10 host2 lrmd: [1733]: info: RA output: (file:stop:stdout) Domain file is being shutdown
Sep 2 18:14:11 host2 avahi-daemon[1756]: Interface vnet0.IPv6 no longer relevant for mDNS.
Sep 2 18:14:11 host2 avahi-daemon[1756]: Leaving mDNS multicast group on interface vnet0.IPv6 with address fe80::fc54:ff:fedc:bd82.
Sep 2 18:14:11 host2 avahi-daemon[1756]: Withdrawing address record for fe80::fc54:ff:fedc:bd82 on vnet0.
Sep 2 18:14:11 host2 kernel: [33341.324835] br0: port 2(vnet0) entering forwarding state
Sep 2 18:14:11 host2 avahi-daemon[1756]: Withdrawing workstation service for vnet0.
Sep 2 18:14:11 host2 kernel: [33341.327651] br0: port 2(vnet0) entering disabled state
Sep 2 18:14:11 host2 kernel: [33341.327778] device vnet0 left promiscuous mode
Sep 2 18:14:11 host2 kernel: [33341.327781] br0: port 2(vnet0) entering disabled state
Sep 2 18:14:12 host2 lrmd: [1733]: info: RA output: (file:stop:stderr) error: Failed to shutdown domain file
Sep 2 18:14:12 host2 lrmd: [1733]: info: RA output: (file:stop:stderr) error: operation failed: system shutdown operation failed
Sep 2 18:14:12 host2 lrmd: [1733]: info: operation stop[169] on file for client 1736: pid 16205 exited with return code 0
Sep 2 18:14:12 host2 crmd: [1736]: info: process_lrm_event: LRM operation file_stop_0 (call=169, rc=0, cib-update=984, confirmed=true) ok
Sep 2 18:14:12 host2 crmd: [1736]: info: match_graph_event: Action file_stop_0 (38) confirmed on host2 (rc=0)
Sep 2 18:14:13 host2 crmd: [1736]: info: match_graph_event: Action PingSan:0_monitor_10000 (31) confirmed on host3 (rc=0)
Sep 2 18:14:18 host2 crmd: [1736]: info: abort_transition_graph: te_update_diff:149 - Triggered transition abort (complete=0, tag=nvpair, id=status-host3-pingd-san, magic=NA, cib=0.192.50) : Transient attribute: update
Sep 2 18:14:18 host2 crmd: [1736]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
Sep 2 18:14:18 host2 crmd: [1736]: info: update_abort_priority: Abort action done superceeded by restart
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker