On Tue, Feb 15, 2011 at 9:32 AM, <renayama19661...@ybb.ne.jp> wrote:
> Hi all,
>
> We hit a problem when starting a Master/Slave resource.
>
> Step 1) We start the first node and load the cib.
>
> ============
> Last updated: Thu Feb 10 16:32:12 2011
> Stack: Heartbeat
> Current DC: srv01 (c7435833-8bc5-43aa-8195-c666b818677f) - partition with quorum
> Version: 1.0.10-b0266dd5ffa9c51377c68b1f29d6bc84367f51dd
> 1 Nodes configured, unknown expected votes
> 5 Resources configured.
> ============
>
> Online: [ srv01 ]
>
>  prmIpPostgreSQLDB     (ocf::heartbeat:IPaddr2):       Started srv01
>  Resource Group: grpStonith2
>      prmStonith2-2     (stonith:external/ssh):         Started srv01
>      prmStonith2-3     (stonith:meatware):             Started srv01
>  Master/Slave Set: msPostgreSQLDB
>      Masters: [ srv01 ]
>      Stopped: [ prmApPostgreSQLDB:1 ]
>  Clone Set: clnPingd
>      Started: [ srv01 ]
>      Stopped: [ prmPingd:1 ]
>
> Migration summary:
> * Node srv01:
>
>
> Step 2) On the second node, we modify the Stateful resource agent (adding a sleep to stateful_start):
> (snip)
> stateful_start() {
>     ocf_log info "Start of Stateful."
>     sleep 120                        ----> added sleep
>     stateful_check_state master
> (snip)
>
> Step 3) We start the second node.
>
> Step 4) We confirm that the added "sleep" is running, then reboot the second node.
> [root@srv02 ~]# ps -ef | grep sleep
>
> Step 5) The first node detects the disappearance of the second node.
>  * However, STONITH is delayed because "post_notify_start_0" for the second
>    node is still being carried out.
Wait, what? Why would post_notify_start_0 of prmApPostgreSQLDB block stonith?
You didn't put a stonith resource in an ordering constraint, did you?
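For reference, the kind of constraint being asked about here would look roughly like the following in the crm shell. This is only a hypothetical sketch reusing the resource names from the crm_mon output above (the reporter's actual constraints were not posted); an ordering of this shape would tie the start of the stonith group, and so any fencing that depends on those stonith resources, to the master/slave set's start and notification sequence:

    # Hypothetical constraint of the shape being asked about: it orders the
    # start of grpStonith2 (the stonith resources) after msPostgreSQLDB.
    crm configure order ord_db_before_stonith inf: msPostgreSQLDB grpStonith2

    # One way to check whether the live configuration contains such a rule:
    crm configure show | grep '^order'
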
>  * On the srv02 node that disappeared, there is no need to carry out
>    post_notify_start_0 at all.
>  * STONITH should be carried out immediately. (At the moment STONITH is
>    kept waiting until post_notify_start_0 times out.)
>
>
> --------------(snip)----------------
> Feb 10 16:33:18 srv01 crmd: [4293]: info: ccm_event_detail: NEW MEMBERSHIP: trans=3, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
> Feb 10 16:33:18 srv01 crmd: [4293]: info: ccm_event_detail: CURRENT: srv01 [nodeid=0, born=3]
> Feb 10 16:33:18 srv01 crmd: [4293]: info: ccm_event_detail: LOST: srv02 [nodeid=1, born=2]
> Feb 10 16:33:18 srv01 crmd: [4293]: info: ais_status_callback: status: srv02 is now lost (was member)
> Feb 10 16:33:18 srv01 crmd: [4293]: info: crm_update_peer: Node srv02: id=1 state=lost (new) addr=(null) votes=-1 born=2 seen=2 proc=00000000000000000000000000000200
> Feb 10 16:33:18 srv01 crmd: [4293]: info: erase_node_from_join: Removed node srv02 from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=1
> Feb 10 16:33:18 srv01 crmd: [4293]: info: populate_cib_nodes_ha: Requesting the list of configured nodes
> Feb 10 16:33:19 srv01 crmd: [4293]: info: te_pseudo_action: Pseudo action 36 fired and confirmed
> Feb 10 16:33:19 srv01 crmd: [4293]: info: te_pseudo_action: Pseudo action 39 fired and confirmed
> Feb 10 16:33:19 srv01 crmd: [4293]: info: te_rsc_command: Initiating action 75: notify prmApPostgreSQLDB:0_post_notify_start_0 on srv01 (local)
> Feb 10 16:33:19 srv01 crmd: [4293]: info: do_lrm_rsc_op: Performing key=75:7:0:6918f8dc-fe1a-4c28-8aff-e8ac7a5e7143 op=prmApPostgreSQLDB:0_notify_0 )
> Feb 10 16:33:19 srv01 lrmd: [4290]: info: rsc:prmApPostgreSQLDB:0:24: notify
> Feb 10 16:33:19 srv01 cib: [4289]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/99, version=0.9.22): ok (rc=0)
> Feb 10 16:33:19 srv01 crmd: [4293]: info: te_rsc_command: Initiating action 76: notify prmApPostgreSQLDB:1_post_notify_start_0 on srv02
> Feb 10 16:33:19 srv01 lrmd: [4290]: info: RA output: (prmApPostgreSQLDB:0:notify:stdout) usage: /usr/lib/ocf/resource.d//pacemaker/Stateful {start|stop|promote|demote|monitor|validate-all|meta-data} Expects to have a fully populated OCF RA-compliant environment set.
> Feb 10 16:33:19 srv01 crmd: [4293]: info: process_lrm_event: LRM operation prmApPostgreSQLDB:0_notify_0 (call=24, rc=0, cib-update=101, confirmed=true) ok
> Feb 10 16:33:19 srv01 crmd: [4293]: info: match_graph_event: Action prmApPostgreSQLDB:0_post_notify_start_0 (75) confirmed on srv01 (rc=0)
> Feb 10 16:33:19 srv01 crmd: [4293]: info: te_pseudo_action: Pseudo action 40 fired and confirmed
> Feb 10 16:34:39 srv01 crmd: [4293]: WARN: action_timer_callback: Timer popped (timeout=20000, abort_level=1000000, complete=false)
> Feb 10 16:34:39 srv01 crmd: [4293]: ERROR: print_elem: Aborting transition, action lost: [Action 76]: Failed (id: prmApPostgreSQLDB:1_post_notify_start_0, loc: srv02, priority: 1000000)
> Feb 10 16:34:39 srv01 crmd: [4293]: info: abort_transition_graph: action_timer_callback:486 - Triggered transition abort (complete=0) : Action lost
> Feb 10 16:34:39 srv01 crmd: [4293]: WARN: cib_action_update: rsc_op 76: prmApPostgreSQLDB:1_post_notify_start_0 on srv02 timed out
> Feb 10 16:34:39 srv01 crmd: [4293]: info: run_graph: ====================================================
> Feb 10 16:34:39 srv01 crmd: [4293]: notice: run_graph: Transition 7 (Complete=16, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pengine/pe-input-7.bz2): Stopped
> Feb 10 16:34:39 srv01 crmd: [4293]: info: te_graph_trigger: Transition 7 is now complete
> Feb 10 16:34:39 srv01 crmd: [4293]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> ---------------(snip)------------------
>
> Isn't this behavior a bug?
>
> * I registered the log in Bugzilla:
>   http://developerbugs.linux-foundation.org/show_bug.cgi?id=2559
>
> Best Regards,
> Hideo Yamauchi.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
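To see exactly how the actions in Transition 7 were ordered, the saved policy-engine input named in the run_graph line above can be replayed offline. A minimal sketch, assuming the Pacemaker 1.0 ptest utility and its -x/-G/-D options (plus graphviz for rendering) are available on srv01; the output file names are arbitrary:

    # Replay the saved PE input for Transition 7 and dump its action graph
    # (the input path is taken from the run_graph log line above).
    ptest -VVV -x /var/lib/pengine/pe-input-7.bz2 \
          -G transition7.xml -D transition7.dot

    # Render the dependency graph; the edges show what each action waited on,
    # e.g. whether anything in the transition was ordered after
    # prmApPostgreSQLDB:1_post_notify_start_0.
    dot -Tpng transition7.dot -o transition7.png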