Hi Andrew,

Thank you for your comment.

> >> 1) It is necessary for the manager of the system to cope when the
> >> rc=6 (fatal) log appears.
> >> 2) And it is necessary for this to be reflected in the documentation.
> >
> > No to both.

All right.

> >> And does it mean that the next log should not be output until a system
> >> administrator deals with it?
> >>
> >> Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX
> >> was active at shutdown. You may ignore this error if it is unmanaged.
> >
> > Right. There was actually a third part... a slightly more restrictive
> > version of your original patch:
> > https://github.com/beekhof/pacemaker/commit/543ee8e

I confirmed it. Many thanks!!

Hideo Yamauchi.

> >> --- On Tue, 2012/2/21, Andrew Beekhof <and...@beekhof.net> wrote:
> >>
> >>> On Fri, Feb 17, 2012 at 10:49 AM, <renayama19661...@ybb.ne.jp> wrote:
> >>> > Hi Andrew,
> >>> >
> >>> > Thank you for your comment.
> >>> >
> >>> >> I'm getting to this soon, really :-)
> >>> >> First it was corosync 2.0 stuff, so that /something/ in fedora-17
> >>> >> works, then fixing everything I broke when adding corosync 2.0
> >>> >> support.
> >>> >
> >>> > All right!
> >>> >
> >>> > I will wait for your answer.
> >>>
> >>> I somehow missed that the failure was "not configured":
> >>>
> >>> Failed actions:
> >>>     prmVIP_monitor_0 (node=rh57-1, call=2, rc=6, status=complete): not configured
> >>>
> >>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
> >>> lists rc=6 as fatal, but I believe we changed that behaviour (the
> >>> stopping aspect) in the PE, as there was also insufficient information
> >>> for the agent to stop the service.
> >>> That resulted in the node being fenced, the resource being probed
> >>> (which fails, along with the subsequent stop), then the node being
> >>> fenced again, etc.
> >>>
> >>> So two things:
> >>>
> >>> This log message should include the human-readable version of rc=6:
> >>>
> >>> Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard
> >>> error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from
> >>> re-starting anywhere in the cluster
> >>>
> >>> And the docs need to be updated.
> >>>
> >>> > Best Regards,
> >>> > Hideo Yamauchi.
> >>> >
> >>> > --- On Thu, 2012/2/16, Andrew Beekhof <and...@beekhof.net> wrote:
> >>> >
> >>> >> Sorry!
> >>> >>
> >>> >> I'm getting to this soon, really :-)
> >>> >> First it was corosync 2.0 stuff, so that /something/ in fedora-17
> >>> >> works, then fixing everything I broke when adding corosync 2.0
> >>> >> support.
> >>> >>
> >>> >> On Tue, Feb 14, 2012 at 11:20 AM, <renayama19661...@ybb.ne.jp> wrote:
> >>> >> > Hi Andrew,
> >>> >> >
> >>> >> > How did this problem turn out in the end?
> >>> >> >
> >>> >> > Best Regards,
> >>> >> > Hideo Yamauchi.
> >>> >> >
> >>> >> > --- On Mon, 2012/1/16, renayama19661...@ybb.ne.jp <renayama19661...@ybb.ne.jp> wrote:
> >>> >> >
> >>> >> >> Hi Andrew,
> >>> >> >>
> >>> >> >> Thank you for your comments.
> >>> >> >>
> >>> >> >> > Could you send me the PE file related to this log please?
> >>> >> >> >
> >>> >> >> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
> >>> >> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
> >>> >> >> > /var/lib/pengine/pe-input-4.bz2
> >>> >> >>
> >>> >> >> The old file has disappeared, so I am sending the log and the PE
> >>> >> >> file reproduced by the same procedure:
> >>> >> >>
> >>> >> >> * trac1818.zip
> >>> >> >> * https://skydrive.live.com/?cid=3a14d57622c66876&id=3A14D57622C66876%21127
> >>> >> >>
> >>> >> >> Best Regards,
> >>> >> >> Hideo Yamauchi.
> >>> >> >>
> >>> >> >> --- On Mon, 2012/1/16, Andrew Beekhof <and...@beekhof.net> wrote:
> >>> >> >>
> >>> >> >> > On Fri, Jan 6, 2012 at 12:37 PM, <renayama19661...@ybb.ne.jp> wrote:
> >>> >> >> > > Hi Andrew,
> >>> >> >> > >
> >>> >> >> > > Thank you for your comment.
> >>> >> >> > >
> >>> >> >> > >> But it should have a subsequent stop action which would set it
> >>> >> >> > >> back to being inactive.
> >>> >> >> > >> Did that not happen in this case?
> >>> >> >> > >
> >>> >> >> > > Yes.
> >>> >> >> >
> >>> >> >> > Could you send me the PE file related to this log please?
> >>> >> >> >
> >>> >> >> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
> >>> >> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
> >>> >> >> > /var/lib/pengine/pe-input-4.bz2
> >>> >> >> >
> >>> >> >> > > Only the "verify_stopped" log was recorded; no stop was carried
> >>> >> >> > > out for the resource that failed its probe.
> >>> >> >> > >
> >>> >> >> > > -----------------------------
> >>> >> >> > > ######### yamauchi PREV STOP ##########
> >>> >> >> > > Jan  6 19:21:56 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ifcheckd process group 3462 with signal 15
> >>> >> >> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>> >> >> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: do_node_walk: Requesting the list of configured nodes
> >>> >> >> > > Jan  6 19:21:58 rh57-1 ifcheckd: [3462]: info: main: Exiting ifcheckd
> >>> >> >> > > Jan  6 19:21:58 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/crmd process group 3461 with signal 15
> >>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_shutdown: Requesting shutdown
> >>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
> >>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> >>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_shutdown_req: Sending shutdown request to DC: rh57-1
> >>> >> >> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: handle_shutdown_request: Creating shutdown request for rh57-1 (state=S_POLICY_ENGINE)
> >>> >> >> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1325845319)
> >>> >> >> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_perform_update: Sent update 14: shutdown=1325845319
> >>> >> >> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: abort_transition_graph: te_update_diff:150 - Triggered transition abort (complete=1, tag=nvpair, id=status-1fdd5e2a-44b6-44b9-9993-97fa120072a4-shutdown, name=shutdown, value=1325845319, magic=NA, cib=0.101.16) : Transient attribute: update
> >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: crm_timer_popped: New Transition Timer (I_PE_CALC) just popped!
> >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke: Query 44: Requesting the current CIB: S_POLICY_ENGINE
> >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke_callback: Invoking the PE: query=44, ref=pe_calc-dc-1325845321-26, seq=1, quorate=0
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: unpack_config: On loss of CCM Quorum: Ignore
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: determine_online_status: Node rh57-1 is shutting down
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from re-starting anywhere in the cluster
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print: Resource Group: grpUltraMonkey
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print: prmVIP (ocf::heartbeat:LVM): Stopped
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print: Resource Group: grpStonith1
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print: prmStonith1-2 (stonith:external/ssh): Stopped
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print: prmStonith1-3 (stonith:meatware): Stopped
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print: Resource Group: grpStonith2
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print: prmStonith2-2 (stonith:external/ssh): Started rh57-1
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print: prmStonith2-3 (stonith:meatware): Started rh57-1
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: clone_print: Clone Set: clnPingd
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: short_print: Started: [ rh57-1 ]
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: clnPingd: Rolling back scores from prmVIP
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmPingd:0 cannot run anywhere
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmVIP cannot run anywhere
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith1-2: Rolling back scores from prmStonith1-3
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-2 cannot run anywhere
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-3 cannot run anywhere
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith2-2: Rolling back scores from prmStonith2-3
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-2 cannot run anywhere
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-3 cannot run anywhere
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: stage6: Scheduling Node rh57-1 for shutdown
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave resource prmVIP (Stopped)
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave resource prmStonith1-2 (Stopped)
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave resource prmStonith1-3 (Stopped)
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop resource prmStonith2-2 (rh57-1)
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop resource prmStonith2-3 (rh57-1)
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop resource prmPingd:0 (rh57-1)
> >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: process_pe_message: Transition 4: PEngine Input stored in: /var/lib/pengine/pe-input-4.bz2
> >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: unpack_graph: Unpacked transition 4: 9 actions in 9 synapses
> >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from /var/lib/pengine/pe-input-4.bz2
> >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
> >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 24 fired and confirmed
> >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 21: stop prmPingd:0_stop_0 on rh57-1 (local)
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[10] on prmPingd:0 for client 3461, its parameters: CRM_meta_interval=[10000] multiplier=[100] CRM_meta_on_fail=[restart] CRM_meta_timeout=[60000] name=[default_ping_set] CRM_meta_clone_max=[1] crm_feature_set=[3.0.1] host_list=[192.168.40.1] CRM_meta_globally_unique=[false] CRM_meta_name=[monitor] CRM_meta_clone=[0] CRM_meta_clone_node_max=[1] CRM_meta_notify=[false] cancelled
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=21:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmPingd:0_stop_0 )
> >>> >> >> > > Jan  6 19:22:02 rh57-1 pingd: [3529]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmPingd:0 stop[14] (pid 3612)
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[14] on prmPingd:0 for client 3461: pid 3612 exited with return code 0
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_monitor_10000 (call=10, status=1, cib-update=0, confirmed=true) Cancelled
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_stop_0 (call=14, rc=0, cib-update=45, confirmed=true) ok
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmPingd:0_stop_0 (21) confirmed on rh57-1 (rc=0)
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 4 fired and confirmed
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 16: stop prmStonith2-3_stop_0 on rh57-1 (local)
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[13] on prmStonith2-3 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[600s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[2] CRM_meta_name=[monitor] cancelled
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=16:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-3_stop_0 )
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-3 stop[15] (pid 3617)
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3617]: info: Try to stop STONITH resource <rsc_id=prmStonith2-3> : Device=meatware
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_monitor_3600000 (call=13, status=1, cib-update=0, confirmed=true) Cancelled
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[15] on prmStonith2-3 for client 3461: pid 3617 exited with return code 0
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_stop_0 (call=15, rc=0, cib-update=46, confirmed=true) ok
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-3_stop_0 (16) confirmed on rh57-1 (rc=0)
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 15: stop prmStonith2-2_stop_0 on rh57-1 (local)
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[11] on prmStonith2-2 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[60s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[1] CRM_meta_name=[monitor] cancelled
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=15:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-2_stop_0 )
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-2 stop[16] (pid 3619)
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3619]: info: Try to stop STONITH resource <rsc_id=prmStonith2-2> : Device=external/ssh
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_monitor_3600000 (call=11, status=1, cib-update=0, confirmed=true) Cancelled
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[16] on prmStonith2-2 for client 3461: pid 3619 exited with return code 0
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_stop_0 (call=16, rc=0, cib-update=47, confirmed=true) ok
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-2_stop_0 (15) confirmed on rh57-1 (rc=0)
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 20 fired and confirmed
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: Executing crm-event (28): do_shutdown on rh57-1
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: crm-event (28) is a local shutdown
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: run_graph: ====================================================
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: notice: run_graph: Transition 4 (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-4.bz2): Complete
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_graph_trigger: Transition 4 is now complete
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_dc_release: DC role released
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> >>> >> >> > > Jan  6 19:22:03 rh57-1 pengine: [3464]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Transitioner is now inactive
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Disconnecting STONITH...
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: notice: Not currently connected.
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 420 ms (> 100 ms) before being called (GSource: 0x179d9b0)
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: G_SIG_dispatch: started at 429442052 should have started at 429442010
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmdManagedChildDied: Process pengine:[3464] exited (signal=0, exitcode=0)
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 80 ms (> 30 ms) (GSource: 0x179d9b0)
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_msg_dispatch: Received HUP from pengine:[3464]
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_connection_destroy: Connection to the Policy Engine released
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: ERROR: verify_stopped: Resource prmVIP was active at shutdown. You may ignore this error if it is unmanaged.
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_lrm_control: Disconnected from the LRM
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_ha_control: Disconnected from Heartbeat
> >>> >> >> > > Jan  6 19:22:03 rh57-1 ccm: [3456]: info: client (pid=3461) removed from ccm
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_cib_control: Disconnecting CIB
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
> >>> >> >> > > Jan  6 19:22:03 rh57-1 cib: [3457]: info: cib_process_readwrite: We are now in R/O mode
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
> >>> >> >> > > Jan  6 19:22:03 rh57-1 cib: [3457]: WARN: send_ipc_message: IPC Channel to 3461 is not connected
> >>> >> >> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
> >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: send_via_callback_channel: Delivery of reply to client 3461/5f69edda-aec9-42c7-ae52-045a05d1c5db failed
> >>> >> >> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: do_exit: [crmd] stopped (0)
> >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
> >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/attrd process group 3460 with signal 15
> >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 50 ms (> 30 ms) (GSource: 0x7b28140)
> >>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_shutdown: Exiting
> >>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: main: Exiting...
> >>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
> >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/stonithd process group 3459 with signal 15
> >>> >> >> > > Jan  6 19:22:04 rh57-1 stonithd: [3459]: notice: /usr/lib64/heartbeat/stonithd normally quit.
> >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/lrmd -r process group 3458 with signal 15
> >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
> >>> >> >> > > Jan  6 19:22:04 rh57-1 lrmd: [3458]: info: lrmd is shutting down
> >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/cib process group 3457 with signal 15
> >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
> >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_shutdown: Disconnected 0 clients
> >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_process_disconnect: All clients disconnected...
> >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: initiate_exit: Disconnecting heartbeat
> >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: Exiting...
> >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: main: Done
> >>> >> >> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: client (pid=3457) removed from ccm
> >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ccm process group 3456 with signal 15
> >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 60 ms (> 30 ms) (GSource: 0x7b28140)
> >>> >> >> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: received SIGTERM, going to shut down
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBFIFO process 3446 with signal 15
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3447 with signal 15
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3448 with signal 15
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3449 with signal 15
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3450 with signal 15
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3448 exited. 5 remaining
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3447 exited. 4 remaining
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3450 exited. 3 remaining
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3446 exited. 2 remaining
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3449 exited. 1 remaining
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: rh57-1 Heartbeat shutdown complete.
> >>> >> >> > >
> >>> >> >> > > -----------------------------
> >>> >> >> > >
> >>> >> >> > > Best Regards,
> >>> >> >> > > Hideo Yamauchi.
> >>> >> >> > >
> >>> >> >> > > --- On Fri, 2012/1/6, Andrew Beekhof <and...@beekhof.net> wrote:
> >>> >> >> > >
> >>> >> >> > >> On Tue, Dec 27, 2011 at 6:15 PM, <renayama19661...@ybb.ne.jp> wrote:
> >>> >> >> > >> > Hi All,
> >>> >> >> > >> >
> >>> >> >> > >> > When Pacemaker stops while there is a resource that failed
> >>> >> >> > >> > during probe processing, crmd outputs the following error
> >>> >> >> > >> > message:
> >>> >> >> > >> >
> >>> >> >> > >> > Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped:
> >>> >> >> > >> > Resource XXXXX was active at shutdown. You may ignore this
> >>> >> >> > >> > error if it is unmanaged.
> >>> >> >> > >> >
> >>> >> >> > >> > Because a resource that failed during probe processing never
> >>> >> >> > >> > started,
> >>> >> >> > >>
> >>> >> >> > >> But it should have a subsequent stop action which would set it
> >>> >> >> > >> back to being inactive.
> >>> >> >> > >> Did that not happen in this case?
> >>> >> >> > >>
> >>> >> >> > >> > this error message is not right.
> >>> >> >> > >> >
> >>> >> >> > >> > We think the following correction may be good, but we are
> >>> >> >> > >> > not certain.
> >>> >> >> > >> >
> >>> >> >> > >> > * crmd/lrm.c
> >>> >> >> > >> > (snip)
> >>> >> >> > >> >         } else if(op->rc == EXECRA_NOT_RUNNING) {
> >>> >> >> > >> >                 active = FALSE;
> >>> >> >> > >> > +       } else if(op->rc != EXECRA_OK && op->interval == 0
> >>> >> >> > >> > +                 && safe_str_eq(op->op_type, CRMD_ACTION_STATUS)) {
> >>> >> >> > >> > +               active = FALSE;
> >>> >> >> > >> >         } else {
> >>> >> >> > >> >                 active = TRUE;
> >>> >> >> > >> >         }
> >>> >> >> > >> > (snip)
> >>> >> >> > >> >
> >>> >> >> > >> > In the current development sources of Pacemaker this handling
> >>> >> >> > >> > seems to have changed considerably.
> >>> >> >> > >> > We would like to request that this change be backported to
> >>> >> >> > >> > the Pacemaker 1.0 series.
> >>> >> >> > >> >
> >>> >> >> > >> > Best Regards,
> >>> >> >> > >> > Hideo Yamauchi.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org