Hi,

On Fri, Jun 29, 2012 at 12:41:01PM -0700, Brett Lee wrote:
> Hello - Am thinking that this is progress.
> 
> Have made some updates, but still getting the same result ("require
> others to stonith node st15-mds1").
> 
> Referencing this link for the updates made:
> http://www.hastexo.com/resources/hints-and-kinks/fencing-libvirtkvm-virtualized-cluster-nodes
> 
> Updates include removing the previous 'primitive st-nodes' entry and
> adding the following:
> 
> primitive stonith_st15-mds1 stonith:external/libvirt \
>         params hostlist="st15-mds1" hypervisor_uri="qemu+ssh://wc0008/system" stonith-timeout="30" \
>         op start interval="0" timeout="60" \
>         op stop interval="0" timeout="60" \
>         op monitor interval="60"
> primitive stonith_st15-mds2 stonith:external/libvirt \
>         params hostlist="st15-mds2" hypervisor_uri="qemu+ssh://wc0008/system" stonith-timeout="30" \
>         op start interval="0" timeout="60" \
>         op stop interval="0" timeout="60" \
>         op monitor interval="60"
> location l_stonith_st15-mds1 stonith_st15-mds1 -inf: st15-mds1
> location l_stonith_st15-mds2 stonith_st15-mds2 -inf: st15-mds2
> 
> Any suggestions would certainly be appreciated. Thanks!

This configuration is in no way better than the first one.

Thanks,

Dejan
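Before touching the CIB any further, it is probably worth proving that the
external/libvirt agent can reach the hypervisor at all, from every node. A
minimal sanity check, assuming root on each cluster node already has a
passwordless SSH key accepted by wc0008 (which the qemu+ssh URI requires)
and that the cluster-glue stonith CLI is installed, might look like this
(flag spellings per stonith(8); adjust for your version):

    # Run on each cluster node, as root (the user stonithd invokes the
    # agent as): confirm the libvirt URI used by the resource actually works.
    virsh -c qemu+ssh://wc0008/system list --all

    # Then drive the agent directly, with the same parameters as the
    # primitive above; -l lists the hosts the device claims to control,
    # -S reports its status.
    stonith -t external/libvirt hostlist="st15-mds1" \
        hypervisor_uri="qemu+ssh://wc0008/system" -lS

If either of these fails, no amount of CIB rearranging will make the
fencing resource start.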
> 
> Brett Lee
> Everything Penguin - http://etpenguin.com
> 
> 
> ________________________________
> > From: Brett Lee <brett...@yahoo.com>
> > To: "pacemaker@oss.clusterlabs.org" <pacemaker@oss.clusterlabs.org>
> > Sent: Friday, June 29, 2012 9:43 AM
> > Subject: [Pacemaker] newb - stonith not working - require others to stonith node
> > 
> > Hi -
> > 
> > Am new to pacemaker and now have a shiny new configuration that will
> > not stonith. This is a test system using KVM and external/libvirt -
> > all VMs are running CentOS 5.
> > 
> > Am (really) hoping someone might be willing to help troubleshoot this
> > configuration. Thank you for your time and effort!
> > 
> > The items that are suspect to me are:
> > 1. st-nodes has no 'location' entry
> > 2. logs report node_list=
> > 3. resource st-nodes is Stopped
> > 
> > Have attached a clip of the configuration below. The full configuration
> > and log file may be found at - http://pastebin.com/bS87FXUr
> > 
> > Per 'stonith -t external/libvirt -h' I have configured stonith using:
> > 
> > primitive st-nodes stonith:external/libvirt \
> >         params hostlist="st15-mds1,st15-mds2,st15-oss1,st15-oss2" hypervisor_uri="qemu+ssh://wc0008/system" stonith-timeout="30" \
> >         op start interval="0" timeout="60" \
> >         op stop interval="0" timeout="60" \
> >         op monitor interval="60"
> > 
> > And a section of the log file is:
> > 
> > Jun 29 11:02:07 st15-mds2 stonithd: [4485]: ERROR: Failed to STONITH the node st15-mds1: optype=RESET, op_result=TIMEOUT
> > Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: tengine_stonith_callback: call=-65, optype=1, node_name=st15-mds1, result=2, node_list=, action=23:90:0:aac961e7-b06b-4dfd-ae60-c882407b16b5
> > Jun 29 11:02:07 st15-mds2 crmd: [4490]: ERROR: tengine_stonith_callback: Stonith of st15-mds1 failed (2)... aborting transition.
> > Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: abort_transition_graph: tengine_stonith_callback:409 - Triggered transition abort (complete=0) : Stonith failed
> > Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
> > Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: update_abort_priority: Abort action done superceeded by restart
> > Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: run_graph: ====================================================
> > Jun 29 11:02:07 st15-mds2 crmd: [4490]: notice: run_graph: Transition 90 (Complete=2, Pending=0, Fired=0, Skipped=5, Incomplete=0, Source=/var/lib/pengine/pe-warn-173.bz2): Stopped
> > Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: te_graph_trigger: Transition 90 is now complete
> > Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> > Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: do_state_transition: All 3 cluster nodes are eligible to run resources.
> > Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: do_pe_invoke: Query 299: Requesting the current CIB: S_POLICY_ENGINE
> > Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: do_pe_invoke_callback: Invoking the PE: query=299, ref=pe_calc-dc-1340982127-223, seq=396, quorate=1
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: determine_online_status: Node st15-mds2 is online
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: WARN: pe_fence_node: Node st15-mds1 will be fenced because it is un-expectedly down
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: determine_online_status_fencing: ha_state=active, ccm_state=false, crm_state=online, join_state=member, expected=member
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: WARN: determine_online_status: Node st15-mds1 is unclean
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: determine_online_status: Node st15-oss1 is online
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: determine_online_status: Node st15-oss2 is online
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: native_print: lustre-OST0000 (ocf::heartbeat:Filesystem): Started st15-oss1
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: native_print: lustre-OST0001 (ocf::heartbeat:Filesystem): Started st15-oss1
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: native_print: lustre-OST0002 (ocf::heartbeat:Filesystem): Started st15-oss2
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: native_print: lustre-OST0003 (ocf::heartbeat:Filesystem): Started st15-oss2
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: native_print: lustre-MDT0000 (ocf::heartbeat:Filesystem): Started st15-mds1
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: native_print: st-nodes (stonith:external/libvirt): Stopped
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: native_color: Resource st-nodes cannot run anywhere
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: WARN: custom_action: Action lustre-MDT0000_stop_0 on st15-mds1 is unrunnable (offline)
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: WARN: custom_action: Marking node st15-mds1 unclean
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: RecurringOp: Start recurring monitor (120s) for lustre-MDT0000 on st15-mds2
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: WARN: stage6: Scheduling Node st15-mds1 for STONITH
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: native_stop_constraints: lustre-MDT0000_stop_0 is implicit after st15-mds1 is fenced
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: LogActions: Leave resource lustre-OST0000 (Started st15-oss1)
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: LogActions: Leave resource lustre-OST0001 (Started st15-oss1)
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: LogActions: Leave resource lustre-OST0002 (Started st15-oss2)
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: LogActions: Leave resource lustre-OST0003 (Started st15-oss2)
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: LogActions: Move resource lustre-MDT0000 (Started st15-mds1 -> st15-mds2)
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: LogActions: Leave resource st-nodes (Stopped)
> > Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: WARN: process_pe_message: Transition 91: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/pengine/pe-warn-174.bz2
> > Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: unpack_graph: Unpacked transition 91: 7 actions in 7 synapses
> > Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
> > Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: do_te_invoke: Processing graph 91 (ref=pe_calc-dc-1340982127-223) derived from /var/lib/pengine/pe-warn-174.bz2
> > Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: te_pseudo_action: Pseudo action 21 fired and confirmed
> > Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: te_fence_node: Executing reboot fencing operation (23) on st15-mds1 (timeout=60000)
> > Jun 29 11:02:07 st15-mds2 stonithd: [4485]: info: client tengine [pid: 4490] requests a STONITH operation RESET on node st15-mds1
> > Jun 29 11:02:07 st15-mds2 stonithd: [4485]: info: we can't manage st15-mds1, broadcast request to other nodes
> > Jun 29 11:02:07 st15-mds2 stonithd: [4485]: info: Broadcasting the message succeeded: require others to stonith node st15-mds1.
> > 
> > Thank you!
> > 
> > Brett Lee
> > Everything Penguin - http://etpenguin.com
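Two lines in the log excerpt above carry most of the weight: "native_color:
Resource st-nodes cannot run anywhere" and "we can't manage st15-mds1,
broadcast request to other nodes". Together they suggest that no node was
running a fencing resource covering st15-mds1, which is why the reset simply
timed out. A rough way to chase that down with the standard tools (a sketch,
assuming the crm shell and Pacemaker CLI utilities from the same stack are
installed) would be:

    # Show the cluster state once, with fail counts; a start failure on
    # every node is the usual reason a resource "cannot run anywhere".
    crm_mon -1 -f

    # Re-check the configuration, as the PE log itself suggests.
    crm_verify -L -V

    # Once the agent works from the command line, clear the old failures
    # so the policy engine is allowed to start the fencing resource again.
    crm resource cleanup st-nodes

After the cleanup, crm_mon should show the fencing resource(s) started on
some node; only then is a fence request for st15-mds1 expected to succeed.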