On 1 Apr 2014, at 8:50 pm, Beo Banks <beo.ba...@googlemail.com> wrote:
> hi,
>
> the kvm guests are on different kvm hosts.

ok, so what I said below about "devices like fence_virsh and external/ssh
are not considered reliable" definitely applies.

perhaps add a second fencing resource that talks to the power switch the
kvm hosts are connected to (so that the cluster can still recover if
fence_virsh fails). a rough sketch of what that could look like is below,
after the quoted thread.

> 2014-03-24 0:30 GMT+01:00 Andrew Beekhof <and...@beekhof.net>:
>
> > On 21 Mar 2014, at 11:11 pm, Beo Banks <beo.ba...@googlemail.com> wrote:
> >
> > > yap and that's my issue.
> > >
> > > stonith is very powerful, but how can the cluster handle hardware failure?
> >
> > by connecting to the switch that supplies power to said hardware
> >
> > exactly the reason devices like fence_virsh and external/ssh are not
> > considered reliable.
> >
> > are both these VMs running on the same physical hardware?
> >
> > > primitive stonith-linux01 stonith:fence_virsh \
> > >         params pcmk_host_list="linux01" pcmk_host_check="dynamic-list" \
> > >                pcmk_host_map="linux01:linux01" action="reboot" ipaddr="XXXXXX" \
> > >                secure="true" login="root" identity_file="/root/.ssh/id_rsa" \
> > >                debug="/var/log/stonith.log" verbose="false" \
> >
> > you don't need the host map if the name and value (name:value) are the same
> >
> > >         op monitor interval="300s" \
> > >         op start interval="0" timeout="60s" \
> > >         meta failure-timeout="180s"
> > > primitive stonith-linux02 stonith:fence_virsh \
> > >         params pcmk_host_list="linux02" pcmk_host_check="dynamic-list" \
> > >                pcmk_host_map="linux02:linux02" action="reboot" ipaddr="XXXXX" \
> > >                secure="true" login="root" identity_file="/root/.ssh/id_rsa" delay="5" \
> > >                debug="/var/log/stonith.log" verbose="false" \
> > >         op monitor interval="60s" \
> > >         op start interval="0" timeout="60s" \
> > >         meta failure-timeout="180s"
> > >
> > > 2014-03-18 13:54 GMT+01:00 emmanuel segura <emi2f...@gmail.com>:
> > > do you have stonith configured?
> > >
> > > 2014-03-18 13:07 GMT+01:00 Alex Samad - Yieldbroker <alex.sa...@yieldbroker.com>:
> > > I'm no expert, but
> > >
> > >     Current DC: linux02 - partition WITHOUT quorum
> > >     Version: 1.1.10-14.el6_5.2-368c726
> > >     2 Nodes configured, 2 expected votes
> > >
> > > I think your 2nd node can't make quorum; there is some special config for a
> > > 2-node cluster to allow nodes to make quorum with 1 vote.
> > >
> > > A
> > >
> > > From: Beo Banks [mailto:beo.ba...@googlemail.com]
> > > Sent: Tuesday, 18 March 2014 10:06 PM
> > > To: pacemaker@oss.clusterlabs.org
> > > Subject: [Pacemaker] crm resource doesn't move after hardware crash
> > >
> > > hi,
> > >
> > > i have a hardware crash in a two-node drbd cluster.
> > >
> > > the active node has a hardware failure and is currently down.
> > >
> > > i am wondering why my 2nd node doesn't migrate/move the resources.
> > >
> > > the 2nd node wants to fence the failed node, but that's not possible (it's down).
> > >
> > > how can i enable the services on the last "good" node?
> > >
> > > and how can i optimize my config to handle that kind of error?
> > >
> > > crm status
> > >
> > > Last updated: Tue Mar 18 12:01:07 2014
> > > Last change: Tue Mar 18 11:28:22 2014 via crmd on linux02
> > > Stack: classic openais (with plugin)
> > > Current DC: linux02 - partition WITHOUT quorum
> > > Version: 1.1.10-14.el6_5.2-368c726
> > > 2 Nodes configured, 2 expected votes
> > > 21 Resources configured
> > >
> > > Node linux01: UNCLEAN (offline)
> > > Online: [ linux02 ]
> > >
> > >  Resource Group: mysql
> > >      mysql_fs   (ocf::heartbeat:Filesystem):    Started linux01
> > >      mysql_ip   (ocf::heartbeat:IPaddr2):       Started linux01
> > >
> > > .... and so on
> > >
> > > cluster.log
> > >
> > > Mar 18 11:54:43 [2234] linux02 crmd: notice: tengine_stonith_callback: Stonith operation 17 for linux01 failed (Timer expired): aborting transition.
> > > Mar 18 11:54:43 [2234] linux02 crmd: info: abort_transition_graph: tengine_stonith_callback:463 - Triggered transition abort (complete=0) : Stonith failed
> > > Mar 18 11:54:43 [2234] linux02 crmd: notice: run_graph: Transition 15 (Complete=9, Pending=0, Fired=0, Skipped=36, Incomplete=19, Source=/var/lib/pacemaker/pengine/pe-warn-63.bz2): Stopped
> > > Mar 18 11:54:43 [2234] linux02 crmd: notice: too_many_st_failures: Too many failures to fence linux01 (16), giving up
> > > Mar 18 11:54:43 [2234] linux02 crmd: info: do_log: FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
> > > Mar 18 11:54:43 [2234] linux02 crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> > > Mar 18 11:54:43 [2230] linux02 stonith-ng: info: stonith_command: Processed st_notify reply from linux02: OK (0)
> > > Mar 18 11:54:43 [2234] linux02 crmd: notice: tengine_stonith_notify: Peer linux01 was not terminated (reboot) by linux02 for linux02: Timer expired (ref=7939b264-699c-4d00-a89c-07e7e0193a80) by client crmd.2234
> > > Mar 18 11:54:44 [2229] linux02 cib: info: crm_client_new: Connecting 0x155ac00 for uid=0 gid=0 pid=23360 id=b88b2690-0c3f-48ac-b8b4-3a47b7f9114a
> > > Mar 18 11:54:44 [2229] linux02 cib: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/crm_mon/2, version=0.125.2)
> > > Mar 18 11:54:44 [2229] linux02 cib: info: crm_client_destroy: Destroying 0 events
> > > Mar 18 11:55:03 [2229] linux02 cib: info: crm_client_new: Connecting 0x155ac00 for uid=0 gid=0 pid=23415 id=62e7a9d8-588e-427f-8178-85febce00151
> > > Mar 18 11:55:03 [2229] linux02 cib: info: crm_client_new: Connecting 0x1585de0 for uid=0 gid=0 pid=23416 id=79795042-699b-4347-abcb-4c7c96ed2291
> > > Mar 18 11:55:03 [2229] linux02 cib: info: cib_process_request: Completed cib_query operation for section nodes: OK (rc=0, origin=local/crm_attribute/2, version=0.125.2)
> > > Mar 18 11:55:03 [2229] linux02 cib: info: cib_process_request: Completed cib_query operation for section nodes: OK (rc=0, origin=local/crm_attribute/2, version=0.125.2)
> > > Mar 18 11:55:03 [2229] linux02 cib: info: crm_client_destroy: Destroying 0 events
> > > Mar 18 11:55:03 [2229] linux02 cib: info: crm_client_destroy: Destroying 0 events
> > > Mar 18 11:55:43 [2230] linux02 stonith-ng: error: remote_op_done: Already sent notifications for 'reboot of linux01 by linux02' (for=crmd.2234@linux02.7939b264, state=4): Timer expired
> > > Mar 18 11:55:59 [2229] linux02 cib: info: crm_client_new: Connecting 0x155ac00 for uid=0 gid=0 pid=23468 id=8dea3cab-9103-42fc-9747-76018c4a0500
> > > Mar 18 11:55:59 [2229] linux02 cib: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/crm_mon/2, version=0.125.2)
> > > Mar 18 11:55:59 [2229] linux02 cib: info: crm_client_destroy: Destroying 0 events
> > > Mar 18 11:56:03 [2229] linux02 cib: info: crm_client_new: Connecting 0x155ac00 for uid=0 gid=0 pid=23523 id=b681390a-51a3-4d68-abf1-514ee8ab9351
> > > Mar 18 11:56:03 [2229] linux02 cib: info: crm_client_new: Connecting 0x1585de0 for uid=0 gid=0 pid=23524 id=005421e4-b079-4a16-b4cc-0fc2c8c73246
> > > Mar 18 11:56:03 [2229] linux02 cib: info: cib_process_request: Completed cib_query operation for section nodes: OK (rc=0, origin=local/crm_attribute/2, version=0.125.2)
> > > Mar 18 11:56:03 [2229] linux02 cib: info: cib_process_request: Completed cib_query operation for section nodes: OK (rc=0, origin=local/crm_attribute/2, version=0.125.2)
> > > Mar 18 11:56:03 [2229] linux02 cib: info: crm_client_destroy: Destroying 0 events
> > > Mar 18 11:56:03 [2229] linux02 cib: info: crm_client_destroy: Destroying 0 events
> > >
> > > thanks
> > >
> > > beo
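
To make the "second fencing device" idea concrete, here is a rough, untested
sketch. The agent name (fence_apc), the PDU address, login/password and the
outlet numbers are placeholders only; substitute whatever matches the power
switch or lights-out device your kvm hosts are actually plugged into
(fence_ipmilan, fence_ilo, ...). The topology tells stonith-ng to try
fence_virsh first and only fall back to cutting power to the kvm host:

    # placeholders: ipaddr/login/passwd/port must match your real PDU
    primitive stonith-pdu-linux01 stonith:fence_apc \
            params ipaddr="pdu.example.com" login="apc" passwd="XXXX" \
                   port="1" pcmk_host_list="linux01" action="reboot" \
            op monitor interval="3600s"
    primitive stonith-pdu-linux02 stonith:fence_apc \
            params ipaddr="pdu.example.com" login="apc" passwd="XXXX" \
                   port="2" pcmk_host_list="linux02" action="reboot" \
            op monitor interval="3600s"

    # level 1 = fence_virsh, level 2 = power off the kvm host
    fencing_topology \
            linux01: stonith-linux01 stonith-pdu-linux01 \
            linux02: stonith-linux02 stonith-pdu-linux02

keep in mind that cutting power to a kvm host takes down every guest on it,
so the power-switch level really is a last resort.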
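
on the earlier question "how can i enable the services on the last good
node": when the peer is physically dead and no fencing device can reach it,
you can tell the cluster by hand that the fence succeeded. Something along
these lines should work, depending on your pacemaker/crmsh versions:

    # only after you have physically verified that linux01 is powered off
    stonith_admin --confirm linux01

    # or, with crmsh
    crm node clearstate linux01

be absolutely certain the node is really off before confirming; acknowledging
a node that is still running is a good way to corrupt the drbd data.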
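
and on the quorum point raised above: with the corosync 1.x plugin stack
shown in your crm status, the usual setting that lets the surviving node keep
running without quorum in a two-node cluster is no-quorum-policy; on a
corosync 2.x stack you would use two_node in corosync.conf instead. Sketch:

    # pacemaker property (corosync 1.x / plugin stack)
    crm configure property no-quorum-policy=ignore

    # corosync 2.x alternative, in /etc/corosync/corosync.conf
    quorum {
            provider: corosync_votequorum
            two_node: 1
    }

note this only addresses quorum: the cluster will still wait for a successful
(or manually confirmed) fence before moving resources, which is why the
stonith changes above matter more.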
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org