Hi Andrew. You are right. I renamed the VMware machines to lowercase names and fencing worked. I also tested a dash and a bracket ( [ ); with those unusual characters, as I mentioned, stonith failed. However, my VMware machine now gets rebooted over and over, infinitely. I've found somewhere on the web that this was a bug in Pacemaker 1.1.7 and was fixed in version 1.1.8, so I will try to compile Pacemaker from source to get the newest version.
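In case it helps anyone following the thread, this is roughly what I plan to run for the upgrade. The repository URL and configure flags are my assumptions taken from the ClusterLabs build instructions, not something I have verified on these nodes yet:

[root@pcmk1 ~]# rpm -q pacemaker     # confirm the currently installed 1.1.7 package first
[root@pcmk1 ~]# git clone https://github.com/ClusterLabs/pacemaker.git
[root@pcmk1 ~]# cd pacemaker
[root@pcmk1 pacemaker]# ./autogen.sh
[root@pcmk1 pacemaker]# ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
[root@pcmk1 pacemaker]# make && make install

And for the record, if I ever have to go back to the uppercase VM names, I understand your pcmk_host_map suggestion would look something like the following for the device that fences pcmk1 (the mapping string "pcmk1:PCMK1" is only my guess at the node-name-to-VM-name pairing on our ESX):

primitive vm-fence-pcmk1 stonith:fence_vmware_soap \
        params ipaddr="x.x.x.x" login="administrator" passwd="password" \
                pcmk_host_map="pcmk1:PCMK1" ssl="1" action="reboot" \
        op start interval="0" timeout="120"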
Thank you. Best regards, Michal Mistina On 18/07/2013, at 10:46 PM, Mistina Michal <michal.mist...@virte.sk> wrote: > Hi Andrew. > Thank you for a little insight. I tried to set higher timout limits > within fence_vmware_soap properties in cib database. After I had > altered these numbers I didn't experience SIGTERM or SIGKILL any more. > However automatic fencing was still not successfull. > I don't understand why "manual fencing" by using command "fence_vmware_soap" > is working though and automatic with same parameters isn't. Because its not using the same parameters. Until 1.1.10-rc6, Pacemaker used a calculated value for port and action - regardless of what you specified. Look in "man stonithd" or the online docs for details on pcmk_host_map. You'd probably want "pcmk1:PCMK1;pcmk2:PCMK2;" Or just name the hosts in lowercase in vmware > > corosync.log attached further in the text shows there are some parsing > errors. I think this regards unusual characters used in the names of > the virtual machines which run on the ESX. This makes sense if unusual > character is used in the name of the fenced vmware machine. It isn't. > The corosyng.log shows names of other virtual machines on the ESX. > > Is it safe to say the issue occured within fence_vmware_soap resource > agent because it cannot handle something, maybe names of the virtual > machines? If so, I will try to update that agent. I am using version > fence-agents-3.1.5-17.el6.x86_64. > Is there a chance that changing timeout limits will help the > situation? I have feeling timeouts doesn't solve anything. It times > out because of something else. > > This is how the crm configuration looks now.... > [root@pcmk1 ~]# crm configure show > node pcmk1 > node pcmk2 > primitive drbd_pg ocf:linbit:drbd \ > params drbd_resource="postgres" \ > op monitor interval="15" role="Master" \ > op monitor interval="16" role="Slave" \ > op start interval="0" timeout="240" \ > op stop interval="0" timeout="120" > primitive pg_fs ocf:heartbeat:Filesystem \ > params device="/dev/vg_local-lv_pgsql/lv_pgsql" > directory="/var/lib/pgsql/9.2/data" options="noatime,nodiratime" > fstype="xfs" \ > op start interval="0" timeout="60" \ > op stop interval="0" timeout="120" > primitive pg_lsb lsb:postgresql-9.2 \ > op monitor interval="30" timeout="60" \ > op start interval="0" timeout="60" \ > op stop interval="0" timeout="60" > primitive pg_lvm ocf:heartbeat:LVM \ > params volgrpname="vg_local-lv_pgsql" \ > op start interval="0" timeout="30" \ > op stop interval="0" timeout="30" > primitive pg_vip ocf:heartbeat:IPaddr2 \ > params ip="x.x.x.x" iflabel="tstcapsvip" \ > op monitor interval="5" > primitive vm-fence-pcmk1 stonith:fence_vmware_soap \ > params ipaddr="x.x.x.x" login="administrator" passwd="password" > port="PCMK1" ssl="1" retry_on="10" shell_timeout="120" login_timeout="120" > action="reboot" \ > op start interval="0" timeout="120" > primitive vm-fence-pcmk2 stonith:fence_vmware_soap \ > params ipaddr="x.x.x.x" login="administrator" passwd="password" > port="PCMK2" ssl="1" retry_on="10" shell_timeout="120" login_timeout="120" > action="reboot" \ > op start interval="0" timeout="120" > group PGServer pg_lvm pg_fs pg_lsb pg_vip \ > meta target-role="Started" > ms ms_drbd_pg drbd_pg \ > meta master-max="1" master-node-max="1" clone-max="2" > clone-node-max="1" notify="true" > location l-st-pcmk1 vm-fence-pcmk1 -inf: pcmk1 location l-st-pcmk2 > vm-fence-pcmk2 -inf: pcmk2 location master-prefer-node1 pg_vip 50: > pcmk1 colocation col_pg_drbd inf: PGServer 
ms_drbd_pg:Master order > ord_pg inf: ms_drbd_pg:promote PGServer:start property > $id="cib-bootstrap-options" \ > dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \ > cluster-infrastructure="openais" \ > expected-quorum-votes="4" \ > stonith-enabled="true" \ > no-quorum-policy="ignore" \ > maintenance-mode="false" > rsc_defaults $id="rsc-options" \ > resource-stickiness="100" > > Command crm_verify -LV shows nothing. > [root@pcmk1 ~]# crm_verify -LV > > > [root@pcmk1 ~]# crm_mon -1 > ============ > Last updated: Thu Jul 18 14:23:15 2013 Last change: Thu Jul 18 > 14:20:54 2013 via crm_resource on pcmk1 > Stack: openais > Current DC: pcmk2 - partition WITHOUT quorum > Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14 > 2 Nodes configured, 4 expected votes > 8 Resources configured. > ============ > > Online: [ pcmk1 pcmk2 ] > > Resource Group: PGServer > pg_lvm (ocf::heartbeat:LVM): Started pcmk1 > pg_fs (ocf::heartbeat:Filesystem): Started pcmk1 > pg_lsb (lsb:postgresql-9.2): Started pcmk1 > pg_vip (ocf::heartbeat:IPaddr2): Started pcmk1 > Master/Slave Set: ms_drbd_pg [drbd_pg] > Masters: [ pcmk1 ] > Slaves: [ pcmk2 ] > vm-fence-pcmk1 (stonith:fence_vmware_soap): Started pcmk2 > vm-fence-pcmk2 (stonith:fence_vmware_soap): Started pcmk1 > > If I simulate split-brain by plugging out the cable from secondary > server pcmk2, /var/log/cluster/corosync.log on the primary server > pcmk1 tell this... > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: info: > can_fence_host_with_device: Refreshing port list for vm-fence-pcmk2 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (13 21): [106.15],4222ac70-92c3-bddf-b524-24d848080cb2 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (13 21): [107.25],42224003-b614-5eb2-f141-5437fc8319d8 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (13 21): [107.29],4222719f-7bdc-84b2-4494-848a29c2bd5f > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (0 1): [ MEDI - WinXP with SP3 - MSDN > ],4222238c-c927-3af1-f2e7-e0dd374d373b > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (31 32): ],4222238c-c927-3af1-f2e7-e0dd374d373b > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (0 1): [ MEDI WIN7 32-bit - > MSDN],42223e4a-9541-2326-2a21-3b3532756b47 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (13 22): > [105.233],42220acd-6e21-4380-9b81-89d86f14317d > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (9 17): [106.21],42223377-1443-a44c-1dc0-815c2542898e > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (12 20): [106.29],4222394a-70f1-4612-6fcd-4525e13b0cc4 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (0 1): [ MEDI W2K8 R2 SP1 STD - MSDN > ],4222dc65-6752-b1b4-c0f7-38c94cd5609a > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (30 31): ],4222dc65-6752-b1b4-c0f7-38c94cd5609a > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (12 20): [106.52],4222aa80-0fe6-66c4-8d11-fea5f547b566 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (13 21): [106.14],422249fc-a902-ba5c-deb0-e6db6198b984 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (18 
25): [106.2],4222851c-1a9d-021a-4e16-9f8adc5bcc42 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (12 20): [106.28],422235ab-83c4-c0b7-812b-bc5b7019aff7 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (13 21): [106.26],4222bbff-48eb-d60c-0347-430b8d72baa2 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (13 21): [107.27],4222da62-3c55-37f8-f6b8-239657892914 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (0 1): [ MEDI WIN7 64-bit - MSDN > ],4222289e-0bd2-4280-c0f4-548fd42e7eab > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (26 27): ],4222289e-0bd2-4280-c0f4-548fd42e7eab > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (17 26): > [105.242],42228b51-4ef6-f9b8-b64a-882d68023074 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (20 29): > [105.230],42223dcd-22c1-a0f7-c629-5c4489e2c55d > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (0 1): [ W2K3 R2 ENT 32-bit ENG > ],4233c1c8-e0f9-26f3-b854-6376ec6b1d1c > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (25 26): ],4233c1c8-e0f9-26f3-b854-6376ec6b1d1c > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (9 17): [106.20],422285ba-6a31-0832-1b38-a910031cd057 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (13 21): [106.27],4222d166-5647-79a3-d9d8-f90650b6188b > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (21 30): > [105.231],4222308c-41c7-02e9-3b20-c6df71838db9 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (25 28): !!! 
> [105.235],422283ac-c5d9-4bf1-96eb-a57d8d18c118 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (29 38): > [105.235],422283ac-c5d9-4bf1-96eb-a57d8d18c118 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (12 20): [106.13],42222137-0d67-ac9b-e3b6-11fb6d2c33e0 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (17 26): > [105.241],4222a40f-d91a-0e4f-2292-ef92c4836bb5 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (17 26): > [105.243],42222a9a-7440-6d19-b654-42c08a2abd69 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (0 1): [ MEDI W2K8 R2 SP1 ENT - MSDN > ],42227507-c4fd-c5aa-b7d7-4ececd284f84 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (30 31): ],42227507-c4fd-c5aa-b7d7-4ececd284f84 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (0 1): [ MEDI_gw_chckpnt > ],4222f42e-58c6-dc59-2a00-10041ad5ac08 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (18 19): ],4222f42e-58c6-dc59-2a00-10041ad5ac08 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (13 22): > [105.234],422295e3-644e-8b51-a373-e7f166b2fd5d > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (13 22): > [105.232],42228f9d-615f-1c3b-2158-d3ad08d40357 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (17 26): > [105.240],4222b273-68e7-379d-b874-6a47211e9449 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (13 21): [107.28],4222cbc8-565d-eee1-4430-555b059663d0 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (13 22): > [105.236],4222115e-789a-66dd-95e9-786ec0d84ec0 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (13 21): [107.26],4222fb16-fadc-9031-8e3d-110225505a0f > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (12 20): [106.12],42226bf9-8e78-9356-773c-ecde31cf2fa2 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line: > Could not parse (12 20): [106.51],4222ae99-f1d9-9811-d72b-10e875c58f56 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: info: > can_fence_host_with_device: vm-fence-pcmk2 can not fence pcmk2: > dynamic-list > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: info: stonith_command: > Processed st_query from pcmk1: rc=0 > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: error: remote_op_done: > Operation reboot of pcmk2 by <no-one> for > pcmk1[7496e5e6-4ab4-4028-b44d-c34c52a3fd04]: Operation timed out > Jul 18 14:31:00 [1498] pcmk1 crmd: info: tengine_stonith_callback: > StonithOp <remote-op state="0" st_target="pcmk2" st_op="reboot" /> > Jul 18 14:31:00 [1498] pcmk1 crmd: notice: tengine_stonith_callback: > Stonith operation 4 for pcmk2 failed (Operation timed out): aborting > transition. 
> Jul 18 14:31:00 [1498] pcmk1 crmd: info: abort_transition_graph: > tengine_stonith_callback:454 - Triggered transition abort (complete=0) : > Stonith failed > Jul 18 14:31:00 [1498] pcmk1 crmd: notice: tengine_stonith_notify: > Peer pcmk2 was not terminated (reboot) by <anyone> for pcmk1: > Operation timed out (ref=ca100580-8e00-49d4-b895-c538139a28dd) > Jul 18 14:31:00 [1498] pcmk1 crmd: notice: run_graph: ==== > Transition 2 (Complete=7, Pending=0, Fired=0, Skipped=4, Incomplete=5, > Source=/var/lib/pengine/pe-warn-34.bz2): Stopped > Jul 18 14:31:00 [1498] pcmk1 crmd: notice: do_state_transition: > State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ > input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ] > Jul 18 14:31:00 [1497] pcmk1 pengine: notice: unpack_config: On loss > of CCM Quorum: Ignore > Jul 18 14:31:00 [1497] pcmk1 pengine: warning: pe_fence_node: Node > pcmk2 will be fenced because it is un-expectedly down > Jul 18 14:31:00 [1497] pcmk1 pengine: warning: determine_online_status: > Node pcmk2 is unclean > Jul 18 14:31:00 [1497] pcmk1 pengine: warning: custom_action: Action > drbd_pg:1_stop_0 on pcmk2 is unrunnable (offline) > Jul 18 14:31:00 [1497] pcmk1 pengine: warning: custom_action: Marking > node pcmk2 unclean > Jul 18 14:31:00 [1497] pcmk1 pengine: warning: custom_action: Action > drbd_pg:1_stop_0 on pcmk2 is unrunnable (offline) > Jul 18 14:31:00 [1497] pcmk1 pengine: warning: custom_action: Marking > node pcmk2 unclean > Jul 18 14:31:00 [1497] pcmk1 pengine: warning: custom_action: Action > vm-fence-pcmk1_stop_0 on pcmk2 is unrunnable (offline) > Jul 18 14:31:00 [1497] pcmk1 pengine: warning: custom_action: Marking > node pcmk2 unclean > Jul 18 14:31:00 [1497] pcmk1 pengine: warning: stage6: Scheduling Node > pcmk2 for STONITH > Jul 18 14:31:00 [1497] pcmk1 pengine: notice: LogActions: Stop > drbd_pg:1 (pcmk2) > Jul 18 14:31:00 [1497] pcmk1 pengine: notice: LogActions: Stop > vm-fence-pcmk1 (pcmk2) > Jul 18 14:31:00 [1498] pcmk1 crmd: notice: do_state_transition: > State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ > input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ] > Jul 18 14:31:00 [1498] pcmk1 crmd: info: do_te_invoke: > Processing graph 3 (ref=pe_calc-dc-1374150660-46) derived from > /var/lib/pengine/pe-warn-35.bz2 > Jul 18 14:31:00 [1498] pcmk1 crmd: info: te_rsc_command: > Initiating action 63: notify drbd_pg:0_pre_notify_stop_0 on pcmk1 > (local) Jul 18 14:31:00 pcmk1 lrmd: [1495]: info: rsc:drbd_pg:0:28: notify > Jul 18 14:31:00 [1498] pcmk1 crmd: notice: te_fence_node: > Executing reboot fencing operation (53) on pcmk2 (timeout=60000) > Jul 18 14:31:00 [1494] pcmk1 stonith-ng: info: > initiate_remote_stonith_op: Initiating remote operation reboot for > pcmk2: d69db4e3-7d3b-4bee-9bd5-aa7afb05c358 > Jul 18 14:31:00 [1497] pcmk1 pengine: warning: process_pe_message: > Transition 3: WARNINGs found during PE processing. PEngine Input stored in: > /var/lib/pengine/pe-warn-35.bz2 > Jul 18 14:31:00 [1497] pcmk1 pengine: notice: process_pe_message: > Configuration WARNINGs found during PE processing. Please run > "crm_verify -L" to identify issues. 
> Jul 18 14:31:01 [1498] pcmk1 crmd: info: process_lrm_event: > LRM operation drbd_pg:0_notify_0 (call=28, rc=0, cib-update=0, > confirmed=true) ok > > > Regards, > Michal Mistina > -----Original Message----- > From: Andrew Beekhof [mailto:and...@beekhof.net] > Sent: Tuesday, July 16, 2013 5:23 AM > To: The Pacemaker cluster resource manager > Subject: Re: [Pacemaker] RHEL 6.3 + fence_vmware_soap + esx 5.1 > > > On 15/07/2013, at 8:56 PM, Mistina Michal <michal.mist...@virte.sk> wrote: > >> Hi Andrew. >> >> Here is the ommited /var/log/messages with stonigh-ng sections. >> >> Jul 15 09:53:38 PCMK1 stonith-ng[1538]: notice: stonith_device_action: >> Device vm-fence-pcmk2 not found >> Jul 15 09:53:38 PCMK1 stonith-ng[1538]: info: stonith_command: > Processed >> st_execute from lrmd: rc=-12 >> Jul 15 09:53:38 PCMK1 crmd[1542]: info: process_lrm_event: LRM > operation >> vm-fence-pcmk2_monitor_0 (call=11, rc=7, cib-update=21, >> confirmed=true) not running Jul 15 09:53:38 PCMK1 lrmd: [1539]: info: >> rsc:vm-fence-pcmk2:12: start >> Jul 15 09:53:38 PCMK1 stonith-ng[1538]: info: stonith_device_register: >> Added 'vm-fence-pcmk2' to the device list (1 active devices) >> Jul 15 09:53:38 PCMK1 stonith-ng[1538]: info: stonith_command: > Processed >> st_device_register from lrmd: rc=0 >> Jul 15 09:53:38 PCMK1 stonith-ng[1538]: info: stonith_command: > Processed >> st_execute from lrmd: rc=-1 >> Jul 15 09:54:13 PCMK1 lrmd: [1539]: WARN: vm-fence-pcmk2:start >> process (PID >> 3332) timed out (try 1). Killing with signal SIGTERM (15). > > you took too long, go away > >> Jul 15 09:54:18 PCMK1 lrmd: [1539]: WARN: vm-fence-pcmk2:start >> process (PID >> 3332) timed out (try 2). Killing with signal SIGKILL (9). > > seriously go away > >> Jul 15 09:54:18 PCMK1 lrmd: [1539]: WARN: operation start[12] on >> stonith::fence_vmware_soap::vm-fence-pcmk2 for client 1542, its > parameters: >> passwd=[password] shell_timeout=[20] ssl=[1] login=[administrator] >> action=[reboot] crm_feature_set=[3.0.6] retry_on=[10] >> ipaddr=[x.x.x.x] port=[T1-PCMK2] login_timeout=[15] >> CRM_meta_timeout=[20000] : pid [3332] timed out > > whatever that agent is doing, its taking to long or you've not given > it long enough > >> Jul 15 09:54:18 PCMK1 crmd[1542]: error: process_lrm_event: LRM > operation >> vm-fence-pcmk2_start_0 (12) Timed Out (timeout=20000ms) >> Jul 15 09:54:18 PCMK1 attrd[1540]: notice: attrd_ais_dispatch: Update >> relayed from pcmk2 >> Jul 15 09:54:18 PCMK1 attrd[1540]: notice: attrd_trigger_update: Sending >> flush op to all hosts for: fail-count-vm-fence-pcmk2 (INFINITY) >> Jul 15 09:54:18 PCMK1 attrd[1540]: notice: attrd_perform_update: Sent >> update 24: fail-count-vm-fence-pcmk2=INFINITY >> Jul 15 09:54:18 PCMK1 attrd[1540]: notice: attrd_ais_dispatch: Update >> relayed from pcmk2 >> Jul 15 09:54:18 PCMK1 attrd[1540]: notice: attrd_trigger_update: Sending >> flush op to all hosts for: last-failure-vm-fence-pcmk2 (1373874858) >> Jul 15 09:54:18 PCMK1 attrd[1540]: notice: attrd_perform_update: Sent >> update 27: last-failure-vm-fence-pcmk2=1373874858 >> Jul 15 09:54:21 PCMK1 lrmd: [1539]: info: rsc:vm-fence-pcmk2:13: stop >> Jul 15 09:54:21 PCMK1 stonith-ng[1538]: info: stonith_device_remove: >> Removed 'vm-fence-pcmk2' from the device list (0 active devices) >> Jul 15 09:54:21 PCMK1 stonith-ng[1538]: info: stonith_command: > Processed >> st_device_remove from lrmd: rc=0 >> Jul 15 09:54:21 PCMK1 crmd[1542]: info: process_lrm_event: LRM > operation >> vm-fence-pcmk2_stop_0 (call=13, rc=0, cib-update=23, 
confirmed=true) >> ok >> >> What does this output mean? >> >> Best regards, >> Michal Mistina >> >> -----Original Message----- >> From: Andrew Beekhof [mailto:and...@beekhof.net] >> Sent: Monday, July 15, 2013 3:06 AM >> To: The Pacemaker cluster resource manager >> Subject: Re: [Pacemaker] RHEL 6.3 + fence_vmware_soap + esx 5.1 >> >> >> On 13/07/2013, at 10:05 PM, Mistina Michal <michal.mist...@virte.sk> > wrote: >> >>> Hi, >>> Does somebody know how to set up fence_vmware_soap correctly so that >>> it >> will start fencing vmware machine in the esx 5.1? >>> >>> My problem is the fence_vmware_soap resource agent for stonith timed out. >> Don't know why. >> >> Nothing in the stonith-ng logs? >> >>> >>> [root@pcmk1 ~]# crm_verify -L -V >>> warning: unpack_rsc_op: Processing failed op >> vm-fence-pcmk2_last_failure_0 on pcmk1: unknown exec error (-2) >>> warning: unpack_rsc_op: Processing failed op >> vm-fence-pcmk1_last_failure_0 on pcmk2: unknown exec error (-2) >>> warning: common_apply_stickiness: Forcing vm-fence-pcmk2 away from >> pcmk1 after 1000000 failures (max=1000000) >>> warning: common_apply_stickiness: Forcing vm-fence-pcmk1 away from >> pcmk2 after 1000000 failures (max=1000000) >>> >>> I have 2 node cluster. If I tried to manually reboot vmware machine >>> by >> calling fence_vmware_soap it worked. >>> [root@pcmk1 ~]# fence_vmware_soap -a x.x.x.x -l administrator -p >>> password -n "pcmk2" -o reboot -z >>> >>> My settings are. >>> [root@pcmk1 ~]# stonith_admin -M -a fence_vmware_soap >>> <resource-agent name="fence_vmware_soap" shortdesc="Fence agent for >>> VMWare over SOAP API"> <longdesc>fence_vmware_soap is an I/O >>> Fencing agent which can be used >> with the virtual machines managed by VMWare products that have SOAP >> API v4.1+. >>> .P >>> Name of virtual machine (-n / port) has to be used in inventory path >> format (e.g. /datacenter/vm/Discovered virtual machine/myMachine). In >> the cases when name of yours VM is unique you can use it instead. 
>> Alternatively you can always use UUID (-U / uuid) to access virtual >> machine.</longdesc> >>> <vendor-url>http://www.vmware.com</vendor-url> >>> <parameters> >>> <parameter name="action" unique="0" required="1"> >>> <getopt mixed="-o, --action=<action>"/> >>> <content type="string" default="reboot"/> >>> <shortdesc lang="en">Fencing Action</shortdesc> >>> </parameter> >>> <parameter name="ipaddr" unique="0" required="1"> >>> <getopt mixed="-a, --ip=<ip>"/> >>> <content type="string"/> >>> <shortdesc lang="en">IP Address or Hostname</shortdesc> >>> </parameter> >>> <parameter name="login" unique="0" required="1"> >>> <getopt mixed="-l, --username=<name>"/> >>> <content type="string"/> >>> <shortdesc lang="en">Login Name</shortdesc> >>> </parameter> >>> <parameter name="passwd" unique="0" required="0"> >>> <getopt mixed="-p, --password=<password>"/> >>> <content type="string"/> >>> <shortdesc lang="en">Login password or passphrase</shortdesc> >>> </parameter> >>> <parameter name="passwd_script" unique="0" required="0"> >>> <getopt mixed="-S, --password-script=<script>"/> >>> <content type="string"/> >>> <shortdesc lang="en">Script to retrieve password</shortdesc> >>> </parameter> >>> <parameter name="ssl" unique="0" required="0"> >>> <getopt mixed="-z, --ssl"/> >>> <content type="boolean"/> >>> <shortdesc lang="en">SSL connection</shortdesc> >>> </parameter> >>> <parameter name="port" unique="0" required="0"> >>> <getopt mixed="-n, --plug=<id>"/> >>> <content type="string"/> >>> <shortdesc lang="en">Physical plug number or name of virtual >> machine</shortdesc> >>> </parameter> >>> <parameter name="uuid" unique="0" required="0"> >>> <getopt mixed="-U, --uuid"/> >>> <content type="string"/> >>> <shortdesc lang="en">The UUID of the virtual machine to >> fence.</shortdesc> >>> </parameter> >>> <parameter name="ipport" unique="0" required="0"> >>> <getopt mixed="-u, --ipport=<port>"/> >>> <content type="string"/> >>> <shortdesc lang="en">TCP port to use for connection with >> device</shortdesc> >>> </parameter> >>> <parameter name="verbose" unique="0" required="0"> >>> <getopt mixed="-v, --verbose"/> >>> <content type="boolean"/> >>> <shortdesc lang="en">Verbose mode</shortdesc> >>> </parameter> >>> <parameter name="debug" unique="0" required="0"> >>> <getopt mixed="-D, --debug-file=<debugfile>"/> >>> <content type="string"/> >>> <shortdesc lang="en">Write debug information to given >> file</shortdesc> >>> </parameter> >>> <parameter name="version" unique="0" required="0"> >>> <getopt mixed="-V, --version"/> >>> <content type="boolean"/> >>> <shortdesc lang="en">Display version information and >> exit</shortdesc> >>> </parameter> >>> <parameter name="help" unique="0" required="0"> >>> <getopt mixed="-h, --help"/> >>> <content type="boolean"/> >>> <shortdesc lang="en">Display help and exit</shortdesc> >>> </parameter> >>> <parameter name="separator" unique="0" required="0"> >>> <getopt mixed="-C, --separator=<char>"/> >>> <content type="string" default=","/> >>> <shortdesc lang="en">Separator for CSV created by operation >> list</shortdesc> >>> </parameter> >>> <parameter name="power_timeout" unique="0" required="0"> >>> <getopt mixed="--power-timeout"/> >>> <content type="string" default="20"/> >>> <shortdesc lang="en">Test X seconds for status change after >> ON/OFF</shortdesc> >>> </parameter> >>> <parameter name="shell_timeout" unique="0" required="0"> >>> <getopt mixed="--shell-timeout"/> >>> <content type="string" default="3"/> >>> <shortdesc lang="en">Wait X seconds for cmd prompt 
after issuing >> command</shortdesc> >>> </parameter> >>> <parameter name="login_timeout" unique="0" required="0"> >>> <getopt mixed="--login-timeout"/> >>> <content type="string" default="5"/> >>> <shortdesc lang="en">Wait X seconds for cmd prompt after >> login</shortdesc> >>> </parameter> >>> <parameter name="power_wait" unique="0" required="0"> >>> <getopt mixed="--power-wait"/> >>> <content type="string" default="0"/> >>> <shortdesc lang="en">Wait X seconds after issuing ON/OFF</shortdesc> >>> </parameter> >>> <parameter name="delay" unique="0" required="0"> >>> <getopt mixed="--delay"/> >>> <content type="string" default="0"/> >>> <shortdesc lang="en">Wait X seconds before fencing is >> started</shortdesc> >>> </parameter> >>> <parameter name="retry_on" unique="0" required="0"> >>> <getopt mixed="--retry-on"/> >>> <content type="string" default="1"/> >>> <shortdesc lang="en">Count of attempts to retry power on</shortdesc> >>> </parameter> >>> </parameters> >>> <actions> >>> <action name="on"/> >>> <action name="off"/> >>> <action name="reboot"/> >>> <action name="status"/> >>> <action name="list"/> >>> <action name="monitor"/> >>> <action name="metadata"/> >>> <action name="stop" timeout="20s"/> >>> <action name="start" timeout="20s"/> </actions> </resource-agent> >>> >>> [root@pcmk1 ~]# crm configure show >>> node pcmk1 >>> node pcmk2 >>> primitive drbd_pg ocf:linbit:drbd \ >>> params drbd_resource="postgres" \ >>> op monitor interval="15" role="Master" \ >>> op monitor interval="16" role="Slave" \ >>> op start interval="0" timeout="240" \ >>> op stop interval="0" timeout="120" >>> primitive pg_fs ocf:heartbeat:Filesystem \ >>> params device="/dev/vg_local-lv_pgsql/lv_pgsql" >> directory="/var/lib/pgsql/9.2/data" options="noatime,nodiratime" >> fstype="xfs" \ >>> op start interval="0" timeout="60" \ >>> op stop interval="0" timeout="120" >>> primitive pg_lsb lsb:postgresql-9.2 \ >>> op monitor interval="30" timeout="60" \ >>> op start interval="0" timeout="60" \ >>> op stop interval="0" timeout="60" >>> primitive pg_lvm ocf:heartbeat:LVM \ >>> params volgrpname="vg_local-lv_pgsql" \ >>> op start interval="0" timeout="30" \ >>> op stop interval="0" timeout="30" >>> primitive pg_vip ocf:heartbeat:IPaddr2 \ >>> params ip="x.x.x.x" iflabel="pcmkvip" \ >>> op monitor interval="5" >>> primitive vm-fence-pcmk1 stonith:fence_vmware_soap \ >>> params ipaddr="x.x.x.x" login="administrator" passwd="password" >> port="pcmk1" ssl="1" retry_on="10" shell_timeout="20" login_timeout="15" >> action="reboot" >>> primitive vm-fence-pcmk2 stonith:fence_vmware_soap \ >>> params ipaddr="x.x.x.x" login="administrator" passwd="password" >> port="pcmk2" ssl="1" retry_on="10" shell_timeout="20" login_timeout="15" >> action="reboot" >>> group PGServer pg_lvm pg_fs pg_lsb pg_vip ms ms_drbd_pg drbd_pg \ >>> meta master-max="1" master-node-max="1" clone-max="2" >> clone-node-max="1" notify="true" >>> location l-st-pcmk1 vm-fence-pcmk1 -inf: pcmk1 location l-st-pcmk2 >>> vm-fence-pcmk2 -inf: pcmk2 location master-prefer-node1 pg_vip 50: >>> pcmk1 colocation col_pg_drbd inf: PGServer ms_drbd_pg:Master order >>> ord_pg inf: ms_drbd_pg:promote PGServer:start property >>> $id="cib-bootstrap-options" \ >>> dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" >> \ >>> cluster-infrastructure="openais" \ >>> expected-quorum-votes="4" \ >>> stonith-enabled="true" \ >>> no-quorum-policy="ignore" \ >>> maintenance-mode="false" >>> rsc_defaults $id="rsc-options" \ >>> resource-stickiness="100" >>> >>> Am I doing 
something wrong? >>> >>> Best regards, >>> Michal Mistina >>> >>> _______________________________________________ >>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org >>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker >>> >>> Project Home: http://www.clusterlabs.org Getting started: >>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >>> Bugs: http://bugs.clusterlabs.org >> >> >> _______________________________________________ >> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker >> >> Project Home: http://www.clusterlabs.org Getting started: >> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> Bugs: http://bugs.clusterlabs.org >> _______________________________________________ >> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker >> >> Project Home: http://www.clusterlabs.org Getting started: >> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> Bugs: http://bugs.clusterlabs.org > > > _______________________________________________ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org Getting started: > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > _______________________________________________ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org Getting started: > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org _______________________________________________ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org