On 15/07/2013, at 8:56 PM, Mistina Michal <michal.mist...@virte.sk> wrote:
> Hi Andrew.
>
> Here is the omitted /var/log/messages with stonith-ng sections.
>
> Jul 15 09:53:38 PCMK1 stonith-ng[1538]: notice: stonith_device_action: Device vm-fence-pcmk2 not found
> Jul 15 09:53:38 PCMK1 stonith-ng[1538]: info: stonith_command: Processed st_execute from lrmd: rc=-12
> Jul 15 09:53:38 PCMK1 crmd[1542]: info: process_lrm_event: LRM operation vm-fence-pcmk2_monitor_0 (call=11, rc=7, cib-update=21, confirmed=true) not running
> Jul 15 09:53:38 PCMK1 lrmd: [1539]: info: rsc:vm-fence-pcmk2:12: start
> Jul 15 09:53:38 PCMK1 stonith-ng[1538]: info: stonith_device_register: Added 'vm-fence-pcmk2' to the device list (1 active devices)
> Jul 15 09:53:38 PCMK1 stonith-ng[1538]: info: stonith_command: Processed st_device_register from lrmd: rc=0
> Jul 15 09:53:38 PCMK1 stonith-ng[1538]: info: stonith_command: Processed st_execute from lrmd: rc=-1
> Jul 15 09:54:13 PCMK1 lrmd: [1539]: WARN: vm-fence-pcmk2:start process (PID 3332) timed out (try 1). Killing with signal SIGTERM (15).

You took too long, go away.

> Jul 15 09:54:18 PCMK1 lrmd: [1539]: WARN: vm-fence-pcmk2:start process (PID 3332) timed out (try 2). Killing with signal SIGKILL (9).

Seriously, go away.

> Jul 15 09:54:18 PCMK1 lrmd: [1539]: WARN: operation start[12] on stonith::fence_vmware_soap::vm-fence-pcmk2 for client 1542, its parameters: passwd=[password] shell_timeout=[20] ssl=[1] login=[administrator] action=[reboot] crm_feature_set=[3.0.6] retry_on=[10] ipaddr=[x.x.x.x] port=[T1-PCMK2] login_timeout=[15] CRM_meta_timeout=[20000] : pid [3332] timed out

Whatever that agent is doing, it's taking too long, or you've not given it long enough.

> Jul 15 09:54:18 PCMK1 crmd[1542]: error: process_lrm_event: LRM operation vm-fence-pcmk2_start_0 (12) Timed Out (timeout=20000ms)
> Jul 15 09:54:18 PCMK1 attrd[1540]: notice: attrd_ais_dispatch: Update relayed from pcmk2
> Jul 15 09:54:18 PCMK1 attrd[1540]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-vm-fence-pcmk2 (INFINITY)
> Jul 15 09:54:18 PCMK1 attrd[1540]: notice: attrd_perform_update: Sent update 24: fail-count-vm-fence-pcmk2=INFINITY
> Jul 15 09:54:18 PCMK1 attrd[1540]: notice: attrd_ais_dispatch: Update relayed from pcmk2
> Jul 15 09:54:18 PCMK1 attrd[1540]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-vm-fence-pcmk2 (1373874858)
> Jul 15 09:54:18 PCMK1 attrd[1540]: notice: attrd_perform_update: Sent update 27: last-failure-vm-fence-pcmk2=1373874858
> Jul 15 09:54:21 PCMK1 lrmd: [1539]: info: rsc:vm-fence-pcmk2:13: stop
> Jul 15 09:54:21 PCMK1 stonith-ng[1538]: info: stonith_device_remove: Removed 'vm-fence-pcmk2' from the device list (0 active devices)
> Jul 15 09:54:21 PCMK1 stonith-ng[1538]: info: stonith_command: Processed st_device_remove from lrmd: rc=0
> Jul 15 09:54:21 PCMK1 crmd[1542]: info: process_lrm_event: LRM operation vm-fence-pcmk2_stop_0 (call=13, rc=0, cib-update=23, confirmed=true) ok
>
> What does this output mean?
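
In short: the start operation was given 20 seconds (timeout=20000ms in the crmd line above) and the agent did not finish within that window, so the failure was recorded and the resource was stopped. One thing to try is an explicit, longer timeout on the fencing resources. Below is a rough sketch in crm shell syntax that reuses the parameters from the configuration quoted further down. The 60-second values and the monitor operation are assumptions for illustration, not measured numbers; it would be worth timing a manual run of the agent first and sizing the timeouts from that.

```
# Sketch only: the timeout="60" and interval="60" values are assumed, adjust to taste.
primitive vm-fence-pcmk2 stonith:fence_vmware_soap \
        params ipaddr="x.x.x.x" login="administrator" passwd="password" \
                port="pcmk2" ssl="1" retry_on="10" shell_timeout="20" \
                login_timeout="15" action="reboot" \
        op start interval="0" timeout="60" \
        op monitor interval="60" timeout="60"
```

The same change would apply to vm-fence-pcmk1. Since the fail-count is already at INFINITY, it probably also needs clearing (for example with "crm resource cleanup vm-fence-pcmk2") before the cluster will try the resource on that node again.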
>
> Best regards,
> Michal Mistina
>
> -----Original Message-----
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Monday, July 15, 2013 3:06 AM
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] RHEL 6.3 + fence_vmware_soap + esx 5.1
>
>
> On 13/07/2013, at 10:05 PM, Mistina Michal <michal.mist...@virte.sk> wrote:
>
>> Hi,
>> Does somebody know how to set up fence_vmware_soap correctly so that it will start fencing vmware machine in the esx 5.1?
>>
>> My problem is the fence_vmware_soap resource agent for stonith timed out. Don't know why.
>
> Nothing in the stonith-ng logs?
>
>>
>> [root@pcmk1 ~]# crm_verify -L -V
>> warning: unpack_rsc_op: Processing failed op vm-fence-pcmk2_last_failure_0 on pcmk1: unknown exec error (-2)
>> warning: unpack_rsc_op: Processing failed op vm-fence-pcmk1_last_failure_0 on pcmk2: unknown exec error (-2)
>> warning: common_apply_stickiness: Forcing vm-fence-pcmk2 away from pcmk1 after 1000000 failures (max=1000000)
>> warning: common_apply_stickiness: Forcing vm-fence-pcmk1 away from pcmk2 after 1000000 failures (max=1000000)
>>
>> I have 2 node cluster. If I tried to manually reboot vmware machine by calling fence_vmware_soap it worked.
>> [root@pcmk1 ~]# fence_vmware_soap -a x.x.x.x -l administrator -p password -n "pcmk2" -o reboot -z
>>
>> My settings are.
>> [root@pcmk1 ~]# stonith_admin -M -a fence_vmware_soap
>> <resource-agent name="fence_vmware_soap" shortdesc="Fence agent for VMWare over SOAP API">
>>   <longdesc>fence_vmware_soap is an I/O Fencing agent which can be used with the virtual machines managed by VMWare products that have SOAP API v4.1+.
>> .P
>> Name of virtual machine (-n / port) has to be used in inventory path format (e.g. /datacenter/vm/Discovered virtual machine/myMachine). In the cases when name of yours VM is unique you can use it instead. Alternatively you can always use UUID (-U / uuid) to access virtual machine.</longdesc>
>>   <vendor-url>http://www.vmware.com</vendor-url>
>>   <parameters>
>>     <parameter name="action" unique="0" required="1">
>>       <getopt mixed="-o, --action=<action>"/>
>>       <content type="string" default="reboot"/>
>>       <shortdesc lang="en">Fencing Action</shortdesc>
>>     </parameter>
>>     <parameter name="ipaddr" unique="0" required="1">
>>       <getopt mixed="-a, --ip=<ip>"/>
>>       <content type="string"/>
>>       <shortdesc lang="en">IP Address or Hostname</shortdesc>
>>     </parameter>
>>     <parameter name="login" unique="0" required="1">
>>       <getopt mixed="-l, --username=<name>"/>
>>       <content type="string"/>
>>       <shortdesc lang="en">Login Name</shortdesc>
>>     </parameter>
>>     <parameter name="passwd" unique="0" required="0">
>>       <getopt mixed="-p, --password=<password>"/>
>>       <content type="string"/>
>>       <shortdesc lang="en">Login password or passphrase</shortdesc>
>>     </parameter>
>>     <parameter name="passwd_script" unique="0" required="0">
>>       <getopt mixed="-S, --password-script=<script>"/>
>>       <content type="string"/>
>>       <shortdesc lang="en">Script to retrieve password</shortdesc>
>>     </parameter>
>>     <parameter name="ssl" unique="0" required="0">
>>       <getopt mixed="-z, --ssl"/>
>>       <content type="boolean"/>
>>       <shortdesc lang="en">SSL connection</shortdesc>
>>     </parameter>
>>     <parameter name="port" unique="0" required="0">
>>       <getopt mixed="-n, --plug=<id>"/>
>>       <content type="string"/>
>>       <shortdesc lang="en">Physical plug number or name of virtual machine</shortdesc>
>>     </parameter>
>>     <parameter name="uuid" unique="0" required="0">
>>       <getopt mixed="-U, --uuid"/>
>>       <content type="string"/>
>>       <shortdesc lang="en">The UUID of the virtual machine to fence.</shortdesc>
>>     </parameter>
>>     <parameter name="ipport" unique="0" required="0">
>>       <getopt mixed="-u, --ipport=<port>"/>
>>       <content type="string"/>
>>       <shortdesc lang="en">TCP port to use for connection with device</shortdesc>
>>     </parameter>
>>     <parameter name="verbose" unique="0" required="0">
>>       <getopt mixed="-v, --verbose"/>
>>       <content type="boolean"/>
>>       <shortdesc lang="en">Verbose mode</shortdesc>
>>     </parameter>
>>     <parameter name="debug" unique="0" required="0">
>>       <getopt mixed="-D, --debug-file=<debugfile>"/>
>>       <content type="string"/>
>>       <shortdesc lang="en">Write debug information to given file</shortdesc>
>>     </parameter>
>>     <parameter name="version" unique="0" required="0">
>>       <getopt mixed="-V, --version"/>
>>       <content type="boolean"/>
>>       <shortdesc lang="en">Display version information and exit</shortdesc>
>>     </parameter>
>>     <parameter name="help" unique="0" required="0">
>>       <getopt mixed="-h, --help"/>
>>       <content type="boolean"/>
>>       <shortdesc lang="en">Display help and exit</shortdesc>
>>     </parameter>
>>     <parameter name="separator" unique="0" required="0">
>>       <getopt mixed="-C, --separator=<char>"/>
>>       <content type="string" default=","/>
>>       <shortdesc lang="en">Separator for CSV created by operation list</shortdesc>
>>     </parameter>
>>     <parameter name="power_timeout" unique="0" required="0">
>>       <getopt mixed="--power-timeout"/>
>>       <content type="string" default="20"/>
>>       <shortdesc lang="en">Test X seconds for status change after ON/OFF</shortdesc>
>>     </parameter>
>>     <parameter name="shell_timeout" unique="0" required="0">
>>       <getopt mixed="--shell-timeout"/>
>>       <content type="string" default="3"/>
>>       <shortdesc lang="en">Wait X seconds for cmd prompt after issuing command</shortdesc>
>>     </parameter>
>>     <parameter name="login_timeout" unique="0" required="0">
>>       <getopt mixed="--login-timeout"/>
>>       <content type="string" default="5"/>
>>       <shortdesc lang="en">Wait X seconds for cmd prompt after login</shortdesc>
>>     </parameter>
>>     <parameter name="power_wait" unique="0" required="0">
>>       <getopt mixed="--power-wait"/>
>>       <content type="string" default="0"/>
>>       <shortdesc lang="en">Wait X seconds after issuing ON/OFF</shortdesc>
>>     </parameter>
>>     <parameter name="delay" unique="0" required="0">
>>       <getopt mixed="--delay"/>
>>       <content type="string" default="0"/>
>>       <shortdesc lang="en">Wait X seconds before fencing is started</shortdesc>
>>     </parameter>
>>     <parameter name="retry_on" unique="0" required="0">
>>       <getopt mixed="--retry-on"/>
>>       <content type="string" default="1"/>
>>       <shortdesc lang="en">Count of attempts to retry power on</shortdesc>
>>     </parameter>
>>   </parameters>
>>   <actions>
>>     <action name="on"/>
>>     <action name="off"/>
>>     <action name="reboot"/>
>>     <action name="status"/>
>>     <action name="list"/>
>>     <action name="monitor"/>
>>     <action name="metadata"/>
>>     <action name="stop" timeout="20s"/>
>>     <action name="start" timeout="20s"/>
>>   </actions>
>> </resource-agent>
>>
>> [root@pcmk1 ~]# crm configure show
>> node pcmk1
>> node pcmk2
>> primitive drbd_pg ocf:linbit:drbd \
>>         params drbd_resource="postgres" \
>>         op monitor interval="15" role="Master" \
>>         op monitor interval="16" role="Slave" \
>>         op start interval="0" timeout="240" \
>>         op stop interval="0" timeout="120"
>> primitive pg_fs ocf:heartbeat:Filesystem \
>>         params device="/dev/vg_local-lv_pgsql/lv_pgsql" directory="/var/lib/pgsql/9.2/data" options="noatime,nodiratime" fstype="xfs" \
>>         op start interval="0" timeout="60" \
>>         op stop interval="0" timeout="120"
>> primitive pg_lsb lsb:postgresql-9.2 \
>>         op monitor interval="30" timeout="60" \
>>         op start interval="0" timeout="60" \
>>         op stop interval="0" timeout="60"
>> primitive pg_lvm ocf:heartbeat:LVM \
>>         params volgrpname="vg_local-lv_pgsql" \
>>         op start interval="0" timeout="30" \
>>         op stop interval="0" timeout="30"
>> primitive pg_vip ocf:heartbeat:IPaddr2 \
>>         params ip="x.x.x.x" iflabel="pcmkvip" \
>>         op monitor interval="5"
>> primitive vm-fence-pcmk1 stonith:fence_vmware_soap \
>>         params ipaddr="x.x.x.x" login="administrator" passwd="password" port="pcmk1" ssl="1" retry_on="10" shell_timeout="20" login_timeout="15" action="reboot"
>> primitive vm-fence-pcmk2 stonith:fence_vmware_soap \
>>         params ipaddr="x.x.x.x" login="administrator" passwd="password" port="pcmk2" ssl="1" retry_on="10" shell_timeout="20" login_timeout="15" action="reboot"
>> group PGServer pg_lvm pg_fs pg_lsb pg_vip
>> ms ms_drbd_pg drbd_pg \
>>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>> location l-st-pcmk1 vm-fence-pcmk1 -inf: pcmk1
>> location l-st-pcmk2 vm-fence-pcmk2 -inf: pcmk2
>> location master-prefer-node1 pg_vip 50: pcmk1
>> colocation col_pg_drbd inf: PGServer ms_drbd_pg:Master
>> order ord_pg inf: ms_drbd_pg:promote PGServer:start
>> property $id="cib-bootstrap-options" \
>>         dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
>>         cluster-infrastructure="openais" \
>>         expected-quorum-votes="4" \
>>         stonith-enabled="true" \
>>         no-quorum-policy="ignore" \
>>         maintenance-mode="false"
>> rsc_defaults $id="rsc-options" \
>>         resource-stickiness="100"
>>
>> Am I doing something wrong?
>>
>> Best regards,
>> Michal Mistina
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org