Thanks for the quick answer. I'll have a look at that. Is there a way to manually force a failover when I can be sure the other machine is down?
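[Editor's note: the manual-failover question above maps onto Pacemaker's manual fence confirmation. The commands below are a sketch of that mechanism using the node name from this thread; they only make sense against a live cluster, and telling the cluster a fence succeeded when the node is in fact still running risks exactly the split-brain Digimer warns about below:]

```shell
# CAUTION: only do this when you are CERTAIN the peer is really powered off.
# Confirming a fence that did not actually happen invites a split-brain.

# With pcs, manually confirm that the node has been fenced:
pcs stonith confirm ftp-test01

# Roughly equivalent low-level call via Pacemaker's stonith_admin:
stonith_admin --confirm=ftp-test01
```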
Kind regards,
Felix

-----Original Message-----
From: Digimer [mailto:li...@alteeve.ca]
Sent: Monday, 18 August 2014 19:57
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] no failover if fencing device is unreachable (i.e. power loss)

On 18/08/14 01:50 PM, Felix Schrage wrote:
> Hi,
>
> I'm building a two-node cluster running XenServer, Pacemaker and DRBD.
> There's a problem when testing failover by powering off the currently
> active node.
> When using the fence_xenapi agent, the ClusterIP resource will not be
> moved to the second node until the first node has been successfully
> shut down. However, because the XenAPI is unreachable when the machine
> is powered off, the second node keeps trying to shut it down and the
> resource is never moved.
>
> To check whether it was an error in the fence_xenapi agent, I tried
> fence_ipmilan, which works fine as long as the IPMI interface is
> reachable. When pulling the power cords from the machine, however, the
> behaviour is the same as with the fence_xenapi agent.
> Am I missing an option which should be set? A timeout or a retry counter?

This is the expected behaviour. Being unable to connect to the fence device (or to fail to confirm the "off" action) can not be treated as a successful fence. Without a successful fence, it can not be assumed that the peer is gone. To do so would be to risk a split-brain, so the cluster's only sane and safe option is to block. This is why we always use switched PDUs as a backup fence method.
You can see how to configure this with STONITH levels:
http://clusterlabs.org/wiki/STONITH_Levels

> Here's how I set up the cluster (fence_xenapi) using pcs:
>
> pcs cluster cib ftp_ha_cluster
> pcs -f ftp_ha_cluster resource create ClusterIP IPaddr2 ip=172.20.150.150 cidr_netmask=32 op monitor interval=20s
> pcs -f ftp_ha_cluster constraint location ClusterIP prefers ftp-test01=50
> pcs -f ftp_ha_cluster stonith create xenvm-fence-ftp1 fence_xenapi pcmk_host_list="ftp-test01" action="off" session_url="https://test-xen-01" port="ftp-test01" login="root" passwd="****" delay=15 op monitor interval=40s
> pcs -f ftp_ha_cluster stonith create xenvm-fence-ftp2 fence_xenapi pcmk_host_list="ftp-test02" action="off" session_url="https://test-xen-02" port="ftp-test02" login="root" passwd="****" delay=15 op monitor interval=40s
> pcs -f ftp_ha_cluster constraint location xenvm-fence-ftp1 prefers ftp-test01=-INFINITY
> pcs -f ftp_ha_cluster constraint location xenvm-fence-ftp2 prefers ftp-test02=-INFINITY
> pcs -f ftp_ha_cluster property set stonith-enabled=true
> pcs -f ftp_ha_cluster property set stonith-action=off
> pcs -f ftp_ha_cluster property set stonith-timeout=40s
> pcs -f ftp_ha_cluster property set no-quorum-policy=ignore
> pcs -f ftp_ha_cluster resource create Ping ocf:pacemaker:ping dampen="5s" multiplier="100" host_list="172.20.150.1 172.20.150.151 172.20.150.152" attempts="3" op monitor interval=20s
> pcs -f ftp_ha_cluster resource clone Ping
> pcs -f ftp_ha_cluster constraint location ClusterIP rule score=-INF not_defined pingd or pingd lte 0
> pcs -f ftp_ha_cluster constraint location ClusterIP rule score=pingd defined pingd
> pcs cluster cib-push ftp_ha_cluster
>
> For testing with fence_ipmilan I replaced the appropriate lines with the following:
>
> pcs -f ftp_ha_cluster stonith create ipmi-fence-test-xen-01 fence_ipmilan pcmk_host_list="ftp-test01" action="off" ipaddr="test-xen-01-bmc.mercateo.lan" auth="password" login="admin" passwd="****" delay=15 op monitor interval=40s
> pcs -f ftp_ha_cluster stonith create ipmi-fence-test-xen-02 fence_ipmilan pcmk_host_list="ftp-test02" action="off" ipaddr="test-xen-02-bmc.mercateo.lan" auth="password" login="admin" passwd="****" delay=15 op monitor interval=40s
> pcs -f ftp_ha_cluster constraint location ipmi-fence-test-xen-01 prefers ftp-test01=-INFINITY
> pcs -f ftp_ha_cluster constraint location ipmi-fence-test-xen-02 prefers ftp-test02=-INFINITY
>
> The content of /etc/corosync/corosync.conf:
>
> compatibility: whitetank
>
> totem {
>     version: 2
>     secauth: off
>     threads: 0
>     interface {
>         ringnumber: 0
>         bindnetaddr: 192.168.199.0
>         mcastaddr: 226.94.1.1
>         mcastport: 5405
>         ttl: 1
>     }
> }
>
> logging {
>     fileline: off
>     to_stderr: no
>     to_logfile: yes
>     to_syslog: no
>     logfile: /var/log/cluster/corosync.log
>     debug: off
>     timestamp: on
>     logger_subsys {
>         subsys: AMF
>         debug: off
>     }
> }
>
> amf {
>     mode: disabled
> }
>
> service {
>     ver: 1
>     name: pacemaker
> }
>
> Any idea what could be missing/wrong?
>
> Kind regards,
>
> Felix
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
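[Editor's note: the switched-PDU backup fence Digimer describes can be sketched with pcs fence levels along the following lines. The fence_apc device, its name, address, credentials, and outlet number are hypothetical placeholders, not part of this thread's actual setup:]

```shell
# Hypothetical backup fence device on a switched PDU
# (agent, host name, and credentials are illustrative placeholders)
pcs -f ftp_ha_cluster stonith create pdu-fence-ftp1 fence_apc \
    pcmk_host_list="ftp-test01" ipaddr="pdu1.example.lan" \
    login="apc" passwd="****" port="1" op monitor interval=60s

# Level 1: try the XenAPI fence agent first.
# Level 2: if that fails (e.g. the hypervisor lost power), fall back to the PDU.
pcs -f ftp_ha_cluster stonith level add 1 ftp-test01 xenvm-fence-ftp1
pcs -f ftp_ha_cluster stonith level add 2 ftp-test01 pdu-fence-ftp1
```

With levels in place, an unreachable XenAPI no longer blocks recovery: the cluster moves on to the PDU, and a confirmed "off" there counts as a successful fence.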
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org