Yes, there is:

stonith_admin --confirm=<dead node>

I know you will check this first, but it has to be said: it is critical that you have really verified the node is off before you run this.
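To make that check harder to skip, the command could be wrapped in a small guard. This is only a sketch: the function name confirm_fence and the STONITH_ADMIN override variable are made up for illustration, and retyping the node name is no substitute for actually looking at the machine.

```shell
#!/bin/sh
# Hypothetical wrapper around "stonith_admin --confirm=<node>".
# It refuses to run unless the operator retypes the node name.
# STONITH_ADMIN is an assumed override (e.g. "echo") for a dry run.
confirm_fence() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: confirm_fence <node>" >&2
        return 2
    fi
    printf 'Type "%s" to confirm it is physically powered off: ' "$node" >&2
    read -r answer
    if [ "$answer" != "$node" ]; then
        echo "Aborted: confirmation did not match." >&2
        return 1
    fi
    # Tell the cluster that the node is safely down.
    "${STONITH_ADMIN:-stonith_admin}" --confirm="$node"
}
```

Calling confirm_fence ftp-test01 on the surviving node would then prompt before acknowledging the fence.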

digimer

On 18/08/14 02:01 PM, Felix Schrage wrote:
Thanks for the quick answer. I'll have a look at that.
Is there a way to manually force a failover when I can be sure the other 
machine is down?

Kind regards

Felix

-----Original Message-----
From: Digimer [mailto:li...@alteeve.ca]
Sent: Monday, 18 August 2014 19:57
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] no failover if fencing device is unreachable (i.e. 
power loss)

On 18/08/14 01:50 PM, Felix Schrage wrote:
Hi,

I'm building a two-node cluster running XenServer, Pacemaker and DRBD. There's 
a problem when testing failover by powering off the currently active node.
When using the fence_xenapi agent, the resource ClusterIP is not moved to 
the 2nd node until the first node has been successfully shut down.
However, because the XenAPI is unreachable when the machine is powered off, the 
2nd node keeps trying to shut down the first node and the resource is never 
moved.

To check whether this is an error in the fence_xenapi agent, I tried
fence_ipmilan, which works fine as long as the IPMI is reachable. When 
pulling the power cords from the machine, however, the behavior is the same as 
with the fence_xenapi agent.
Am I missing an option which should be set? A timeout or a retry counter?

This is the expected behaviour. Failing to connect to the fence device (or failing 
to confirm the "off" action) cannot be treated as a successful fence. Without 
a successful fence, it cannot be assumed that the peer is gone. To do so would 
risk a split-brain, so the cluster's only sane and safe option is to block.

This is why we always use switched PDUs as a backup fence 
method. You can see how to configure this with STONITH levels:

http://clusterlabs.org/wiki/STONITH_Levels
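As a rough illustration of what such a two-level setup could look like with pcs: IPMI is tried first and a switched PDU is the fallback. The PDU device name, its address, the outlet number and the choice of fence_apc_snmp are assumptions made for this sketch, not taken from the setup described in this thread. The PCS variable is likewise only an assumed override for dry runs.

```shell
#!/bin/sh
# Sketch only: pdu-fence-ftp1, pdu01.example.lan and outlet "1" are
# hypothetical; adjust the agent and its parameters to your PDU.
PCS="${PCS:-pcs}"

add_backup_fencing() {
    # Backup fence device: a switched PDU that cuts power to ftp-test01.
    $PCS stonith create pdu-fence-ftp1 fence_apc_snmp \
        pcmk_host_list="ftp-test01" ipaddr="pdu01.example.lan" port="1" \
        op monitor interval=60s
    # Level 1: try IPMI first. Level 2: fall back to the PDU.
    $PCS stonith level add 1 ftp-test01 ipmi-fence-test-xen-01
    $PCS stonith level add 2 ftp-test01 pdu-fence-ftp1
}
```

With levels in place, a dead BMC (as happens when the power cords are pulled) no longer blocks recovery, because the cluster can still confirm the "off" action through the PDU. The same would have to be repeated for the second node.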

Here's how I set up the cluster (fence_xenapi) using pcs:

pcs cluster cib ftp_ha_cluster
pcs -f ftp_ha_cluster resource create ClusterIP IPaddr2 \
    ip=172.20.150.150 cidr_netmask=32 op monitor interval=20s
pcs -f ftp_ha_cluster constraint location ClusterIP prefers ftp-test01=50
pcs -f ftp_ha_cluster stonith create xenvm-fence-ftp1 fence_xenapi \
    pcmk_host_list="ftp-test01" action="off" \
    session_url="https://test-xen-01" port="ftp-test01" login="root" \
    passwd="****" delay=15 op monitor interval=40s
pcs -f ftp_ha_cluster stonith create xenvm-fence-ftp2 fence_xenapi \
    pcmk_host_list="ftp-test02" action="off" \
    session_url="https://test-xen-02" port="ftp-test02" login="root" \
    passwd="****" delay=15 op monitor interval=40s
pcs -f ftp_ha_cluster constraint location xenvm-fence-ftp1 prefers ftp-test01=-INFINITY
pcs -f ftp_ha_cluster constraint location xenvm-fence-ftp2 prefers ftp-test02=-INFINITY
pcs -f ftp_ha_cluster property set stonith-enabled=true
pcs -f ftp_ha_cluster property set stonith-action=off
pcs -f ftp_ha_cluster property set stonith-timeout=40s
pcs -f ftp_ha_cluster property set no-quorum-policy=ignore
pcs -f ftp_ha_cluster resource create Ping ocf:pacemaker:ping \
    dampen="5s" multiplier="100" \
    host_list="172.20.150.1 172.20.150.151 172.20.150.152" attempts="3" \
    op monitor interval=20s
pcs -f ftp_ha_cluster resource clone Ping
pcs -f ftp_ha_cluster constraint location ClusterIP rule score=-INF \
    not_defined pingd or pingd lte 0
pcs -f ftp_ha_cluster constraint location ClusterIP rule score=pingd defined pingd
pcs cluster cib-push ftp_ha_cluster

For testing with fence_ipmilan I replaced the corresponding lines with the 
following:

pcs -f ftp_ha_cluster stonith create ipmi-fence-test-xen-01 fence_ipmilan \
    pcmk_host_list="ftp-test01" action="off" \
    ipaddr="test-xen-01-bmc.mercateo.lan" auth="password" login="admin" \
    passwd="****" delay=15 op monitor interval=40s
pcs -f ftp_ha_cluster stonith create ipmi-fence-test-xen-02 fence_ipmilan \
    pcmk_host_list="ftp-test02" action="off" \
    ipaddr="test-xen-02-bmc.mercateo.lan" auth="password" login="admin" \
    passwd="****" delay=15 op monitor interval=40s
pcs -f ftp_ha_cluster constraint location ipmi-fence-test-xen-01 prefers ftp-test01=-INFINITY
pcs -f ftp_ha_cluster constraint location ipmi-fence-test-xen-02 prefers ftp-test02=-INFINITY


The content of /etc/corosync/corosync.conf:

compatibility: whitetank

totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.199.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
                ttl: 1
        }
}

logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        to_syslog: no
        logfile: /var/log/cluster/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}

amf {
        mode: disabled
}

service {
        ver:    1
        name:   pacemaker
}

Any idea what could be missing/wrong?

Kind regards,

Felix

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
