[Pacemaker] Network outage debugging

Sean Lutner Tue, 12 Nov 2013 11:21:13 -0800

The folks testing the cluster I've been building ran a script that blocks all 
traffic except SSH on one node of the cluster for 15 seconds to mimic a 
network failure. While the network is "down", pacemaker behaves oddly on that 
node and ends up dying.
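
For anyone who wants to reproduce the test, here is a minimal sketch of what 
such a script might look like. It is not the testers' actual script: it assumes 
iptables is the firewall and that SSH listens on TCP port 22, and the function 
names and the RUN, DURATION and SSH_PORT variables are made up for 
illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: block all traffic except SSH for a short window, then restore it.
# Assumptions (not taken from the real test script): iptables is in use,
# SSH is on TCP port 22, and the outage lasts 15 seconds.
# Commands are only printed unless RUN=1 is set in the environment.

DURATION="${DURATION:-15}"
SSH_PORT="${SSH_PORT:-22}"

# Execute the command when RUN=1, otherwise just print it (dry run).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Allow SSH first, then drop everything else in both directions.
block_network() {
    run iptables -I INPUT 1 -p tcp --dport "$SSH_PORT" -j ACCEPT
    run iptables -I OUTPUT 1 -p tcp --sport "$SSH_PORT" -j ACCEPT
    run iptables -I INPUT 2 -j DROP
    run iptables -I OUTPUT 2 -j DROP
}

# Remove exactly the rules that block_network inserted.
restore_network() {
    run iptables -D INPUT -j DROP
    run iptables -D OUTPUT -j DROP
    run iptables -D INPUT -p tcp --dport "$SSH_PORT" -j ACCEPT
    run iptables -D OUTPUT -p tcp --sport "$SSH_PORT" -j ACCEPT
}

# Block, wait for DURATION seconds, then restore.
simulate_outage() {
    block_network
    run sleep "$DURATION"
    restore_network
}
```

Sourcing the file and calling simulate_outage prints the rule changes in 
order; running it as root with RUN=1 would actually apply them on the node 
under test.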


The cluster has two nodes, both EC2 instances, running four custom resources. 
The OS is CentOS 6.4 with the config below:

I've attached the /var/log/messages and /var/log/cluster/corosync.log from the 
time period during the test. I'm having some difficulty piecing together what 
happened and am hoping someone can shed some light on the problem. Are there 
any indications of why pacemaker is dying on that node?


[root@ip-10-50-3-122 ~]# pcs config
Corosync Nodes:
 
Pacemaker Nodes:
 ip-10-50-3-122 ip-10-50-3-251 

Resources: 
 Resource: ClusterEIP_54.215.143.166 (provider=pacemaker type=EIP class=ocf)
  Attributes: first_network_interface_id=eni-e4e0b68c second_network_interface_id=eni-35f9af5d first_private_ip=10.50.3.191 second_private_ip=10.50.3.91 eip=54.215.143.166 alloc_id=eipalloc-376c3c5f interval=5s 
  Operations: monitor interval=5s
 Clone: EIP-AND-VARNISH-clone
  Group: EIP-AND-VARNISH
   Resource: Varnish (provider=redhat type=varnish.sh class=ocf)
    Operations: monitor interval=5s
   Resource: Varnishlog (provider=redhat type=varnishlog.sh class=ocf)
    Operations: monitor interval=5s
   Resource: Varnishncsa (provider=redhat type=varnishncsa.sh class=ocf)
    Operations: monitor interval=5s
 Resource: ec2-fencing (type=fence_ec2 class=stonith)
  Attributes: ec2-home=/opt/ec2-api-tools pcmk_host_check=static-list pcmk_host_list=HA01 HA02 
  Operations: monitor start-delay=30s interval=0 timeout=150s

Location Constraints:
Ordering Constraints:
  ClusterEIP_54.215.143.166 then Varnish
  Varnish then Varnishlog
  Varnishlog then Varnishncsa
Colocation Constraints:
  Varnish with ClusterEIP_54.215.143.166
  Varnishlog with Varnish
  Varnishncsa with Varnishlog

Cluster Properties:
 dc-version: 1.1.8-7.el6-394e906
 cluster-infrastructure: cman
 last-lrm-refresh: 1384196963
 no-quorum-policy: ignore
 stonith-enabled: true

Attachments:
 net-failure-messages-110913.out
 net-failure-corosync-110913.out

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org