On 13 Nov 2013, at 11:22 am, Sean Lutner <s...@rentul.net> wrote:

> 
> 
>> On Nov 12, 2013, at 6:01 PM, Andrew Beekhof <and...@beekhof.net> wrote:
>> 
>> 
>>> On 13 Nov 2013, at 6:10 am, Sean Lutner <s...@rentul.net> wrote:
>>> 
>>> The folks testing the cluster I've been building have run a script which 
>>> blocks all traffic except SSH on one node of the cluster for 15 seconds to 
>>> mimic a network failure. During this time, the network being "down" seems 
>>> to cause some odd behavior from pacemaker resulting in it dying.
>>> 
>>> The cluster is two nodes and running four custom resources on EC2 
>>> instances. The OS is CentOS 6.4 with the config below:
>>> 
>>> I've attached the /var/log/messages and /var/log/cluster/corosync.log from 
>>> the time period during the test. I'm having some difficulty piecing 
>>> together what happened and am hoping someone can shed some light on the 
>>> problem. Any indications why pacemaker is dying on that node?
>> 
>> Because corosync is dying underneath it:
>> 
>> Nov 09 14:51:49 [942] ip-10-50-3-251        cib:    error: send_ais_text:    Sending message 28 via cpg: FAILED (rc=2): Library error: Connection timed out (110)
>> Nov 09 14:51:49 [942] ip-10-50-3-251        cib:    error: pcmk_cpg_dispatch:    Connection to the CPG API failed: 2
>> Nov 09 14:51:49 [942] ip-10-50-3-251        cib:    error: cib_ais_destroy:    Corosync connection lost!  Exiting.
>> Nov 09 14:51:49 [942] ip-10-50-3-251        cib:     info: terminate_cib:    cib_ais_destroy: Exiting fast...
> 
> Is that the expected behavior?

It is expected behaviour when corosync dies.  Ideally corosync wouldn't die 
though.
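
For anyone trying to confirm the same thing in their own logs, here is a rough sketch of how the excerpt above could be picked out of corosync.log. It assumes the line layout shown in the quoted excerpt; the function name find_corosync_loss, the regular expression, and the two marker strings are illustrative choices taken from those lines, not anything Pacemaker ships.

```python
import re

# Layout assumed from the excerpt quoted in this thread:
# "<Mon> <day> <time> [<pid>] <host> <daemon>: <level>: <function>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) \[(?P<pid>\d+)\] (?P<host>\S+)\s+"
    r"(?P<daemon>[\w-]+):\s+(?P<level>\w+):\s+(?P<func>\w+):\s+(?P<msg>.*)$"
)

# Substrings copied from the excerpt; other Pacemaker versions may word these differently.
LOST_MARKERS = ("Connection to the CPG API failed", "Corosync connection lost")


def find_corosync_loss(lines):
    """Return (timestamp, daemon, function) for each line reporting a lost corosync connection."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and any(marker in m.group("msg") for marker in LOST_MARKERS):
            hits.append((m.group("ts"), m.group("daemon"), m.group("func")))
    return hits


# The four lines from the excerpt, unwrapped to one log entry per line.
SAMPLE = [
    "Nov 09 14:51:49 [942] ip-10-50-3-251        cib:    error: send_ais_text:    Sending message 28 via cpg: FAILED (rc=2): Library error: Connection timed out (110)",
    "Nov 09 14:51:49 [942] ip-10-50-3-251        cib:    error: pcmk_cpg_dispatch:    Connection to the CPG API failed: 2",
    "Nov 09 14:51:49 [942] ip-10-50-3-251        cib:    error: cib_ais_destroy:    Corosync connection lost!  Exiting.",
    "Nov 09 14:51:49 [942] ip-10-50-3-251        cib:     info: terminate_cib:    cib_ais_destroy: Exiting fast...",
]
```

Calling find_corosync_loss(open("/var/log/cluster/corosync.log")) on the affected node should list which Pacemaker daemons noticed the lost connection and when. The earliest timestamp is the place to start looking for why corosync itself went away.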

> Is it because the DC was the other node?

No.

> 
> I did notice that there was an attempted fence operation but it didn't look 
> successful. 
> 
>> 
>> 
>>> 
>>> 
>>> [root@ip-10-50-3-122 ~]# pcs config
>>> Corosync Nodes:
>>> 
>>> Pacemaker Nodes:
>>> ip-10-50-3-122 ip-10-50-3-251 
>>> 
>>> Resources: 
>>> Resource: ClusterEIP_54.215.143.166 (provider=pacemaker type=EIP class=ocf)
>>> Attributes: first_network_interface_id=eni-e4e0b68c 
>>> second_network_interface_id=eni-35f9af5d first_private_ip=10.50.3.191 
>>> second_private_ip=10.50.3.91 eip=54.215.143.166 alloc_id=eipalloc-376c3c5f 
>>> interval=5s 
>>> Operations: monitor interval=5s
>>> Clone: EIP-AND-VARNISH-clone
>>> Group: EIP-AND-VARNISH
>>> Resource: Varnish (provider=redhat type=varnish.sh class=ocf)
>>>  Operations: monitor interval=5s
>>> Resource: Varnishlog (provider=redhat type=varnishlog.sh class=ocf)
>>>  Operations: monitor interval=5s
>>> Resource: Varnishncsa (provider=redhat type=varnishncsa.sh class=ocf)
>>>  Operations: monitor interval=5s
>>> Resource: ec2-fencing (type=fence_ec2 class=stonith)
>>> Attributes: ec2-home=/opt/ec2-api-tools pcmk_host_check=static-list 
>>> pcmk_host_list=HA01 HA02 
>>> Operations: monitor start-delay=30s interval=0 timeout=150s
>>> 
>>> Location Constraints:
>>> Ordering Constraints:
>>> ClusterEIP_54.215.143.166 then Varnish
>>> Varnish then Varnishlog
>>> Varnishlog then Varnishncsa
>>> Colocation Constraints:
>>> Varnish with ClusterEIP_54.215.143.166
>>> Varnishlog with Varnish
>>> Varnishncsa with Varnishlog
>>> 
>>> Cluster Properties:
>>> dc-version: 1.1.8-7.el6-394e906
>>> cluster-infrastructure: cman
>>> last-lrm-refresh: 1384196963
>>> no-quorum-policy: ignore
>>> stonith-enabled: true
>>> 
>>> <net-failure-messages-110913.out><net-failure-corosync-110913.out>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
