Just to tie this off.

It has been stable since I reinstalled VMware Tools on both nodes, so the
problem seems to have nothing to do with Corosync or Pacemaker.

Regards,
Darren


On 7 February 2013 11:03, Darren Mansell <darren.mans...@gmail.com> wrote:

> Hi all.
>
> I've installed a Corosync/Pacemaker cluster of 2 nodes into a VMware ESX
> environment. The install uses Debian squeeze (6.0) with packages from
> squeeze-backports.
>
> These are package versions in use:
>
> corosync                            1.4.2-1~bpo60+1
> pacemaker                           1.1.7-1~bpo60+1
> ( + required packages and libs )
> ( I had to use backports to get the failure-timeout ability )
>
> I use these 2 nodes to run ldirectord and a VIP to load-balance an MS
> Exchange cluster, and on the whole it works very well. But about twice a day
> the cluster loses quorum, goes split-brain, and then recovers after about
> 30 seconds.
>
> I've already had to disable STONITH for this issue as it was causing long
> shoot-outs and taking a while to recover. Now with failure-timeouts and no
> STONITH it comes back fairly quickly.
>
> I've attached a hb_report from both nodes and put the cluster config
> below. Any ideas or thoughts would be most welcome.
>
> Many thanks.
> Darren
>
> crm configure show:
> node exlb01
> node exlb02
> primitive VIP1 ocf:heartbeat:IPaddr2 \
>         params lvs_support="true" ip="10.8.35.55" cidr_netmask="24" broadcast="10.8.35.255" \
>         op monitor interval="60" timeout="60" \
>         meta migration-threshold="2" failure-timeout="120"
> primitive ldirectord ocf:heartbeat:ldirectord \
>         params configfile="/etc/ha.d/ldirectord.cf" \
>         op monitor interval="60" timeout="60" \
>         meta migration-threshold="2" target-role="Started" failure-timeout="120"
> group lb VIP1 ldirectord \
>         meta target-role="Started"
> location l-lb-100 lb 100: exlb01
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         no-quorum-policy="ignore" \
>         stonith-enabled="false" \
>         last-lrm-refresh="1355878292" \
>         cluster-recheck-interval="60s"
>
> crm status:
> ============
> Last updated: Thu Feb  7 11:01:06 2013
> Last change: Wed Dec 19 01:32:40 2012
> Stack: openais
> Current DC: exlb02 - partition with quorum
> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
>
> Online: [ exlb02 exlb01 ]
>
>  Resource Group: lb
>      VIP1       (ocf::heartbeat:IPaddr2):       Started exlb01
>      ldirectord (ocf::heartbeat:ldirectord):    Started exlb01
>
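
For anyone who wants to log when quorum losses like these happen, here is a
minimal Python sketch that pulls the DC, the quorum state and the online nodes
out of crm status text in the format quoted above. The function name
parse_crm_status is hypothetical, and the "WITHOUT quorum" wording is an
assumption about what is printed when quorum is lost, so check it against
your own output before relying on it.

```python
import re

# Sample text copied from the crm status output quoted in this thread.
SAMPLE = """\
Last updated: Thu Feb  7 11:01:06 2013
Stack: openais
Current DC: exlb02 - partition with quorum
2 Nodes configured, 2 expected votes
Online: [ exlb02 exlb01 ]
"""

# Assumption: a partition without quorum is reported as "partition WITHOUT quorum".
DC_RE = re.compile(r"Current DC:\s*(\S+)\s*-\s*partition (with|without) quorum",
                   re.IGNORECASE)
ONLINE_RE = re.compile(r"Online:\s*\[(.*)\]")


def parse_crm_status(text):
    """Return the DC name, a quorum flag and the online nodes (hypothetical helper)."""
    summary = {"dc": None, "quorum": None, "online": []}
    for raw in text.splitlines():
        # Drop mail quoting ("> ") and surrounding whitespace before matching.
        line = raw.lstrip("> ").strip()
        m = DC_RE.match(line)
        if m:
            summary["dc"] = m.group(1)
            summary["quorum"] = m.group(2).lower() == "with"
            continue
        m = ONLINE_RE.match(line)
        if m:
            summary["online"] = m.group(1).split()
    return summary


if __name__ == "__main__":
    print(parse_crm_status(SAMPLE))
```

Run from cron against the real command output, something like this would at
least timestamp each loss of quorum so it can be lined up with events on the
hypervisor side.
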
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
