On Mon, 12 Aug 2013 19:27:33 +0200
Adrián López Tejedor <adrian...@gmail.com> wrote:

> The problem is the network is out of my control. All the nodes are
> virtual machines over some VMWare ESX.
> We have two different networks, one for the service, and the other
> for the cluster.
> One idea is to create a second ring in the service network, but
> networks are virtualized, so maybe the problem persists.
>
> And of course, we don't have stonith. It is the same problem, I have
> no control over the VMWare hosts, and seems that they have to pay an
> extra to use the API needed by the stonith plugin.
>
> Meanwhile, I try to find
>
> Probably this two problems will be fixed in a couple of months, but
> meanwhile I have try to maintain the cluster up :)

It sounds a bit as if the hardware layer uses bridging with STP, which
can cause a few seconds of unavailability. You could try increasing the
corosync parameters that control communication timeouts.

Good luck,
Arnold
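
For illustration, a minimal sketch of what raising those timeouts might look like in the totem section of corosync.conf. The values are assumptions chosen only as an example, not tested recommendations for this cluster; the existing interface and other settings would stay as they are.

```
totem {
        version: 2

        # Time in ms to wait for the token before declaring it lost.
        # Assumed value; pick one longer than the expected network outage.
        token: 10000

        # Number of token retransmits attempted before a loss is declared.
        token_retransmits_before_loss_const: 10

        # Time in ms to wait for consensus before starting a new
        # membership round; must be at least 1.2 * token.
        consensus: 12000
}
```

Longer timeouts let the cluster ride out short network interruptions, at the cost of slower detection of a node that has really failed.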
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org