Hi Andreas,

The problem is that the network is out of my control. All the nodes are virtual machines running on VMware ESX. We have two separate networks, one for the service and one for the cluster. One idea is to create a second ring on the service network, but both networks are virtualized, so the problem may persist.

And of course, we don't have stonith. It is the same problem: I have no control over the VMware hosts, and it seems they would have to pay extra to use the API needed by the stonith plugin. Meanwhile, I try to find

These two problems will probably be fixed in a couple of months, but meanwhile I have to try to keep the cluster up :)

Thanks,
Adrián

On Mon, Aug 12, 2013 at 6:57 PM, Andreas Mock <andreas.m...@web.de> wrote:
> Hi Adrián,
>
> IMHO the effort would focus on the wrong issue.
> Make your network for clustering reliable. It is THE building block
> of a cluster besides the nodes.
> - Additional network cards
> - Different vendor
> - Bonding
> - Different path through switches
>
> On a two-node cluster without the necessary option to
> increase the number of nodes I almost always take a crossover cable
> for one of the interconnects.
>
> Best regards
> Andreas Mock
>
> P.S. The story sounds to me that you also don't have stonith
> enabled. Another building block IMHO.
>
> *From:* Adrián López Tejedor [mailto:adrian...@gmail.com]
> *Sent:* Monday, 12 August 2013 16:26
> *To:* pacemaker@oss.clusterlabs.org
> *Subject:* [Pacemaker] New action for resource running in multiple nodes
>
> Hi!
>
> In the environment where we use corosync/pacemaker, we have recently been
> having some problems with the network used to maintain the cluster. These short
> interruptions cause the passive node (we have a two-node active-passive
> configuration with Apache Tomcat) to think it is alone and start another
> instance of Tomcat.
>
> A few seconds later, the cluster reconnects, and the resource is found
> active on both nodes.
> The default behaviour (as seen in
> http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-resource-options.html)
> is to stop both, and start one of them.
>
> For us, this implies that the service is down every time a short interruption
> in the network occurs.
>
> Maybe a new option for "multiple-active" like "stop_old" and/or "stop_new"
> could be useful, stopping only the newest instance of the resource.
>
> Thanks!
> Adrián
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org