On Wed, Feb 11, 2015 at 09:10:45AM +0300, Andrei Borzenkov wrote:
> On Tue, 10 Feb 2015 15:58:57 +0100
> Dejan Muhamedagic <deja...@fastmail.fm> wrote:
>
> > On Mon, Feb 09, 2015 at 04:41:19PM +0100, Lars Ellenberg wrote:
> > > On Fri, Feb 06, 2015 at 04:15:44PM +0100, Dejan Muhamedagic wrote:
> > > > Hi,
> > > >
> > > > On Thu, Feb 05, 2015 at 09:18:50AM +0100, Digimer wrote:
> > > > > That is the problem that makes geo-clustering very hard to nearly
> > > > > impossible. You can look at the Booth option for pacemaker, but that
> > > > > requires two (or more) full clusters, plus an arbitrator 3rd
> > > >
> > > > A full cluster can consist of one node only. Hence, it is
> > > > possible to have a kind of stretch two-node [multi-site] cluster
> > > > based on tickets and managed by booth.
> > >
> > > In theory.
> > >
> > > In practice, we rely on "proper behaviour" of "the other site",
> > > in case a ticket is revoked, or cannot be renewed.
> > >
> > > Relying on a single node for "proper behaviour" does not inspire
> > > as much confidence as relying on a multi-node HA-cluster at each site,
> > > which we can expect to ensure internal fencing.
> > >
> > > With reliable hardware watchdogs, it still should be ok to do
> > > "stretched two node HA clusters" in a reliable way.
> > >
> > > Be generous with timeouts.
> >
> > As always.
> >
> > > And document which failure modes you expect to handle,
> > > and how to deal with the worst-case scenarios if you end up with some
> > > failure case that you are not equipped to handle properly.
> > >
> > > There are deployments which favor
> > > "rather online with _potential_ split brain" over
> > > "rather offline just in case".
> >
> > There's an arbitrator which should help in case of split brain.
> >
>
> You can never really differentiate between site down and site cut off
> due to (network) infrastructure outage. An arbitrator can mitigate split
> brain only to the extent you trust your network. You still have to
> decide what you value more - data availability or data consistency.

Right, that's why I mentioned ticket loss policy. If booth drops
the ticket, pacemaker would fence the node (if loss-policy=fence).
Booth guarantees that no two sites will hold the ticket at the
same time. Of course, you have to trust booth to function
properly, but I guess that's a different story.

Thanks,

Dejan

> Long distance clusters are really for disaster recovery. It is
> convenient to have a single button that starts up all resources in
> a controlled manner, but someone really needs to decide to push this
> button.
> >
> > > Document this, print it out on paper,
> > >
> > > "I am aware that this may lead to lost transactions,
> > > data divergence, data corruption, or data loss.
> > > I am personally willing to take the blame,
> > > and live with the consequences."
> > >
> > > Have some "boss" sign that ^^^
> > > in the real world using a real pen.
> >
> > Well, of course running such a "stretch" cluster would be
> > rather different from a "normal" one.
> >
> > The essential thing is that there's no fencing, unless configured
> > as a dead-man switch for the ticket. Given that booth has a
> > "sanity" program hook, maybe that could be utilized to verify if
> > this side of the cluster is healthy enough.
> >
> > Thanks,
> >
> > Dejan
> >
> > > Lars
> > >
> > > --
> > > : Lars Ellenberg
> > > : http://www.LINBIT.com | Your Way to High Availability
> > > : DRBD, Linux-HA and Pacemaker support and consulting
> > >
> > > DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
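
For reference, the ticket dependency with loss-policy=fence mentioned
in the reply above could look roughly like this in crm shell syntax.
This is only a sketch: the ticket name "ticketA", the resource name
"db" and the constraint id are made up for illustration and do not
come from the thread, and it assumes fencing (stonith) is already
configured on that site.

```
# crm shell configuration fragment; ticketA, db and the constraint id
# are hypothetical names, not taken from the thread.
# "db" may only run while this site holds ticketA. If the ticket is
# lost, the nodes running "db" are fenced instead of merely stopped.
rsc_ticket db-needs-ticketA ticketA: db loss-policy=fence
```

Without loss-policy, Pacemaker's default is to stop the dependent
resource when the ticket is lost; fence is the dead-man-switch
variant discussed in this thread.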
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org