On Wednesday, June 20, 2012 03:02:28 PM Robert Haas wrote:
> On Wed, Jun 20, 2012 at 5:15 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> > One bit is fine if you have only very simple replication topologies. Once
> > you think about globally distributed databases it's a bit different. You
> > describe some of that below, but just to reiterate:
> > Imagine having 6 nodes, 3 on each of two continents (ABC in North America,
> > DEF in Europe). You may only want to have full intercontinental
> > interconnect between two of those (say A and D). If you only have one
> > bit to represent the origin that's not going to work, because you won't be
> > able to discern the changes from BC on A from the changes
> > originating on DEF.
>
> I don't see the problem. A certainly knows via which link the LCRs
> arrived.
> So: change happens on A. A sends the change to B, C, and D. B and C
> apply the change. One bit is enough to keep them from regenerating
> new LCRs that get sent back to A. So they're fine. D also receives
> the changes (from A) and applies them, but it also does not need to
> regenerate LCRs. Instead, it can take the LCRs that it has already
> got (from A) and send those to E and F.
>
> Or: change happens on B. B sends the changes to A. Since A knows the
> network topology, it sends the changes to C and D. D sends them to E
> and F. Nobody except B needs to *generate* LCRs. All any other node
> needs to do is suppress *redundant* LCR generation.
>
> > Another topology which is interesting is circular replications (i.e.
> > changes get shipped A->B, B->C, C->A) which is a sensible topology if
> > you only have a low change rate and a relatively high number of nodes
> > because you don't need the full combinatorial amount of connections.
>
> I think this one is OK too. You just generate LCRs on the origin node
> and then pass them around the ring at every step. When the next hop
> would be the origin node then you're done.
>
> I think you may be imagining that A generates LCRs and sends them to
> B. B applies them, and then from the WAL just generated, it produces
> new LCRs which then get sent to C.

Yes, that's what I am proposing.

> If you do that, then, yes,
> everything that you need to disentangle various network topologies
> must be present in WAL. But what I'm saying is: don't do it like
> that. Generate the LCRs just ONCE, at the origin node, and then pass
> them around the network, applying them at every node. Then, the
> information that is needed in WAL is confined to one bit: the
> knowledge of whether or not a particular transaction is local (and
> thus LCRs should be generated) or non-local (and thus they shouldn't,
> because the origin already generated them and thus we're just handing
> them around to apply everywhere).
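For reference, the generate-once scheme described in the quoted text can be sketched as a toy simulation. This is only an illustration under stated assumptions: the `LINKS` map, the `propagate` function and the `(change, is_local)` representation are hypothetical names, not anything from the patch, and the relay loop assumes an acyclic (tree-shaped) link layout like the six-node example. A ring would additionally need the "stop when the next hop is the origin" rule.

```python
from collections import deque

# Hypothetical link map for the six-node example: A, B, C in one region,
# D, E, F in the other, with A-D as the only intercontinental link.
LINKS = {
    "A": ["B", "C", "D"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A", "E", "F"],
    "E": ["D"],
    "F": ["D"],
}


def propagate(origin, change):
    """Generate one LCR at `origin` and relay it unchanged over LINKS.

    Returns a dict mapping each node to the (change, is_local) pairs it
    applied. Only the origin records is_local=True, so only the origin
    would generate LCRs for this change. Assumes LINKS has no cycles.
    """
    applied = {node: [] for node in LINKS}
    applied[origin].append((change, True))
    # Each queue entry is (sender, receiver) for one hop of the same LCR.
    queue = deque((origin, peer) for peer in LINKS[origin])
    while queue:
        sender, receiver = queue.popleft()
        # Applied as non-local: the receiver must not regenerate LCRs.
        applied[receiver].append((change, False))
        for peer in LINKS[receiver]:
            if peer != sender:  # never send back over the arrival link
                queue.append((receiver, peer))
    return applied


result = propagate("B", "row-1")
```

With a change originating on B, every node in `result` applies it exactly once and only B carries the local flag, which is the single bit the quoted proposal relies on.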
Sure, you can do it that way, but I don't think it's a good idea. If you do it
my way you *guarantee* that when replaying changes from node B on node C you
have replayed changes from A at least as far as B has. That's a really nice
property for MM. You *can* get the same with your solution, but it starts to
get complicated rather fast, while my/our proposed solution is trivial to
implement.

Andres

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
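The ordering property claimed for the cascading approach can be illustrated with a toy replay loop. This is a sketch under assumptions, not the proposed implementation: changes are modelled as hypothetical `(origin, seq)` pairs, and `replay_stream` stands in for node C consuming the stream that node B regenerated from its own log.

```python
def replay_stream(stream):
    """Replay an upstream node's change stream strictly in order.

    `stream` lists (origin, seq) pairs in the order the upstream node
    logged them, including changes it applied from other nodes. Returns,
    for each change, a snapshot of the per-origin progress the downstream
    node had reached just after applying it.
    """
    progress = {}
    history = []
    for origin, seq in stream:
        progress[origin] = seq
        history.append(((origin, seq), dict(progress)))
    return history


# Hypothetical log order on B: it applied A's changes 1 and 2, then made
# its own change 1, then applied A's change 3, then made its own change 2.
b_stream = [("A", 1), ("A", 2), ("B", 1), ("A", 3), ("B", 2)]
history = replay_stream(b_stream)
```

Here `history[2]` is `(("B", 1), {"A": 2, "B": 1})`: by the time C applies B's first change it has already replayed A up to 2, exactly as far as B had. The guarantee falls out of replaying a single ordered stream, with no extra coordination between C and A.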