On Tue, May 31, 2022 at 7:36 PM Bruce Momjian <br...@momjian.us> wrote:
>
> On Wed, May 25, 2022 at 10:32:50PM -0400, Bruce Momjian wrote:
> > On Wed, May 25, 2022 at 12:13:17PM +0530, Amit Kapila wrote:
> > >
> > > It helps with setting up logical replication among two or more nodes
> > > (data flows both ways) which is important for use cases where
> > > applications are data-aware. For such apps, it will be beneficial to
> >
> > That does make sense, thanks.
>
> Uh, thinking some more, why would anyone set things up this way ---
> having part of a table being primary on one server and a different part
> of the table be a subscriber. Seems it would be simpler and safer to
> create two child tables and have one be primary on only one server.
> Users can access both tables using the parent.
>
Yes, users can choose to do it that way but still, to keep the nodes in sync and to maintain continuity of operations, it will be very difficult to manage the operations without the LRG APIs. Let us consider a simple two-node example where on each node there is a table T that has partitions P1 and P2. As far as I can understand, one needs the below kind of set-up to allow local operations on geographically distributed nodes.

Node-1:
node-1 writes to P1
node-1 publishes P1
node-2 subscribes to P1 of node-1

Node-2:
node-2 writes to P2
node-2 publishes P2
node-1 subscribes to P2 of node-2

In this setup, we need to publish individual partitions, otherwise, we will face the loop problem where the data sent by node-1 to node-2 via logical replication will again come back to it, causing problems like constraint violations, duplicate data, etc. There could be other ways to do this set-up with the current logical replication commands (for example, publishing via the root table) but those would require ways to avoid loops and could have other challenges.

Now, in such a setup/scheme, consider a scenario (scenario-1) where node-2 goes down (it crashes, drops off the network, just dies, etc.) and comes up after some time. One can either make node-2 available again by fixing the problem it has, or promote a standby in that location (if any) to become the primary; both might require some time. In the meantime, to continue the operations (which provides a seamless experience to users), users will be connected to node-1 to perform the required write operations. Now, to achieve this without the LRG APIs, it will be quite complex for users to keep the data in sync. One needs to perform various steps to get the partition P2 data that went to node-1 while node-2 was unavailable: node-1 has to publish the P2 changes made until node-2 becomes available again, with the help of the Create/Drop Publication commands.
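For concreteness, the initial per-partition set-up described above could look roughly like the following sketch (table/publication/subscription names, host names, and partition bounds are illustrative, not from this thread):

```sql
-- On both nodes: the same partitioned table definition.
CREATE TABLE t (id int PRIMARY KEY, payload text) PARTITION BY RANGE (id);
CREATE TABLE p1 PARTITION OF t FOR VALUES FROM (0) TO (1000000);
CREATE TABLE p2 PARTITION OF t FOR VALUES FROM (1000000) TO (2000000);

-- On node-1: publish only the locally written partition P1.
CREATE PUBLICATION pub_p1 FOR TABLE p1;

-- On node-2: publish only the locally written partition P2.
CREATE PUBLICATION pub_p2 FOR TABLE p2;

-- On node-1: subscribe to node-2's partition P2.
CREATE SUBSCRIPTION sub_p2
    CONNECTION 'host=node2 dbname=postgres'
    PUBLICATION pub_p2;

-- On node-2: subscribe to node-1's partition P1.
CREATE SUBSCRIPTION sub_p1
    CONNECTION 'host=node1 dbname=postgres'
    PUBLICATION pub_p1;
```

Because each publication contains only the leaf partition that its node writes to (and publish_via_partition_root is left at its default of false), rows replicated in from the other node are never re-published back, which is what avoids the loop problem mentioned above.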
And when node-2 comes back, it has to create a subscription for the above publication pub-2 to get that data, ensure both the nodes are in sync, and then allow operations on node-2. Not only this, but if there are more nodes in this set-up (say 10), it has to change (drop/create) the subscriptions corresponding to partition P2 on all the other nodes, as each individual node is the owner of some partition.

Another possibility is that the entire data center where node-2 was present is gone due to some unfortunate incident, in which case they need to set up a new data center and hence a new node. In such a case, the user needs to do all the steps mentioned in the previous scenario and, additionally, needs to ensure that the new node syncs all the existing data (of all partitions) before it again starts receiving write changes for partition P2.

I think all this should be relatively simpler with the LRG APIs: for the second scenario, the user ideally just needs to use the lrg_attach* API, and in the first scenario, the missing data should be synced automatically once node-2 comes back.

Now, the other important point that we should also consider for these LRG APIs is the ease of setup even in the normal case where we are just adding a new node, as mentioned by Peter Smith in his email [1] (LRG makes setup easier). For example, even if there are many nodes, we only need a single lrg_attach by the joining node instead of needing N-1 subscriptions on all the existing nodes.

[1] - https://www.postgresql.org/message-id/CAHut%2BPsvvfTWWwE8vkgUg4q%2BQLyoCyNE7NU%3DmEiYHcMcXciXdg%40mail.gmail.com

--
With Regards,
Amit Kapila.