On Wednesday, September 11, 2024 12:18 PM shveta malik <shveta.ma...@gmail.com> wrote: > > On Tue, Sep 10, 2024 at 4:30 PM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com> > wrote: > > > > On Tuesday, September 10, 2024 5:56 PM shveta malik > <shveta.ma...@gmail.com> wrote: > > > > > > Thanks for the example. Can you please review below and let me know > > > if my understanding is correct. > > > > > > 1) > > > In a bidirectional replication setup, the user has to create slots > > > in a way that NodeA's sub's slot is Node B's feedback_slot and Node > > > B's sub's slot is Node A's feedback slot. And then only this feature will > work well, is it correct to say? > > > > Yes, your understanding is correct. > > > > > > > > 2) > > > Now coming back to multiple feedback_slots in a subscription, is the > > > below > > > correct: > > > > > > Say Node A has publications and subscriptions as follow: > > > ------------------ > > > A_pub1 > > > > > > A_sub1 (subscribing to B_pub1 with the default slot_name of A_sub1) > > > A_sub2 (subscribing to B_pub2 with the default slot_name of A_sub2) > > > A_sub3 (subscribing to B_pub3 with the default slot_name of A_sub3) > > > > > > > > > Say Node B has publications and subscriptions as follow: > > > ------------------ > > > B_sub1 (subscribing to A_pub1 with the default slot_name of B_sub1) > > > > > > B_pub1 > > > B_pub2 > > > B_pub3 > > > > > > Then what will be the feedback_slot configuration for all > > > subscriptions of A and B? Is below correct: > > > ------------------ > > > A_sub1, A_sub2, A_sub3: feedback_slots=B_sub1 > > > B_sub1: feedback_slots=A_sub1,A_sub2, A_sub3 > > > > Right. The above configurations are correct. > > Okay. It seems difficult to understand configuration from user's perspective.
Right. I think we could give an example in the document to make it clear. > > > > > > > 3) > > > If the above is true, then do we have a way to make sure that the > > > user has given this configuration exactly the above way? If users > > > end up giving feedback_slots as some random slot (say A_slot4 or > > > incomplete list), do we validate that? (I have not looked at code > > > yet, just trying to understand design first). > > > > The patch doesn't validate if the feedback slots belong to the correct > > subscriptions on remote server. It only validates if the slot is an > > existing, valid, logical slot. I think there are few challenges to validate > > it > further. > > E.g. We need a way to identify the which server the slot is > > replicating changes to, which could be tricky as the slot currently > > doesn't have any info to identify the remote server. Besides, the slot > > could be inactive temporarily due to some subscriber side error, in > > which case we cannot verify the subscription that used it. > > Okay, I understand the challenges here. > > > > > > > 4) > > > Now coming to this: > > > > > > > The apply worker will get the oldest confirmed flush LSN among the > > > > specified slots and send the LSN as a feedback message to the > > > > walsender. > > > > > > There will be one apply worker on B which will be due to B_sub1, so > > > will it check confirmed_lsn of all slots A_sub1,A_sub2, A_sub3? > > > Won't it be sufficient to check confimed_lsn of say slot A_sub1 > > > alone which has subscribed to table 't' on which delete has been > > > performed? Rest of the lots (A_sub2, A_sub3) might have subscribed to > different tables? > > > > I think it's theoretically correct to only check the A_sub1. We could > > document that user can do this by identifying the tables that each > > subscription replicates, but it may not be user friendly. > > > > Sorry, I fail to understand how user can identify the tables and give > feedback_slots accordingly? I thought feedback_slots is a one time > configuration when replication is setup (or say setup changes in future); it > can > not keep on changing with each query. Or am I missing something? I meant that user have all the publication information(including the tables added in a publication) that the subscription subscribes to, and could also have the slot_name, so I think it's possible to identify the tables that each subscription includes and add the feedback_slots correspondingly before starting the replication. It would be pretty complicate although possible, so I prefer to not mention it in the first place if it could not bring much benefits. > > IMO, it is something which should be identified internally. Since the query > is on > table 't1', feedback-slot which is for 't1' shall be used to check lsn. But on > rethinking,this optimization may not be worth the effort, the identification > part > could be tricky, so it might be okay to check all the slots. I agree that identifying these internally would add complexity. > > ~~ > > Another query is about 3 node setup. I couldn't figure out what would be > feedback_slots setting when it is not bidirectional, as in consider the case > where there are three nodes A,B,C. Node C is subscribing to both Node A and > Node B. Node A and Node B are the ones doing concurrent "update" and > "delete" which will both be replicated to Node C. In this case what will be > the > feedback_slots setting on Node C? We don't have any slots here which will be > replicating changes from Node C to Node A and Node C to Node B. This is given > in [3] in your first email ([1]) Thanks for pointing this, the link was a bit misleading. I think the solution proposed in this thread is only used to allow detecting update_deleted reliably in a bidirectional cluster. For non- bidirectional cases, it would be more tricky to predict the timing till when should we retain the dead tuples. > > [1]: > https://www.postgresql.org/message-id/OS0PR01MB5716BE80DAEB0EE2A > 6A5D1F5949D2%40OS0PR01MB5716.jpnprd01.prod.outlook.com Best Regards, Hou zj