On Fri, Sep 8, 2023 at 4:40 PM Hayato Kuroda (Fujitsu) <kuroda.hay...@fujitsu.com> wrote: > > Dear Shveta, > > I resumed to check the thread. Here are my high-level comments. > Sorry if you have been already discussed.
Thanks Kuroda-san for the feedback. > > 01. General > > I think the documentation can be added, not only GUCs. How about adding > examples > for combinations of physical and logical replications? You can say that both > of > physical primary can be publisher and slots on primary/standby are > synchronized. > I did not fully understand this. Can you please state a clear example. We are only synchronizing logical replication slots in this draft and that too on physical standby from primary. So the last statement is not completely true. > 02. General > > standby_slot_names ensures that physical standby is always ahead subscriber, > but I > think it may be not sufficient. There is a possibility that primary server > does > not have any physical slots.So it expects a slot to be present. > In this case the physical standby may be behind the > subscriber and the system may be confused when the failover is occured. Currently there is a check in slot-sync worker which mandates that there is a physical slot present between primary and standby for this feature to proceed.So that confusion state will not arise. + /* WalRcvData is not set or primary_slot_name is not set yet */ + if (!WalRcv || WalRcv->slotname[0] == '\0') + return naptime; >Can't we specify the name of standby via application_name or something? So do you mean that in absence of a physical slot (if we plan to support that), we let primary know about standby(slots-synchronization client) through application_name? I am not sure about this. Will think more on this. I would like to know others' opinion on this as well. > > 03. General > > In this architecture, the syncslot worker is launched per db and they > independently connects to primary, right? Not completely true. Each slotsync worker is responsible for managing N dbs. Here 'N' = 'Number of distinct dbs for slots in synchronize_slot_names'/ 'number of max_slotsync_workers configured' for cases where dbcount exceeds workers configured. And if dbcount < max_slotsync_workers, then we launch only that many workers equal to dbcount and each worker manages a single db. Each worker independently connects to primary. Currently it makes a connection multiple times, I am optimizing it to make connection only once and then after each SIGHUP assuming 'primary_conninfo' may change. This change will be in the next version. >I'm not sure it is efficient, but I > come up with another architecture - only a worker (syncslot receiver)connects > to the primary and other workers (syncslot worker) receives infos from it and > updates. This can reduce the number of connections so that it may slightly > improve the latency of network. How do you think? > I feel it may help in reducing network latency, but not sure if it could be more efficient in keeping the lsns in sync. I feel it may introduce lag due to the fact that only one worker is getting all the info from primary and the actual synchronizing workers are waiting on that worker. This lag may be more when the number of slots are huge. We have run some performance tests on the design implemented currently, please have a look at emails around [1] and [2]. > 04. General > > test file recovery/t/051_slot_sync.pl is missing. > yes, it was removed. Please see point3 at [3] > 04. ReplSlotSyncMain > > Does the worker have to connect to the specific database? > > > ``` > /* Connect to our database. */ > BackgroundWorkerInitializeConnectionByOid(MySlotSyncWorker->dbid, > > MySlotSyncWorker->userid, > > 0); > ``` Since we are using libpq public interface 'walrcv_exec=libpqrcv_exec' to connect to primary, this needs database connection. It errors out in the absence of 'MyDatabaseId'. Do you think db-connection can have some downsides? > > 05. SlotSyncInitSlotNamesLst() > > "Lst" should be "List". > Okay, I will change this in the next version. ========== [1]: https://www.postgresql.org/message-id/CAJpy0uD2F43avuXy_yQv7Wa3kpUwioY_Xn955xdmd6vX0ME6%3Dg%40mail.gmail.com [2]: https://www.postgresql.org/message-id/CAJpy0uD%3DDevMxTwFVsk_%3DxHqYNH8heptwgW6AimQ9fbRmx4ioQ%40mail.gmail.com [3]: https://www.postgresql.org/message-id/CAJpy0uAuzbzvcjpnzFTiWuDBctnH-SDZC6AZabPX65x9GWBrjQ%40mail.gmail.com thanks Shveta