Dear Shveta, Thanks for updating the patch! Here is my comments for v52-0002.
~~~~~ system-views.sgml 01. ``` + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>sync_state</structfield> <type>char</type> + </para> + <para> + Defines slot synchronization state. This is meaningful on the physical + standby which has configured <xref linkend="guc-enable-syncslot"/> = true. + Possible values are: + <itemizedlist> + <listitem> + <para><literal>n</literal> = none for user created slots, ... ``` Hmm. I'm not sure why we must show a single character to a user. I'm OK for pg_subscription.srsubstate because it is a "catalog" - the actual value would be recorded in the heap. But pg_replication_slot is just a view so that we can replace internal representations to other strings. E.g., pg_replication_slots.wal_status. How about using {none, initialized, ready} or something? ~~~~~ postmaster.c 02. bgworker_should_start_now ``` + if (start_time == BgWorkerStart_ConsistentState_HotStandby && + pmState != PM_RUN) + return true; ``` I'm not sure the second condition is really needed. The line will be executed when pmState is PM_HOT_STANDBY. Is there a possibility that pmState is changed around here? ~~~~~ libpqwalreceiver.c 03. PQWalReceiverFunctions ``` + .walrcv_get_dbname_from_conninfo = libpqrcv_get_dbname_from_conninfo, ``` Just to confirm - is there a rule for ordering? ~~~~~ slotsync.c 04. SlotSyncWorkerCtx ``` typedef struct SlotSyncWorkerCtx { pid_t pid; slock_t mutex; } SlotSyncWorkerCtx; SlotSyncWorkerCtx *SlotSyncWorker = NULL; ``` Per other files like launcher.c, should we use a name like "SlotSyncWorkerCtxStruct"? 05. SlotSyncWorkerRegister() Your coding will work well, but there is another approach which validates slotsync parameters here. In this case, the postmaster should exit ASAP. This can notify that there are some wrong settings to users earlier. Thought? 06. wait_for_primary_slot_catchup ``` + CHECK_FOR_INTERRUPTS(); + + /* Handle any termination request if any */ + ProcessSlotSyncInterrupts(wrconn); ``` ProcessSlotSyncInterrupts() also has CHECK_FOR_INTERRUPTS(), so no need to call. 07. wait_for_primary_slot_catchup ``` + /* + * XXX: Is waiting for 2 seconds before retrying enough or more or + * less? + */ + rc = WaitLatch(MyLatch, + WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, + 2000L, + WAIT_EVENT_REPL_SLOTSYNC_PRIMARY_CATCHUP); + + ResetLatch(MyLatch); + + /* Emergency bailout if postmaster has died */ + if (rc & WL_POSTMASTER_DEATH) + proc_exit(1); ``` Is there any reasons not to use WL_EXIT_ON_PM_DEATH event? If not, you can use. 08. synchronize_slots ``` + SpinLockAcquire(&WalRcv->mutex); + if (!WalRcv || + (WalRcv->slotname[0] == '\0') || + XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) + { ... ``` Assuming that WalRcv is still NULL. In this case, does the first SpinLockAcquire() lead a segmentation fault? 09. synchronize_slots ``` + elog(DEBUG2, "slot sync worker's query:%s \n", s.data); ``` The query is not dynamical one, so I think no need to print even if the debug mode. 10. synchronize_one_slot IIUC, this function can synchronize slots even if the used plugin on primary is not installed on the secondary server. If the slot is created by the slotsync worker, users will recognize it after the server is promoted and the decode is starting. I felt it is not good specification. Can we detect in the validation phase? ~~~~~ not the source code 11. I tested the typical case - promoting a publisher from a below diagram. A physical replication slot "physical" was specified as standby_slot_names. ``` node A (primary) --> node B (secondary) | | node C (subscriber) ``` And after the promoting, below lines were periodically output on logfiles for node B and C. ``` WARNING: replication slot "physical" specified in parameter "standby_slot_names" does not exist, ignoring ``` Do you have idea to suppress the warning? IIUC it is a normal behavior of the walsender so that we cannot avoid the periodical outputs. The steps of the test was as follows: 1. stop the node A via pg_ctl stop 2. promota the node B via pg_ctl promote 3. change the connection string of the subscription via ALTER SUBSCRIPTION ... CONNECTION ... Best Regards, Hayato Kuroda FUJITSU LIMITED