On Thursday, February 29, 2024 5:54 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Thu, Feb 29, 2024 at 8:29 AM Peter Smith <smithpb2...@gmail.com> > wrote: > > > > On Wed, Feb 28, 2024 at 1:23 PM Zhijie Hou (Fujitsu) > > <houzj.f...@fujitsu.com> wrote: > > > > > > On Tuesday, February 27, 2024 3:18 PM Peter Smith > <smithpb2...@gmail.com> wrote: > > ... > > > > 20. > > > > +# > > > > +# | ----> standby1 (primary_slot_name = sb1_slot) # | ----> standby2 > > > > +(primary_slot_name = sb2_slot) # primary ----- | # | ----> subscriber1 > > > > +(failover = true) # | ----> subscriber2 (failover = false) > > > > > > > > In the diagram, the "--->" means a mixture of physical standbys and > logical > > > > pub/sub replication. Maybe it can be a bit clearer? > > > > > > > > SUGGESTION > > > > > > > > # primary (publisher) > > > > # > > > > # (physical standbys) > > > > # | ----> standby1 (primary_slot_name = sb1_slot) > > > > # | ----> standby2 (primary_slot_name = sb2_slot) > > > > # > > > > # (logical replication) > > > > # | ----> subscriber1 (failover = true, slot_name = lsub1_slot) > > > > # | ----> subscriber2 (failover = false, slot_name = lsub2_slot) > > > > > > > > > > I think one can distinguish it based on the 'standby' and 'subscriber' as > > > well, > because > > > 'standby' normally refer to physical standby while the other refer to > > > logical. > > > > > I think Peter's suggestion will make the setup clear.
Changed. > > > > > Ok, but shouldn't it at least include info about the logical slot > > names associated with the subscribers (slot_name = lsub1_slot, > > slot_name = lsub2_slot) like suggested above? > > > > ====== > > > > Here are some more review comments for v100-0001 > > > > ====== > > doc/src/sgml/config.sgml > > > > 1. > > + <para> > > + Lists the streaming replication standby server slot names that > logical > > + WAL sender processes will wait for. Logical WAL sender processes > will > > + send decoded changes to plugins only after the specified > replication > > + slots confirm receiving WAL. This guarantees that logical > > replication > > + slots with failover enabled do not consume changes until those > changes > > + are received and flushed to corresponding physical standbys. If a > > + logical replication connection is meant to switch to a physical > standby > > + after the standby is promoted, the physical replication slot for > > the > > + standby should be listed here. Note that logical replication will > > not > > + proceed if the slots specified in the standby_slot_names do > > not exist or > > + are invalidated. > > + </para> > > > > Is that note ("Note that logical replication will not proceed if the > > slots specified in the standby_slot_names do not exist or are > > invalidated") meant only for subscriptions marked for 'failover' or > > any subscription? Maybe wording can be modified to help clarify it? > > > > I think it is implicit that here we are talking about failover slots. > I think clarifying again the same could be repetitive considering the > previous sentence: "This guarantees that logical replication slots > with failover enabled do not consume .." have mentioned it. > > > ====== > > src/backend/replication/slot.c > > > > 2. > > +/* > > + * A helper function to validate slots specified in GUC standby_slot_names. > > + */ > > +static bool > > +validate_standby_slots(char **newval) > > +{ > > + char *rawname; > > + List *elemlist; > > + bool ok; > > + > > + /* Need a modifiable copy of string */ > > + rawname = pstrdup(*newval); > > + > > + /* Verify syntax and parse string into a list of identifiers */ > > + ok = SplitIdentifierString(rawname, ',', &elemlist); > > + > > + if (!ok) > > + { > > + GUC_check_errdetail("List syntax is invalid."); > > + } > > + > > + /* > > + * If the replication slots' data have been initialized, verify if the > > + * specified slots exist and are logical slots. > > + */ > > + else if (ReplicationSlotCtl) > > + { > > + foreach_ptr(char, name, elemlist) > > + { > > + ReplicationSlot *slot; > > + > > + slot = SearchNamedReplicationSlot(name, true); > > + > > + if (!slot) > > + { > > + GUC_check_errdetail("replication slot \"%s\" does not exist", > > + name); > > + ok = false; > > + break; > > + } > > + > > + if (!SlotIsPhysical(slot)) > > + { > > + GUC_check_errdetail("\"%s\" is not a physical replication slot", > > + name); > > + ok = false; > > + break; > > + } > > + } > > + } > > + > > + pfree(rawname); > > + list_free(elemlist); > > + return ok; > > +} > > > > 2a. > > I didn't mention this previously because I thought this function was > > not going to change anymore, but since Bertrand suggested some changes > > [1], I will say IMO the { } are fine here for the single statement, > > but I think it will be better to rearrange this code to be like below. > > Having a 2nd NOP 'else' gives a much better place where you can put > > your ReplicationSlotCtl comment. > > > > if (!ok) > > { > > GUC_check_errdetail("List syntax is invalid."); > > } > > else if (!ReplicationSlotCtl) > > { > > <Insert big comment here about why it is OK to skip when > > ReplicationSlotCtl is NULL> > > } > > else > > { > > foreach_ptr ... > > } > > > > +1. This will make the code and reasoning to skip clear. Changed. > > Few additional comments on the latest patch: > ================================= > 1. > static XLogRecPtr > WalSndWaitForWal(XLogRecPtr loc) > { > ... > + if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr && > + (!replication_active || StandbyConfirmedFlush(loc, WARNING))) > + { > + /* > + * Fast path to avoid acquiring the spinlock in case we already know > + * we have enough WAL available and all the standby servers have > + * confirmed receipt of WAL up to RecentFlushPtr. This is particularly > + * interesting if we're far behind. > + */ > return RecentFlushPtr; > + } > ... > ... > + * Wait for WALs to be flushed to disk only if the postmaster has not > + * asked us to stop. > + */ > + if (loc > RecentFlushPtr && !got_STOPPING) > + wait_event = WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL; > + > + /* > + * Check if the standby slots have caught up to the flushed position. > + * It is good to wait up to RecentFlushPtr and then let it send the > + * changes to logical subscribers one by one which are already covered > + * in RecentFlushPtr without needing to wait on every change for > + * standby confirmation. Note that after receiving the shutdown signal, > + * an ERROR is reported if any slots are dropped, invalidated, or > + * inactive. This measure is taken to prevent the walsender from > + * waiting indefinitely. > + */ > + else if (replication_active && > + !StandbyConfirmedFlush(RecentFlushPtr, WARNING)) > + { > + wait_event = WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION; > + wait_for_standby = true; > + } > + else > + { > + /* Already caught up and doesn't need to wait for standby_slots. */ > break; > + } > ... > } > > Can we try to move these checks into a separate function that returns > a boolean and has an out parameter as wait_event? Refactored. > > 2. How about naming StandbyConfirmedFlush() as StandbySlotsAreCaughtup? I used a similar version based on the suggested name: StandbySlotsHaveCaughtup. Best Regards, Hou zj