On Wednesday, November 29, 2023 11:04 AM Peter Smith <smithpb2...@gmail.com> wrote:
Thanks for the comments. > ====== > 1. General. > > Previously (see [1] #0) I asked a question about if there is some > documentation > missing. Seems not yet answered. The document was add in V39-0002 in logicaldecoding.sgml because some necessary GUCs for slotsync are not in 0001. > > ====== > Commit message > > 2. > Users can set this flag during CREATE SUBSCRIPTION or during > pg_create_logical_replication_slot API. Examples: > > CREATE SUBSCRIPTION mysub CONNECTION '..' PUBLICATION mypub WITH > (failover = true); > > (failover is the last arg) > SELECT * FROM pg_create_logical_replication_slot('myslot', > 'pgoutput', false, true, true); > > ~ > > I felt it is better to say "Ex1" / "Ex2" (or "1" / "2" or something > similar) to indicate better where these examples start and finish, otherwise > they > just sort of get lost among the text. Changed. > > ====== > doc/src/sgml/catalogs.sgml > > 3. > From previous review ([1] #6) I suggested reordering fields. Hous-san > wrote: "but I think the functionality of two fields are different and I > didn’t find > the correlation between two fields except for the type of value." > > Yes, that is true. OTOH, I felt grouping the attributes by the same types made > the docs easier to read. The document's order should be same as the pg_subscription catalog, and I prefer not to move the new subfailoverstate in the middle of catalog as the functionality of them is different. > > ====== > src/backend/commands/subscriptioncmds.c > > 4. CreateSubscription > > + /* > + * If only the slot_name is specified (without create_slot option), > + * it is possible that the user intends to use an existing slot on > + * the publisher, so here we enable failover for the slot if > + * requested. > + */ > + else if (opts.slot_name && failover_enabled) { > > Unanswered question from previous review (see [1] #11a). i.e. How does this > condition ensure that *only* the slot name was specified (like the comment is > saying)? It is the else part of 'if (opts.create_slot)', so it means create_slot is not specified while only slot_name is specified. I have improved the comment. > > ~~~ > > 5. AlterSubscription > > errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed > when two_phase is enabled"), > errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION with refresh = false, > or with copy_data = false, or use DROP/CREATE SUBSCRIPTION."))); > > + if (sub->failoverstate == LOGICALREP_FAILOVER_STATE_ENABLED && > + opts.copy_data) ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed > when failover is enabled"), > + errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION with refresh = > false, or with copy_data = false, or use DROP/CREATE SUBSCRIPTION."))); > + > > There are translations issues same as reported in my previous review (see [1] > #12b and also several other places as noted in [1]). Hou-san replied that I > "can > start a separate thread to change the twophase related messages, and we can > change accordingly if it's accepted.", but that's not right IMO because it is > only > the fact that this sysncslot patch is reusing a similar message that warrants > the > need to extract a "common" message part in the first place. So I think it is > responsibility if this sycslot patch to make this change. OK, changed. > > ====== > src/backend/replication/logical/tablesync.c > > 6. process_syncing_tables_for_apply > > + if (MySubscription->twophasestate == > + LOGICALREP_TWOPHASE_STATE_PENDING) > + ereport(LOG, > + (errmsg("logical replication apply worker for subscription \"%s\" > will restart so that two_phase can be enabled", > + MySubscription->name))); > + > + if (MySubscription->failoverstate == > + LOGICALREP_FAILOVER_STATE_PENDING) > + ereport(LOG, > + (errmsg("logical replication apply worker for subscription \"%s\" > will restart so that failover can be enabled", > + MySubscription->name))); > > 6a. > You may end up log 2 restart messages for the same restart. Is it OK? I think it's OK as it can provide complete information. > > ~ > > 6b. > This is another example where you should share the same common message > (for less translations) I adjusted the message there. > > ====== > src/backend/replication/logical/worker.c > > 7. > + * The logical slot on the primary can be synced to the standby by > + specifying > + * the failover = true when creating the subscription. Enabling > + failover allows > + * us to smoothly transition to the standby in case the primary gets > + promoted, > + * ensuring that we can subscribe to the new primary without losing any data. > > /the failover = true/the failover = true option/ > > or > > /the failover = true/failover = true/ > Changed. > ~~~ > > 8. > + > #include "postgres.h" > > Unnecessary extra blank line Removed. > > ====== > src/backend/replication/slot.c > > 9. validate_standby_slots > > There was no reply to the comment in my previous review (see [1] #27). > Maybe you disagree or maybe accidentally overlooked? The error message has already been adjusted in V39. I adjusted the check in this version as well to be consistent. > > ~~~ > > 10. check_standby_slot_names > > In previous review I asked ([1] #28) why a special check was needed for "*". > Hou-san replied that "SplitIdentifierString() does not give error for '*' and > '*' > can be considered as valid value which if accepted can mislead user". > > Sure, but won't the code then just try to find if there is a replication slot > called > "*" and that will fail. That was my point, if the slot name lookup is going > to fail > anyway then why have the extra code for the special "*" case up-front? Note -- > I haven't tried it, so maybe code doesn't work like I think it does. I think allowing "*" can mislead user because it normally means every slot, but we don't want to support the "every slot" option as mentioned in the comment. So I think reject it here is fine. Reporting ERROR because the slot named '*' was not there may look confusing. > > ====== > src/backend/replication/walsender.c > > 11. PhysicalWakeupLogicalWalSnd > > No reply to my previous review comment ([1] #33). Not done? Disagreed, or > accidentally missed? The function mentioned in your previous comment has been removed in previous version, so I am not sure are you pointing to some other codes that has similar issues ? > > ~~~ > > 12. WalSndFilterStandbySlots > > + /* > + * If logical slot name is given in standby_slot_names, give WARNING > + * and skip it. Since it is harmless, so WARNING should be enough, no > + * need to error-out. > + */ > + else if (SlotIsLogical(slot)) > + warningfmt = _("cannot have logical replication slot \"%s\" in > parameter \"%s\", ignoring"); > > I previously raised an issue (see [1] #35) thinking this could not happen. > Hou-san explained how it might happen ("user could drop the logical slot and > recreate a physical slot with the same name without changing the GUC.") so > this > code was necessary. That is OK, but I think your same explanation in the code > commen. OK, I have added comments here. > > ~~~ > > 13. WalSndFilterStandbySlots > > + standby_slots_cpy = foreach_delete_current(standby_slots_cpy, lc); > > I previously raised issue (see [1] #36). Hou-san replied "I think it's OK to > remove > slots if it's invalidated, dropped, or was changed to logical one as we don't > need to wait for these slots to catch up anymore." > > Sure, maybe code is fine, but my point was that the code is removing elements > *more* scenarios than are mentioned by the function comment, so maybe > update that function comment for all the removal scenarios. Updated the comments. > > ~~~ > > 14. WalSndWaitForStandbyConfirmation > > The comment change from my previous review ([1] #37) not done. > Disagreed, or accidentally missed? Thanks for pointing, this was missed. > > ~~~ > > 15. WalSndWaitForStandbyConfirmation > > The question about calling ConditionVariablePrepareToSleep in my previous > review ([1] #39) not answered. Accidentally missed? I think V39 has already adjusted the order of reload and NIL check in this function. > > ~~~ > > 16. WalSndWaitForWal > > if (RecentFlushPtr != InvalidXLogRecPtr && > loc <= RecentFlushPtr) > - return RecentFlushPtr; > + { > + WalSndFilterStandbySlots(RecentFlushPtr, &standby_slots); > > It is better to use XLogRecPtrIsInvalid macro here. I know it was not strictly > added by your patch, but so much else changed nearby so I thought this should > be fixed at the same time. Changed. > > ====== > src/bin/pg_upgrade/info.c > > 17. get_old_cluster_logical_slot_infos > > + > slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * > num_slots); > > Excessive whitespace. Removed. Best Regards, Hou zj