On Tuesday, January 9, 2024 9:17 AM Peter Smith <smithpb2...@gmail.com> wrote: > > Here are some review comments for patch v57-0001.
Thanks for the comments! > > ====== > doc/src/sgml/protocol.sgml > > 1. CREATE_REPLICATION_SLOT ... FAILOVER > > + <varlistentry> > + <term><literal>FAILOVER [ <replaceable > class="parameter">boolean</replaceable> ]</literal></term> > + <listitem> > + <para> > + If true, the slot is enabled to be synced to the physical > + standbys so that logical replication can be resumed after failover. > + </para> > + </listitem> > + </varlistentry> > > This syntax says passing the boolean value is optional. So the default needs > to > the specified here in the docs (like what the TWO_PHASE option does). > > ~~~ > > 2. ALTER_REPLICATION_SLOT ... FAILOVER > > + <variablelist> > + <varlistentry> > + <term><literal>FAILOVER [ <replaceable > class="parameter">boolean</replaceable> ]</literal></term> > + <listitem> > + <para> > + If true, the slot is enabled to be synced to the physical > + standbys so that logical replication can be resumed after failover. > + </para> > + </listitem> > + </varlistentry> > + </variablelist> > > This syntax says passing the boolean value is optional. So it needs to be > specified here in the docs that not passing a value would be the same as > passing the value true. The behavior that "not passing a value would be the same as passing the value true " is due to the rule of defGetBoolean(). And all the options of commands in this document behave the same in this case, therefore I think we'd better add document for it in a general place in a separate patch/thread instead of mentioning this in each option's paragraph. > > ====== > doc/src/sgml/ref/alter_subscription.sgml > > 3. > + If <link > linkend="sql-createsubscription-params-with-failover"><literal>failover</lit > eral></link> > + is enabled, you can temporarily disable it in order to execute > these commands. > > /in order to/to/ This part has been removed due to design change. > > ~~~ > > 4. > + <para> > + When altering the > + <link > linkend="sql-createsubscription-params-with-slot-name"><literal>slot_nam > e</literal></link>, > + the <literal>failover</literal> property of the new slot may > differ from the > + <link > linkend="sql-createsubscription-params-with-failover"><literal>failover</lit > eral></link> > + parameter specified in the subscription. When creating the slot, > + ensure the slot failover property matches the > + <link > linkend="sql-createsubscription-params-with-failover"><literal>failover</lit > eral></link> > + parameter value of the subscription. > + </para> > > 4a. > the <literal>failover</literal> property of the new slot may differ > > Maybe it would be more clear if that said "the failover property value of the > named slot...". Changed. > > ~ > > 4b. > In the "failover property matches" part should that failover also be rendered > as > <literal> like before in the same paragraph? Added. > > ====== > doc/src/sgml/system-views.sgml > > 5. > + <row> > + <entry role="catalog_table_entry"><para role="column_definition"> > + <structfield>failover</structfield> <type>bool</type> > + </para> > + <para> > + True if this logical slot is enabled to be synced to the physical > + standbys so that logical replication can be resumed from the new > primary > + after failover. Always false for physical slots. > + </para></entry> > + </row> > > /True if this logical slot is enabled.../True if this is a logical slot > enabled.../ Changed. > > ====== > src/backend/commands/subscriptioncmds.c > > 6. CreateSubscription > > + /* > + * Even if failover is set, don't create the slot with failover > + * enabled. Will enable it once all the tables are synced and > + * ready. The intention is that if failover happens at the time of > + * table-sync, user should re-launch the subscription instead of > + * relying on main slot (if synced) with no table-sync data > + * present. When the subscription has no tables, leave failover as > + * false to allow ALTER SUBSCRIPTION ... REFRESH PUBLICATION to > + * work. > + */ > + if (opts.failover && !opts.copy_data && tables != NIL) > + failover_enabled = true; > > AFAICT it might be possible for this to set failover_enabled = true if > copy_data > is false. So failover_enabled would be true when later > calling: > walrcv_create_slot(wrconn, opts.slot_name, false, twophase_enabled, > failover_enabled, CRS_NOEXPORT_SNAPSHOT, NULL); > > Isn't that contrary to what this comment said: "Even if failover is set, don't > create the slot with failover enabled" This part has been removed due to design change. > > ~~~ > > 7. AlterSubscription. case ALTER_SUBSCRIPTION_OPTIONS: > > + if (!sub->slotname) > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot set failover for subscription that does not have slot > + name"))); > > /for subscription that does not have slot name/for a subscription that does > not > have a slot name/ Changed. > > ====== > .../libpqwalreceiver/libpqwalreceiver.c > > 8. > + if (PQresultStatus(res) != PGRES_COMMAND_OK) ereport(ERROR, > + (errcode(ERRCODE_PROTOCOL_VIOLATION), > + errmsg("could not alter replication slot \"%s\"", slotname))); > > This used to display the error message like > pchomp(PQerrorMessage(conn->streamConn)) but it was removed. Is it OK? I added this back. > > ====== > src/backend/replication/logical/tablesync.c > > 9. > + if (MySubscription->twophasestate == > + LOGICALREP_TWOPHASE_STATE_PENDING) > + ereport(LOG, > + /* translator: %s is a subscription option */ (errmsg("logical > + replication apply worker for subscription \"%s\" > will restart so that %s can be enabled", > + MySubscription->name, "two_phase"))); > + > + if (MySubscription->failoverstate == > + LOGICALREP_FAILOVER_STATE_PENDING) > + ereport(LOG, > + /* translator: %s is a subscription option */ (errmsg("logical > + replication apply worker for subscription \"%s\" > will restart so that %s can be enabled", > + MySubscription->name, "failover"))); > > Those errors have multiple %s, so the translator's comment should say "the > 2nd %s is a..." This part has been removed due to design change. > > ~~~ > > 10. > void > -UpdateTwoPhaseState(Oid suboid, char new_state) > +EnableTwoPhaseFailoverTriState(Oid suboid, bool enable_twophase, > + bool enable_failover) > > I felt the function name was a bit confusing. Maybe it is simpler to call it > like > "EnableTriState" or "EnableSubTriState" -- the parameters anyway specify what > actual state(s) will be set. This part has been removed due to design change. > > ====== > src/backend/replication/logical/worker.c > > 11. > + /* Update twophase and/or failover */ > + EnableTwoPhaseFailoverTriState(MySubscription->oid, twophase_pending, > + failover_pending); > + if (twophase_pending) > + MySubscription->twophasestate = > LOGICALREP_TWOPHASE_STATE_ENABLED; > + > + if (failover_pending) > + MySubscription->failoverstate = LOGICALREP_FAILOVER_STATE_ENABLED; > > Can't you pass the MySubscription as a parameter and then the > EnableTwoPhaseFailoverTriState can also set these > LOGICALREP_TWOPHASE_STATE_ENABLED/LOGICALREP_FAILOVER_STATE_EN > ABLED > states within the Enable* function? This part has been removed due to design change. > > ====== > src/backend/replication/repl_gram.y > > 12. > %token K_CREATE_REPLICATION_SLOT > %token K_DROP_REPLICATION_SLOT > +%token K_ALTER_REPLICATION_SLOT > > and > > + create_replication_slot drop_replication_slot alter_replication_slot > + identify_system read_replication_slot timeline_history show > + upload_manifest > > and > > | create_replication_slot > | drop_replication_slot > + | alter_replication_slot > > and > > | K_CREATE_REPLICATION_SLOT { $$ = "create_replication_slot"; } > | K_DROP_REPLICATION_SLOT { $$ = "drop_replication_slot"; } > + | K_ALTER_REPLICATION_SLOT { $$ = "alter_replication_slot"; } > > etc. > > ~ > > Although it makes no difference IMO it is more natural to code everything in > the order: create, alter, drop. > > ====== > src/backend/replication/repl_scanner.l > > 13. > CREATE_REPLICATION_SLOT { return K_CREATE_REPLICATION_SLOT; } > DROP_REPLICATION_SLOT { return K_DROP_REPLICATION_SLOT; } > +ALTER_REPLICATION_SLOT { return K_ALTER_REPLICATION_SLOT; } > > and > > case K_CREATE_REPLICATION_SLOT: > case K_DROP_REPLICATION_SLOT: > + case K_ALTER_REPLICATION_SLOT: > > Although it makes no difference IMO it is more natural to code everything in > the order: create, alter, drop. Personally, I am not sure if it looks better, so I didn’t change this. > > ====== > src/backend/replication/slot.c > > 14. > + if (SlotIsPhysical(MyReplicationSlot)) > + ereport(ERROR, > + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("cannot use %s with a physical replication slot", > + "ALTER_REPLICATION_SLOT")); > > /with a/for a/ This is to be consistent with another error message, so I didn’t change this. errmsg("cannot use %s with a logical replication slot", "READ_REPLICATION_SLOT")); > > ====== > src/backend/replication/walsender.c > > 15. > +static void > +ParseAlterReplSlotOptions(AlterReplicationSlotCmd *cmd, bool *failover) > +{ > + ListCell *lc; > + bool failover_given = false; > + > + /* Parse options */ > + foreach(lc, cmd->options) > + { > + DefElem *defel = (DefElem *) lfirst(lc); > > AFAIK there are some new-style macros now you can use for this code. > e.g. foreach_ptr? See [1]. Changed. > > ~~~ > > 16. > + if (strcmp(defel->defname, "failover") == 0) { if (failover_given) > + ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("conflicting or > + redundant options"))); failover_given = true; *failover = > + defGetBoolean(defel); } > > The documented syntax showed that passing the boolean value for the > FAILOVER option is not mandatory. Does this code work if the boolean value is > not passed? It works, defGetBoolean will handle this case. > > ====== > src/bin/psql/tab-complete.c > > 17. > I think "ALTER SUBSCRIPTION ... SET (failover)" is possible, but the ALTER > SUBSCRIPTION tab completion code is missing. Added. > ====== > .../t/050_standby_failover_slots_sync.pl > > 19. > + > +# Copyright (c) 2023, PostgreSQL Global Development Group > + > > /2023/2024/ Changed. > > ~~~ > > 20. > +# Create another subscription (using the same slot created above) that > +enables # failover. > +$subscriber1->safe_psql( > + 'postgres', qq[ > + CREATE TABLE tab_int (a int PRIMARY KEY); CREATE SUBSCRIPTION > +regress_mysub1 CONNECTION '$publisher_connstr' > PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, copy_data=false, > failover = true, create_slot = false); > > The comment should not say "Create another subscription" because this is the > first subscription being created. > > /another/a/ Changed. > > ~~~ > > 21. > +################################################## > +# Test if changing the failover property of a subscription updates the > +# corresponding failover property of the slot. > +################################################## > > /Test if/Test that/ Changed. > > ====== > src/test/regress/sql/subscription.sql > > 22. > +CREATE SUBSCRIPTION regress_testsub CONNECTION > 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, > failover = true); > + > +\dRs+ > > This is currently only testing the explicit "failover=true". > > Maybe you can also test the other kinds work as expected: > - explicit "SET (failover=false)" > - explicit "SET (failover)" with no value specified I think these tests don't add enough value to catch future bugs, so I prefer not to add these. Best Regards, Hou zj