Please find a few comments on v6:

1)
+/*
+ * Initialize logical decoding status on shmem at server startup. This
+ * must be called ONCE during postmaster or standalone-backend startup,
+ * before initializing replication slots.
+ */
+void
+StartupLogicalDecodingStatus(bool last_status)

The comment says that it needs to be called 'before initializing
replication slots' but instead it is called after initializing
replication slots (i.e. after StartupReplicationSlots).

Also, can you please help me understand the need of
'StartupLogicalDecodingStatus' when we are doing
'UpdateLogicalDecodingStatusEndOfRecovery' later in StartupXLOG. Why
do we need to set last_status temporarily when the new status can be
different which will be set in
UpdateLogicalDecodingStatusEndOfRecovery


2)
CreatePublication() has this:

+ errmsg("logical decoding needs to be enabled to publish logical changes"),
+ errhint("Set \"wal_level\" to \"logical\" or create a logical
replication slot with \"replica\" \"wal_level\" before creating
subscriptions.")));

While rest of the places has this:

+ errhint("Set \"wal_level\" >= \"logical\" or create at least one
logical slot on the primary.")));

Shall we make these errhint consistent?  Either all mention
'wal_level=replica' condition along with slot-creation part or none.


3)
xlog_decode():

+ case XLOG_LOGICAL_DECODING_STATUS_CHANGE:
  /*
  * This can occur only on a standby, as a primary would
- * not allow to restart after changing wal_level < logical
+ * not allow to restart after changing wal_level < replica
  * if there is pre-existing logical slot.
  */
  Assert(RecoveryInProgress());
  ereport(ERROR,
  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("logical decoding on standby requires \"wal_level\" >=
\"logical\" on the primary")));
+ errmsg("logical decoding must be enabled on the primary")));

Is the comment correct?
a)
XLOG_LOGICAL_DECODING_STATUS_CHANGE can be result of logical-slot drop
on primary and not necessarily making wal_level < replica

b)
I see that even standby does not allow to restart when changing
wal_level < replica as against what comment says. Have I understood
the intent correctly?

standby LOG:
FATAL:  logical replication slot "failover_slot_st" exists, but
"wal_level" < "replica"
HINT:  Change "wal_level" to be "replica" or higher.


4)
start_logical_decoding_status_change():
+ if (LogicalDecodingCtl->transition_in_progress)
+ {
+ LWLockRelease(LogicalDecodingControlLock);

read_logical_decoding_status_transition() takes care of checking
transition_in_progress, I think we missed to remove above from
start_logical_decoding_status_change().


5)
+ /* Return if we don't need to change the status */
+ if (LogicalDecodingCtl->logical_decoding_enabled == new_status)
+ {

Same with this code-logic in start_logical_decoding_status_change(),
we shall remove it.

6)
+ * If we're in recovery and the startup process is still taking
+ * responsibility to update the status, we cannot change.
+ */
+ if (!delay_status_change)
+ return false;
+

This comment is confusing as when in recovery, we can not change state
otherwise as well even if delay_status_change is false. IIUC, the
scenario can arise only during promotion, if so, shall we say:

"If we're in recovery and a state transition (e.g., promotion) is in
progress, wait for the transition to complete and retry on the new
primary. Otherwise, disallow the status change entirely, as a standby
cannot modify the logical decoding status."

7)
The name 'delay_status_change' does not indicate which status or the
intent of delay. More name options are: defer_logical_status_change,
wait_for_recovery_transition/completion,
recovery_transition_in_progress

8)
DisableLogicalDecodingIfNecessary():
+
+ /* Write the WAL to disable logical decoding on standbys too */
+ if (XLogStandbyInfoActive() && !recoveryInProgress)
+ {

Do we need 'recoveryInProgress' check here?
start_logical_decoding_status_change() has taken care of that.

9)
Comments atop DisableLogicalDecodingIfNecessary:

 * This function expects to be called after dropping a possibly-last logical
 * replication slot. Logical decoding can be disabled only when wal_level is set
 * to 'replica' and there is no logical replication slot on the system.

The comment is not completely true, shall we amend the comment to say
something like:

This function is called after a logical slot is dropped, but it only
disables logical decoding on primary if it was the last remaining
logical slot and wal_level < logical. Otherwise, it performs no
action.

10)
When we try to create or drop a logical slot on standby, and if
delay_status_change is false, shall we immediately exit? Currently it
does a lot of checks including CheckLogicalSlotExists() which can be
completely avoided. I think it is worth having a quick
'RecoveryInProgress() && !delay_status_change' check in the beginning.

thanks
Shveta


Reply via email to