Please find a few comments on v6: 1) +/* + * Initialize logical decoding status on shmem at server startup. This + * must be called ONCE during postmaster or standalone-backend startup, + * before initializing replication slots. + */ +void +StartupLogicalDecodingStatus(bool last_status)
The comment says that it needs to be called 'before initializing replication slots' but instead it is called after initializing replication slots (i.e. after StartupReplicationSlots). Also, can you please help me understand the need of 'StartupLogicalDecodingStatus' when we are doing 'UpdateLogicalDecodingStatusEndOfRecovery' later in StartupXLOG. Why do we need to set last_status temporarily when the new status can be different which will be set in UpdateLogicalDecodingStatusEndOfRecovery 2) CreatePublication() has this: + errmsg("logical decoding needs to be enabled to publish logical changes"), + errhint("Set \"wal_level\" to \"logical\" or create a logical replication slot with \"replica\" \"wal_level\" before creating subscriptions."))); While rest of the places has this: + errhint("Set \"wal_level\" >= \"logical\" or create at least one logical slot on the primary."))); Shall we make these errhint consistent? Either all mention 'wal_level=replica' condition along with slot-creation part or none. 3) xlog_decode(): + case XLOG_LOGICAL_DECODING_STATUS_CHANGE: /* * This can occur only on a standby, as a primary would - * not allow to restart after changing wal_level < logical + * not allow to restart after changing wal_level < replica * if there is pre-existing logical slot. */ Assert(RecoveryInProgress()); ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), - errmsg("logical decoding on standby requires \"wal_level\" >= \"logical\" on the primary"))); + errmsg("logical decoding must be enabled on the primary"))); Is the comment correct? a) XLOG_LOGICAL_DECODING_STATUS_CHANGE can be result of logical-slot drop on primary and not necessarily making wal_level < replica b) I see that even standby does not allow to restart when changing wal_level < replica as against what comment says. Have I understood the intent correctly? standby LOG: FATAL: logical replication slot "failover_slot_st" exists, but "wal_level" < "replica" HINT: Change "wal_level" to be "replica" or higher. 4) start_logical_decoding_status_change(): + if (LogicalDecodingCtl->transition_in_progress) + { + LWLockRelease(LogicalDecodingControlLock); read_logical_decoding_status_transition() takes care of checking transition_in_progress, I think we missed to remove above from start_logical_decoding_status_change(). 5) + /* Return if we don't need to change the status */ + if (LogicalDecodingCtl->logical_decoding_enabled == new_status) + { Same with this code-logic in start_logical_decoding_status_change(), we shall remove it. 6) + * If we're in recovery and the startup process is still taking + * responsibility to update the status, we cannot change. + */ + if (!delay_status_change) + return false; + This comment is confusing as when in recovery, we can not change state otherwise as well even if delay_status_change is false. IIUC, the scenario can arise only during promotion, if so, shall we say: "If we're in recovery and a state transition (e.g., promotion) is in progress, wait for the transition to complete and retry on the new primary. Otherwise, disallow the status change entirely, as a standby cannot modify the logical decoding status." 7) The name 'delay_status_change' does not indicate which status or the intent of delay. More name options are: defer_logical_status_change, wait_for_recovery_transition/completion, recovery_transition_in_progress 8) DisableLogicalDecodingIfNecessary(): + + /* Write the WAL to disable logical decoding on standbys too */ + if (XLogStandbyInfoActive() && !recoveryInProgress) + { Do we need 'recoveryInProgress' check here? start_logical_decoding_status_change() has taken care of that. 9) Comments atop DisableLogicalDecodingIfNecessary: * This function expects to be called after dropping a possibly-last logical * replication slot. Logical decoding can be disabled only when wal_level is set * to 'replica' and there is no logical replication slot on the system. The comment is not completely true, shall we amend the comment to say something like: This function is called after a logical slot is dropped, but it only disables logical decoding on primary if it was the last remaining logical slot and wal_level < logical. Otherwise, it performs no action. 10) When we try to create or drop a logical slot on standby, and if delay_status_change is false, shall we immediately exit? Currently it does a lot of checks including CheckLogicalSlotExists() which can be completely avoided. I think it is worth having a quick 'RecoveryInProgress() && !delay_status_change' check in the beginning. thanks Shveta