On Wed, Jul 17, 2024 at 1:23 PM Hayato Kuroda (Fujitsu) <kuroda.hay...@fujitsu.com> wrote:
>
> I also analyzed this failure; let me share my findings. I think the events
> occurred in the following order:
>
> 1. The backend created a publication on $db2,
> 2. BGWriter generated a RUNNING_XACTS record, then
> 3. The backend created a replication slot on $db2.
>
> In this case, the recovery_target_lsn is ahead of the RUNNING_XACTS record
> generated at step 3. Also, since both bgwriter and slot creation mark their
> records as *UNIMPORTANT*, bgwriter won't log a new snapshot even after
> LOG_SNAPSHOT_INTERVAL_MS has elapsed. The rule is written in
> BackgroundWriterMain():
>
> ```
> /*
>  * Only log if enough time has passed and interesting records have
>  * been inserted since the last snapshot. Have to compare with <=
>  * instead of < because GetLastImportantRecPtr() points at the
>  * start of a record, whereas last_snapshot_lsn points just past
>  * the end of the record.
>  */
> if (now >= timeout &&
>     last_snapshot_lsn <= GetLastImportantRecPtr())
> {
>     last_snapshot_lsn = LogStandbySnapshot();
>     last_snapshot_ts = now;
> }
> ```
>
> Therefore, pg_createsubscriber waited for a new record to be replicated, but
> no activity was recorded, causing a timeout. Since this is a timing issue,
> Alexander could reproduce the failure with a shorter duration and parallel
> runs.
>
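A toy model may help show why bgwriter stays quiet in the ordering above. The sketch below is a simplified C model under stated assumptions, not PostgreSQL code: `WalModel`, `wal_insert()`, `should_log_snapshot()`, `bgwriter_logs_again()` and all record sizes and LSNs are hypothetical; only the condition mirrors the quoted snippet from BackgroundWriterMain(), and the 15 s interval mirrors LOG_SNAPSHOT_INTERVAL_MS. Because both RUNNING_XACTS records are unimportant, the last important record is still the publication, which starts before the end of bgwriter's own snapshot, so the `<=` check fails no matter how much time passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr: a 64-bit byte position in WAL. */
typedef uint64_t WalPos;

/* 15 s, assumed to match LOG_SNAPSHOT_INTERVAL_MS in bgwriter.c. */
#define SNAPSHOT_INTERVAL_MS 15000L

/* Hypothetical toy model of WAL insertion; not PostgreSQL code. */
typedef struct
{
    WalPos insert_pos;           /* where the next record would start */
    WalPos last_important_start; /* start of the latest "important" record */
} WalModel;

/* Append a record and return its end position, as XLogInsert() does. */
static WalPos
wal_insert(WalModel *wal, WalPos size, bool important)
{
    WalPos start = wal->insert_pos;

    assert(size > 0);
    /* Records marked XLOG_MARK_UNIMPORTANT do not advance this pointer. */
    if (important)
        wal->last_important_start = start;
    wal->insert_pos = start + size;
    return wal->insert_pos;
}

/* The condition quoted from BackgroundWriterMain(). */
static bool
should_log_snapshot(long now, long timeout,
                    WalPos last_snapshot_lsn, WalPos last_important_start)
{
    return now >= timeout && last_snapshot_lsn <= last_important_start;
}

/*
 * Replay the reported ordering with made-up record sizes and return
 * whether bgwriter would log another snapshot once the interval elapses.
 * extra_important adds one important record after the slot creation.
 */
static bool
bgwriter_logs_again(bool extra_important)
{
    WalModel wal = {.insert_pos = 100, .last_important_start = 0};
    long last_snapshot_ts = 0;
    long now;
    WalPos last_snapshot_lsn;

    wal_insert(&wal, 50, true);                      /* 1. publication on $db2 */
    last_snapshot_lsn = wal_insert(&wal, 40, false); /* 2. bgwriter RUNNING_XACTS */
    wal_insert(&wal, 40, false);                     /* 3. slot creation RUNNING_XACTS */
    if (extra_important)
        wal_insert(&wal, 30, true);

    /* Pretend the full snapshot interval has elapsed. */
    now = last_snapshot_ts + SNAPSHOT_INTERVAL_MS;
    return should_log_snapshot(now, last_snapshot_ts + SNAPSHOT_INTERVAL_MS,
                               last_snapshot_lsn, wal.last_important_start);
}
```

With the reported ordering, `bgwriter_logs_again(false)` is false: the last important record starts at 100 while bgwriter's snapshot ends at 190. Only an important record written after step 3 (`bgwriter_logs_again(true)`) satisfies the `<=` check again.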
Your analysis sounds correct to me.

> IIUC, the root cause is that pg_create_logical_replication_slot() returns an
> LSN that has not been generated yet. So, I think both mine [1] and Euler's
> approach [2] can solve the issue. My proposal was to add an extra WAL record
> after the final slot creation, and Euler's was to use the restart_lsn as the
> recovery_target_lsn.
>

I don't think it is correct to use restart_lsn as the consistent_lsn, because consistent_lsn is also used to set the replication origin progress. Later, when we start the subscriber, the system will use that LSN as the start_decoding_at point, i.e., the point after which all commits will be replicated. So we would end up incorrectly using restart_lsn (the LSN from which we start reading WAL) as the start_decoding_at point. How could that be correct?

Now, even if we use restart_lsn as the recovery_target_lsn and the LSN returned by pg_create_logical_replication_slot() as the consistent LSN to set the replication progress, that could also lead to data loss: recovery would stop at restart_lsn while replication would resume only after the consistent LSN, so the subscriber may never get the data between those two LSNs.

--
With Regards,
Amit Kapila.