On Wed, Aug 20, 2025 at 2:25 PM vignesh C <vignes...@gmail.com> wrote: > > On Tue, 19 Aug 2025 at 23:33, Masahiko Sawada <sawada.m...@gmail.com> wrote: > > > > I imagined something like case 2. For logical replication of tables, > > if we support DDL replication (i.e., CREATE/ALTER/DROP TABLE), all > > changes the apply worker executes are serialized in commit LSN order. > > Therefore, users would not have to be concerned about schema changes > > that happened to the publisher. On the other hand, for sequence > > replication, even if we support DDL replication for sequences (i.e., > > CREATE/ALTER/DROP SEQUENCES), users would have to execute REFRESH > > PUBLICATION SEQUENCES command after "ALTER SEQUENCE s1 MAXVALUE 12;" > > has been replicated on the subscriber. Otherwise, REFRESH PUBLICATION > > SEQUENCE command would fail because the sequence parameters no longer > > match. > > I am summarizing the challenges identified so far (assuming we have > DDL replication implemented through WAL support) > 1) Lack of sequence-synchronization resulting in DDL replication > failure/conflict. > On the subscriber, the sequence has advanced to 14: > SELECT currval('s1'); > currval > --------- > 14 > > On the publisher, the sequence is reset to 11 and MAXVALUE is changed to 12: > SELECT setval('s1', 11); > ALTER SEQUENCE s1 MAXVALUE 12; > If the subscriber did not execute REFRESH PUBLICATION SEQUENCES, DDL > replication will fail with error. > ERROR: RESTART value (14) cannot be greater than MAXVALUE (12) > > 2) Manual DDL on subscriber resulting in sequence synchronization failure. > On the subscriber, the sequence maxvalue is changed: > ALTER SEQUENCE s1 MAXVALUE 12; > > On the publisher, the sequence has advanced to 14: > SELECT currval('s1'); > currval > --------- > 14 > > REFRESH PUBLICATION SEQUENCES will fail because setting currvalue to > 14 is greater than the changed maxvalue 12 in the subscriber. > > 3) Out of order DDL and REFRESH resulting in synchronization failure. > Initially we have the same sequence on pub and sub. Then lets say pub > has done parameter change: > ALTER SEQUENCE s1 MAXVALUE 12; > Now if this DDL is somehow not replicated on sub, REFRESH PUBLICATION > SEQUENCES will fail initially and may work once DDL is replicated. > ~~ > Problems 1 and 2 exist in both designs. While the WAL-based REFRESH > may seem slightly better for Problem 3 since REFRESH on the subscriber > will execute only after prior DDLs are replicated—even with the > sequence-sync worker, this isn't a major issue. If a user triggers > REFRESH before the DDL is replicated, the worker will refresh all > sequences except the mismatched one, and keep restarting and retrying > until the DDL is applied. Once that happens, the sequence sync > completes automatically, without the user doing another REFRESH. > Furthermore, the likelihood of a user executing REFRESH exactly during > the window between the DDL execution on the publisher and its > application on the subscriber seems relatively low. > > WAL-based approach OTOH introduces several additional challenges that > may outweigh its potential benefits: > 1) Increases load on WAL sender to collect sequence values. We are > talking about all the sequences here which could be huge in number. > 2) Table replication may stall until sequence conflicts are resolved. > The chances of hitting any conflict/error could be more here as > compared to tables specially when sequence synchronization is not > incremental and the number of sequences are huge. The continuous and > more frequent errors if not handled by users may even end up > invalidating the slot on primary. > > The worker approach neither blocks the apply worker in case of errors > nor adds extra load on the WAL sender. On its own, Case 3 doesn’t seem > significant enough to justify switching to a WAL-based design. > Overall, the worker-based approach appears to be less complex and a > better option. >
Agree on this. Please find a few comments on the previous patch: 1) + Returns information about the sequence. <literal>last_value</literal> + last sequence value set in sequence by nextval or setval, <literal>last_value</literal> indicates .... 2) + * If 'resync_all_sequences' is true: + * Perform the above operation only for sequences. Shall we update: Perform the above operation only for sequences and resync all the sequences including existing ones. <old comment, I think somehow missed to be addressed.> 3) + check_and_launch_sync_worker(nsyncworkers, InvalidOid, + &MyLogicalRepWorker->last_seqsync_start_time); Shall we simply name it as 'launch_sync_worker'. 'check' looks a little odd. All such functions (ex: logicalrep_worker_launch) has internal checks but the name need not to have 'check' keyword 4) + * Attempt to launch a sync worker (sequence or table) if there is a worker + * available and the retry interval has elapsed. shall we say: 'if there is a sync worker slot available' instead of 'if there is a worker available' 5) copy_sequences: + if (!sequence_rel || !HeapTupleIsValid(tup)) + { + elog(LOG, "skip synchronization of sequence \"%s.%s\" because it has been dropped concurrently", + seqinfo->nspname, seqinfo->seqname); + + batch_skipped_count++; + continue; + } Is it possible that sequence_rel is valid while tuple is not? If possible, then do we need table_close before continuing? 6) In copy_sequences() wherever we are using seqinfo->nspname, seqinfo->seqname; shall we directly use local vars nspname, seqname. 7) LogicalRepSyncSequences: + /* Skip if sequence was dropped concurrently */ + sequence_rel = try_table_open(subrel->srrelid, RowExclusiveLock); + if (!sequence_rel) + continue; Here we are not checking tuple-validity like we did in copy_sequences (comment 5 above). I think this alone should suffice even in copy_sequences(). What do you think? thanks Shveta