On Wed, Aug 20, 2025 at 2:25 PM vignesh C <vignes...@gmail.com> wrote:
>
> On Tue, 19 Aug 2025 at 23:33, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > I imagined something like case 2. For logical replication of tables,
> > if we support DDL replication (i.e., CREATE/ALTER/DROP TABLE), all
> > changes the apply worker executes are serialized in commit LSN order.
> > Therefore, users would not have to be concerned about schema changes
> > that happened to the publisher. On the other hand, for sequence
> > replication, even if we support DDL replication for sequences (i.e.,
> > CREATE/ALTER/DROP SEQUENCES), users would have to execute REFRESH
> > PUBLICATION SEQUENCES command after "ALTER SEQUENCE s1 MAXVALUE 12;"
> > has been replicated on the subscriber. Otherwise, REFRESH PUBLICATION
> > SEQUENCE command would fail because the sequence parameters no longer
> > match.
>
> I am summarizing the challenges identified so far (assuming we have
> DDL replication implemented through WAL support)
> 1) Lack of sequence-synchronization resulting in DDL replication
> failure/conflict.
> On the subscriber, the sequence has advanced to 14:
> SELECT currval('s1');
> currval
> ---------
>      14
>
> On the publisher, the sequence is reset to 11 and MAXVALUE is changed to 12:
> SELECT setval('s1', 11);
> ALTER SEQUENCE s1 MAXVALUE 12;
> If the subscriber did not execute REFRESH PUBLICATION SEQUENCES, DDL
> replication will fail with error.
> ERROR:  RESTART value (14) cannot be greater than MAXVALUE (12)
>
> 2) Manual DDL on subscriber resulting in sequence synchronization failure.
> On the subscriber, the sequence maxvalue is changed:
> ALTER SEQUENCE s1 MAXVALUE 12;
>
> On the publisher, the sequence has advanced to 14:
> SELECT currval('s1');
> currval
> ---------
>      14
>
> REFRESH PUBLICATION SEQUENCES will fail because setting currvalue to
> 14 is greater than the changed maxvalue 12 in the subscriber.
>
> 3) Out of order DDL and REFRESH resulting in synchronization failure.
> Initially we have the same sequence on pub and sub. Then lets say pub
> has done parameter change:
> ALTER SEQUENCE s1 MAXVALUE 12;
> Now if this DDL is somehow not replicated on sub, REFRESH PUBLICATION
> SEQUENCES will fail initially and may work once DDL is replicated.
> ~~
> Problems 1 and 2 exist in both designs. While the WAL-based REFRESH
> may seem slightly better for Problem 3 since REFRESH on the subscriber
> will execute only after prior DDLs are replicated—even with the
> sequence-sync worker, this isn't a major issue. If a user triggers
> REFRESH before the DDL is replicated, the worker will refresh all
> sequences except the mismatched one, and keep restarting and retrying
> until the DDL is applied. Once that happens, the sequence sync
> completes automatically, without the user doing another REFRESH.
> Furthermore, the likelihood of a user executing REFRESH exactly during
> the window between the DDL execution on the publisher and its
> application on the subscriber seems relatively low.
>
> WAL-based approach OTOH introduces several additional challenges that
> may outweigh its potential benefits:
> 1)  Increases load on WAL sender to collect sequence values. We are
> talking about all the sequences here which could be huge in number.
> 2)  Table replication may stall until sequence conflicts are resolved.
> The  chances of hitting any conflict/error could be more here as
> compared to tables specially when sequence synchronization is not
> incremental and the number of sequences are huge. The continuous and
> more frequent errors if not handled by users may even end up
> invalidating the slot on primary.
>
> The worker approach neither blocks the apply worker in case of errors
> nor adds extra load on the WAL sender. On its own, Case 3 doesn’t seem
> significant enough to justify switching to a WAL-based design.
> Overall, the worker-based approach appears to be less complex and a
> better option.
>

Agree on this. Please find a few comments on the previous patch:


1)
+        Returns information about the sequence. <literal>last_value</literal>
+        last sequence value set in sequence by nextval or setval,

<literal>last_value</literal> indicates ....

2)
+ * If 'resync_all_sequences' is true:
+ *     Perform the above operation only for sequences.

Shall we update:
  Perform the above operation only for sequences and resync all the
  sequences including existing ones.

<old comment, I think somehow missed to be addressed.>

3)
+ check_and_launch_sync_worker(nsyncworkers, InvalidOid,
+ &MyLogicalRepWorker->last_seqsync_start_time);

Shall we simply name it as 'launch_sync_worker'.
'check' looks a little odd. All such functions (ex:
logicalrep_worker_launch) has internal checks but the name need not to
have 'check' keyword

4)
+ * Attempt to launch a sync worker (sequence or table) if there is a worker
+ * available and the retry interval has elapsed.

shall we say:
'if there is a sync worker slot available' instead of 'if there is a
worker available'

5)
copy_sequences:
+ if (!sequence_rel || !HeapTupleIsValid(tup))
+ {
+ elog(LOG, "skip synchronization of sequence \"%s.%s\" because it has
been dropped concurrently",
+ seqinfo->nspname, seqinfo->seqname);
+
+ batch_skipped_count++;
+ continue;
+ }

Is it possible that sequence_rel is valid while tuple is not? If
possible, then do we need table_close before continuing?

6)
In copy_sequences() wherever we are using seqinfo->nspname,
seqinfo->seqname; shall we directly use local vars nspname, seqname.

7)
LogicalRepSyncSequences:
+ /* Skip if sequence was dropped concurrently */
+ sequence_rel = try_table_open(subrel->srrelid, RowExclusiveLock);
+ if (!sequence_rel)
+ continue;

Here we are not checking tuple-validity like we did in copy_sequences
(comment 5 above). I think this alone should suffice even in
copy_sequences(). What do you think?

thanks
Shveta


Reply via email to