Re: Logical Replication of sequences

vignesh C Tue, 11 Jun 2024 03:36:42 -0700

On Tue, 11 Jun 2024 at 12:38, Masahiko Sawada <[email protected]> wrote:
>
> On Tue, Jun 11, 2024 at 12:25 PM vignesh C <[email protected]> wrote:
> >
> > On Mon, 10 Jun 2024 at 14:48, Amit Kapila <[email protected]> wrote:
> > >
> > > On Mon, Jun 10, 2024 at 12:43 PM Masahiko Sawada <[email protected]> 
> > > wrote:
> > > >
> > > > On Mon, Jun 10, 2024 at 3:14 PM Masahiko Sawada <[email protected]> 
> > > > wrote:
> > > > >
> > > > > On Fri, Jun 7, 2024 at 7:30 PM Amit Kapila <[email protected]> 
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > > Are you imagining the behavior for sequences associated with tables
> > > > > > differently than the ones defined by the CREATE SEQUENCE .. 
> > > > > > command? I
> > > > > > was thinking that users would associate sequences with publications
> > > > > > similar to what we do for tables for both cases. For example, they
> > > > > > need to explicitly mention the sequences they want to replicate by
> > > > > > commands like CREATE PUBLICATION ... FOR SEQUENCE s1, s2, ...; 
> > > > > > CREATE
> > > > > > PUBLICATION ... FOR ALL SEQUENCES, or CREATE PUBLICATION ... FOR
> > > > > > SEQUENCES IN SCHEMA sch1;
> > > > > >
> > > > > > In this, variants FOR ALL SEQUENCES and SEQUENCES IN SCHEMA sch1
> > > > > > should copy both the explicitly defined sequences and sequences
> > > > > > defined with the tables. Do you think a different variant for just
> > > > > > copying sequences implicitly associated with tables (say for 
> > > > > > identity
> > > > > > columns)?
> > > > >
> > > > > Oh, I was thinking that your proposal was to copy literally all
> > > > > sequences by REPLICA/REFRESH SEQUENCE command.
> > > > >
> > >
> > > I am trying to keep the behavior as close to tables as possible.
> > >
> > > > > But it seems to make
> > > > > sense to explicitly specify the sequences they want to replicate. It
> > > > > also means that they can create a publication that has only sequences.
> > > > > In this case, even if they create a subscription for that publication,
> > > > > we don't launch any apply workers for that subscription. Right?
> > > > >
> > >
> > > Right, good point. I had not thought about this.
> > >
> > > > > Also, given that the main use case (at least as the first step) is
> > > > > version upgrade, do we really need to support SEQUENCES IN SCHEMA and
> > > > > even FOR SEQUENCE?
> > > >
> > >
> > > At the very least, we can split the patch to move these variants to a
> > > separate patch. Once the main patch is finalized, we can try to
> > > evaluate the remaining separately.
> >
> > I engaged in an offline discussion with Amit about strategizing the
> > division of patches to facilitate the review process. We agreed on the
> > following split: The first patch will encompass the setting and
> > getting of sequence values (core sequence changes). The second patch
> > will cover all changes on the publisher side related to "FOR ALL
> > SEQUENCES." The third patch will address subscriber side changes aimed
> > at synchronizing "FOR ALL SEQUENCES" publications. The fourth patch
> > will focus on supporting "FOR SEQUENCE" publication. Lastly, the fifth
> > patch will introduce support for  "FOR ALL SEQUENCES IN SCHEMA"
> > publication.
> >
> > I will work on this and share an updated patch for the same soon.
>
> +1. Sounds like a good plan.


Amit and I engaged in an offline discussion regarding the design and
contemplated that it could be like below:
1) CREATE PUBLICATION syntax enhancement:
CREATE PUBLICATION ... FOR ALL SEQUENCES;
The addition of a new column titled "all sequences" in the
pg_publication system table will signify whether the publication is
designated as all sequences publication or not.

2)  CREATE SUBSCRIPTION -- no syntax change.
Upon creation of a subscription, the following additional steps will
be managed by the subscriber:
i) The subscriber will retrieve the list of sequences associated with
the subscription's publications.
ii) For each sequence: a) Retrieve the sequence value from the
publisher by invoking the pg_sequence_state function. b) Set the
sequence with the value obtained from the publisher. iv) Once the
subscription creation is completed, all sequence values will become
visible at the subscriber's end.

An alternative design approach could involve retrieving the sequence
list from the publisher during subscription creation and inserting the
sequences with an "init" state into the pg_subscription_rel system
table. These tasks could be executed by a single sequence sync worker,
which would:
i) Retrieve the list of sequences in the "init" state from the
pg_subscription_rel system table.
ii) Initiate a transaction.
iii) For each sequence: a) Obtain the sequence value from the
publisher by utilizing the pg_sequence_state function. b) Update the
sequence with the value obtained from the publisher.
iv) Commit the transaction.

The benefit with the second approach is that if there are large number
of sequences, the sequence sync can be enhanced to happen in parallel
and also if there are any locks held on the sequences in the
publisher, the sequence worker can wait to acquire the lock instead of
blocking the whole create subscription command which will delay the
initial copy of the tables too.

3) Refreshing the sequence can be achieved through the existing
command: ALTER SUBSCRIPTION ... REFRESH PUBLICATION(no syntax change
here).
The subscriber identifies stale sequences, meaning sequences present
in pg_subscription_rel but absent from the publication, and removes
them from the pg_subscription_rel system table. The subscriber also
checks for newly added sequences in the publisher and synchronizes
their values from the publisher using the steps outlined in the
subscription creation process. It's worth noting that previously
synchronized sequences won't be synchronized again; the sequence sync
will occur solely for the newly added sequences.

4) Introducing a new command for refreshing all sequences: ALTER
SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES.
The subscriber will remove stale sequences and add newly added
sequences from the publisher. Following this, it will re-synchronize
the sequence values for all sequences in the updated list from the
publisher, following the steps outlined in the subscription creation
process.

5) Incorporate the pg_sequence_state function to fetch the sequence
value from the publisher, along with the page LSN. Incorporate
SetSequence function, which will procure a new relfilenode for the
sequence and set the new relfilenode with the specified value. This
will facilitate rollback in case of any failures.

Thoughts?

Regards,
Vignesh

Re: Logical Replication of sequences

Reply via email to