On Tue, 11 Jun 2024 at 12:38, Masahiko Sawada <sawada.m...@gmail.com> wrote: > > On Tue, Jun 11, 2024 at 12:25 PM vignesh C <vignes...@gmail.com> wrote: > > > > On Mon, 10 Jun 2024 at 14:48, Amit Kapila <amit.kapil...@gmail.com> wrote: > > > > > > On Mon, Jun 10, 2024 at 12:43 PM Masahiko Sawada <sawada.m...@gmail.com> > > > wrote: > > > > > > > > On Mon, Jun 10, 2024 at 3:14 PM Masahiko Sawada <sawada.m...@gmail.com> > > > > wrote: > > > > > > > > > > On Fri, Jun 7, 2024 at 7:30 PM Amit Kapila <amit.kapil...@gmail.com> > > > > > wrote: > > > > > > > > > > > > > > > > > > Are you imagining the behavior for sequences associated with tables > > > > > > differently than the ones defined by the CREATE SEQUENCE .. > > > > > > command? I > > > > > > was thinking that users would associate sequences with publications > > > > > > similar to what we do for tables for both cases. For example, they > > > > > > need to explicitly mention the sequences they want to replicate by > > > > > > commands like CREATE PUBLICATION ... FOR SEQUENCE s1, s2, ...; > > > > > > CREATE > > > > > > PUBLICATION ... FOR ALL SEQUENCES, or CREATE PUBLICATION ... FOR > > > > > > SEQUENCES IN SCHEMA sch1; > > > > > > > > > > > > In this, variants FOR ALL SEQUENCES and SEQUENCES IN SCHEMA sch1 > > > > > > should copy both the explicitly defined sequences and sequences > > > > > > defined with the tables. Do you think a different variant for just > > > > > > copying sequences implicitly associated with tables (say for > > > > > > identity > > > > > > columns)? > > > > > > > > > > Oh, I was thinking that your proposal was to copy literally all > > > > > sequences by REPLICA/REFRESH SEQUENCE command. > > > > > > > > > > > I am trying to keep the behavior as close to tables as possible. > > > > > > > > But it seems to make > > > > > sense to explicitly specify the sequences they want to replicate. It > > > > > also means that they can create a publication that has only sequences. > > > > > In this case, even if they create a subscription for that publication, > > > > > we don't launch any apply workers for that subscription. Right? > > > > > > > > > > > Right, good point. I had not thought about this. > > > > > > > > Also, given that the main use case (at least as the first step) is > > > > > version upgrade, do we really need to support SEQUENCES IN SCHEMA and > > > > > even FOR SEQUENCE? > > > > > > > > > > At the very least, we can split the patch to move these variants to a > > > separate patch. Once the main patch is finalized, we can try to > > > evaluate the remaining separately. > > > > I engaged in an offline discussion with Amit about strategizing the > > division of patches to facilitate the review process. We agreed on the > > following split: The first patch will encompass the setting and > > getting of sequence values (core sequence changes). The second patch > > will cover all changes on the publisher side related to "FOR ALL > > SEQUENCES." The third patch will address subscriber side changes aimed > > at synchronizing "FOR ALL SEQUENCES" publications. The fourth patch > > will focus on supporting "FOR SEQUENCE" publication. Lastly, the fifth > > patch will introduce support for "FOR ALL SEQUENCES IN SCHEMA" > > publication. > > > > I will work on this and share an updated patch for the same soon. > > +1. Sounds like a good plan.
Amit and I engaged in an offline discussion regarding the design and contemplated that it could be like below: 1) CREATE PUBLICATION syntax enhancement: CREATE PUBLICATION ... FOR ALL SEQUENCES; The addition of a new column titled "all sequences" in the pg_publication system table will signify whether the publication is designated as all sequences publication or not. 2) CREATE SUBSCRIPTION -- no syntax change. Upon creation of a subscription, the following additional steps will be managed by the subscriber: i) The subscriber will retrieve the list of sequences associated with the subscription's publications. ii) For each sequence: a) Retrieve the sequence value from the publisher by invoking the pg_sequence_state function. b) Set the sequence with the value obtained from the publisher. iv) Once the subscription creation is completed, all sequence values will become visible at the subscriber's end. An alternative design approach could involve retrieving the sequence list from the publisher during subscription creation and inserting the sequences with an "init" state into the pg_subscription_rel system table. These tasks could be executed by a single sequence sync worker, which would: i) Retrieve the list of sequences in the "init" state from the pg_subscription_rel system table. ii) Initiate a transaction. iii) For each sequence: a) Obtain the sequence value from the publisher by utilizing the pg_sequence_state function. b) Update the sequence with the value obtained from the publisher. iv) Commit the transaction. The benefit with the second approach is that if there are large number of sequences, the sequence sync can be enhanced to happen in parallel and also if there are any locks held on the sequences in the publisher, the sequence worker can wait to acquire the lock instead of blocking the whole create subscription command which will delay the initial copy of the tables too. 3) Refreshing the sequence can be achieved through the existing command: ALTER SUBSCRIPTION ... REFRESH PUBLICATION(no syntax change here). The subscriber identifies stale sequences, meaning sequences present in pg_subscription_rel but absent from the publication, and removes them from the pg_subscription_rel system table. The subscriber also checks for newly added sequences in the publisher and synchronizes their values from the publisher using the steps outlined in the subscription creation process. It's worth noting that previously synchronized sequences won't be synchronized again; the sequence sync will occur solely for the newly added sequences. 4) Introducing a new command for refreshing all sequences: ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES. The subscriber will remove stale sequences and add newly added sequences from the publisher. Following this, it will re-synchronize the sequence values for all sequences in the updated list from the publisher, following the steps outlined in the subscription creation process. 5) Incorporate the pg_sequence_state function to fetch the sequence value from the publisher, along with the page LSN. Incorporate SetSequence function, which will procure a new relfilenode for the sequence and set the new relfilenode with the specified value. This will facilitate rollback in case of any failures. Thoughts? Regards, Vignesh