Re: Logical Replication of sequences

Michael Paquier Thu, 13 Jun 2024 16:46:16 -0700

On Thu, Jun 13, 2024 at 03:36:05PM +0530, Amit Kapila wrote:
> Fair enough. However, this raises the question Dilip and Vignesh are
> discussing whether we need a new relfilenode for sequence update even
> during initial sync? As per my understanding, the idea is that similar
> to tables, the CREATE SUBSCRIPTION command (with copy_data = true)
> will create the new sequence entries in pg_subscription_rel with the
> state as 'i'. Then the sequence-sync worker would start a transaction
> and one-by-one copy the latest sequence values for each sequence (that
> has state as 'i' in pg_subscription_rel) and mark its state as ready
> 'r' and commit the transaction. Now if there is an error during this
> operation it will restart the entire operation.


Hmm.  You mean to use only one transaction for all the sequences?
I've heard about deployments with a lot of them.  Could it be a
problem to process them in batches, as well?  If you maintain a state
for each one of them in pg_subscription_rel, it does not strike me as
an issue, while being more flexible than an all-or-nothing.

> The idea of creating a
> new relfilenode is to handle the error so that if there is a rollback,
> the sequence state will be rolled back to 'i' and the sequence value
> will also be rolled back. The other option could be that we update the
> sequence value without a new relfilenode and if the transaction rolled
> back then only the sequence's state will be rolled back to 'i'. This
> would work with a minor inconsistency that sequence values will be
> up-to-date even when the sequence state is 'i' in pg_subscription_rel.
> I am not sure if that matters because anyway, they can quickly be
> out-of-sync with the publisher again.

Seeing a mention to relfilenodes specifically for sequences freaks me
out a bit, because there's some work I have been doing in this area
and sequences may not have a need for a physical relfilenode at all.
But I guess that you refer to the fact that like tables, relfilenodes
would only be created as required because anything you'd do in the
apply worker path would just call some of the routines of sequence.h,
right?

> Now, say we don't want to maintain the state of sequences for initial
> sync at all then after the error how will we detect if there are any
> pending sequences to be synced? One possibility is that we maintain a
> subscription level flag 'subsequencesync' in 'pg_subscription' to
> indicate whether sequences need sync. This flag would indicate whether
> to sync all the sequences in pg_susbcription_rel. This would mean that
> if there is an error while syncing the sequences we will resync all
> the sequences again. This could be acceptable considering the chances
> of error during sequence sync are low.

There could be multiple subscriptions to a single database that point
to the same set of sequences.  Is there any conflict issue to worry
about here?

> The benefit is that both the
> REFRESH PUBLICATION SEQUENCES and CREATE SUBSCRIPTION can use the same
> idea and sync all sequences without needing a new relfilenode. Users
> can always refer 'subsequencesync' flag in 'pg_subscription' to see if
> all the sequences are synced after executing the command.

That would be cheaper, indeed.  Isn't a boolean too limiting?
Isn't that something you'd want to track with a LSN as "the point in
WAL where all the sequences have been synced"? 

The approach of doing all the sync work from the subscriber, while
having a command that can be kicked from the subscriber side is a good
user experience.
--
Michael

signature.asc
Description: PGP signature

Re: Logical Replication of sequences

Reply via email to