On Thu, Jun 13, 2024 at 03:36:05PM +0530, Amit Kapila wrote: > Fair enough. However, this raises the question Dilip and Vignesh are > discussing whether we need a new relfilenode for sequence update even > during initial sync? As per my understanding, the idea is that similar > to tables, the CREATE SUBSCRIPTION command (with copy_data = true) > will create the new sequence entries in pg_subscription_rel with the > state as 'i'. Then the sequence-sync worker would start a transaction > and one-by-one copy the latest sequence values for each sequence (that > has state as 'i' in pg_subscription_rel) and mark its state as ready > 'r' and commit the transaction. Now if there is an error during this > operation it will restart the entire operation.
Hmm. You mean to use only one transaction for all the sequences? I've heard about deployments with a lot of them. Could it be a problem to process them in batches, as well? If you maintain a state for each one of them in pg_subscription_rel, it does not strike me as an issue, while being more flexible than an all-or-nothing. > The idea of creating a > new relfilenode is to handle the error so that if there is a rollback, > the sequence state will be rolled back to 'i' and the sequence value > will also be rolled back. The other option could be that we update the > sequence value without a new relfilenode and if the transaction rolled > back then only the sequence's state will be rolled back to 'i'. This > would work with a minor inconsistency that sequence values will be > up-to-date even when the sequence state is 'i' in pg_subscription_rel. > I am not sure if that matters because anyway, they can quickly be > out-of-sync with the publisher again. Seeing a mention to relfilenodes specifically for sequences freaks me out a bit, because there's some work I have been doing in this area and sequences may not have a need for a physical relfilenode at all. But I guess that you refer to the fact that like tables, relfilenodes would only be created as required because anything you'd do in the apply worker path would just call some of the routines of sequence.h, right? > Now, say we don't want to maintain the state of sequences for initial > sync at all then after the error how will we detect if there are any > pending sequences to be synced? One possibility is that we maintain a > subscription level flag 'subsequencesync' in 'pg_subscription' to > indicate whether sequences need sync. This flag would indicate whether > to sync all the sequences in pg_susbcription_rel. This would mean that > if there is an error while syncing the sequences we will resync all > the sequences again. This could be acceptable considering the chances > of error during sequence sync are low. There could be multiple subscriptions to a single database that point to the same set of sequences. Is there any conflict issue to worry about here? > The benefit is that both the > REFRESH PUBLICATION SEQUENCES and CREATE SUBSCRIPTION can use the same > idea and sync all sequences without needing a new relfilenode. Users > can always refer 'subsequencesync' flag in 'pg_subscription' to see if > all the sequences are synced after executing the command. That would be cheaper, indeed. Isn't a boolean too limiting? Isn't that something you'd want to track with a LSN as "the point in WAL where all the sequences have been synced"? The approach of doing all the sync work from the subscriber, while having a command that can be kicked from the subscriber side is a good user experience. -- Michael
signature.asc
Description: PGP signature