Re: Synchronizing slots from primary to standby
On Thu, Jan 11, 2024 at 7:28 AM Peter Smith wrote: > > Here are some review comments for patch v58-0002 Thank You for the feedback. These are addressed in v60. Please find my response inline for a few. > (FYI - I quickly checked with the latest v59-0002 and AFAIK all these > review comments below are still relevant) > > == > Commit message > > 1. > If a logical slot is invalidated on the primary, slot on the standby is also > invalidated. > > ~ > > /slot on the standby/then that slot on the standby/ > > == > doc/src/sgml/logicaldecoding.sgml > > 2. > In order to resume logical replication after failover from the synced > logical slots, it is required that 'conninfo' in subscriptions are > altered to point to the new primary server using ALTER SUBSCRIPTION > ... CONNECTION. It is recommended that subscriptions are first > disabled before promoting the standby and are enabled back once these > are altered as above after failover. > > ~ > > Minor rewording mainly to reduce a long sentence. > > SUGGESTION > To resume logical replication after failover from the synced logical > slots, the subscription's 'conninfo' must be altered to point to the > new primary server. This is done using ALTER SUBSCRIPTION ... > CONNECTION. It is recommended that subscriptions are first disabled > before promoting the standby and are enabled back after altering the > connection string. > > == > doc/src/sgml/system-views.sgml > > 3. > + > + synced bool > + > + > + True if this logical slot was synced from a primary server. > + > + > > SUGGESTION > True if this is a logical slot that was synced from a primary server. > > == > src/backend/access/transam/xlogrecovery.c > > 4. > + /* > + * Shutdown the slot sync workers to prevent potential conflicts between > + * user processes and slotsync workers after a promotion. > + * > + * We do not update the 'synced' column from true to false here, as any > + * failed update could leave some slot's 'synced' column as false. This > + * could cause issues during slot sync after restarting the server as a > + * standby. While updating after switching to the new timeline is an > + * option, it does not simplify the handling for 'synced' column. > + * Therefore, we retain the 'synced' column as true after promotion as they > + * can provide useful information about their origin. > + */ > > Minor comment wording changes. > > BEFORE > ...any failed update could leave some slot's 'synced' column as false. > SUGGESTION > ...any failed update could leave 'synced' column false for some slots. > > ~ > > BEFORE > Therefore, we retain the 'synced' column as true after promotion as > they can provide useful information about their origin. > SUGGESTION > Therefore, we retain the 'synced' column as true after promotion as it > may provide useful information about the slot origin. > > == > src/backend/replication/logical/slotsync.c > > 5. > + * While creating the slot on physical standby, if the local restart_lsn > and/or > + * local catalog_xmin is ahead of those on the remote then the worker cannot > + * create the local slot in sync with the primary server because that would > + * mean moving the local slot backwards and the standby might not have WALs > + * retained for old LSN. In this case, the worker will mark the slot as > + * RS_TEMPORARY. Once the primary server catches up, it will move the slot to > + * RS_PERSISTENT and will perform the sync periodically. > > /will move the slot to RS_PERSISTENT/will mark the slot as RS_PERSISTENT/ > > ~~~ > > 6. 
drop_synced_slots_internal > +/* > + * Helper function for drop_obsolete_slots() > + * > + * Drops synced slot identified by the passed in name. > + */ > +static void > +drop_synced_slots_internal(const char *name, bool nowait) > +{ > + Assert(MyReplicationSlot == NULL); > + > + ReplicationSlotAcquire(name, nowait); > + > + Assert(MyReplicationSlot->data.synced); > + > + ReplicationSlotDropAcquired(); > +} > > IMO you don't need this function. AFAICT it is only called from one > place and does not result in fewer lines of code. > > ~~~ > > 7. get_local_synced_slots > > + /* Check if it is logical synchronized slot */ > + if (s->in_use && SlotIsLogical(s) && s->data.synced) > + { > + local_slots = lappend(local_slots, s); > + } > > Do you need to check SlotIsLogical(s) here? I thought s->data.synced > can never be true for physical slots. I felt you could write this like > below: > > if (s->in_use && s->data.synced) > { > Assert(SlotIsLogical(s)); > local_slots = lappend(local_slots, s); > } > > ~~~ > > 8. check_sync_slot_on_remote > > +static bool > +check_sync_slot_on_remote(ReplicationSlot *local_slot, List *remote_slots, > + bool *locally_invalidated) > +{ > + ListCell *lc; > + > + foreach(lc, remote_slots) > + { > + RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc); > > I think you can use the new style foreach_ptr list macros here. > > ~~~ > > 9. drop_obsolete_slots > > +drop_obs
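For point 8 above, the new-style list iteration macros in pg_list.h fold the ListCell and lfirst() boilerplate into the loop header and declare the loop variable themselves. A rough sketch of the suggested shape, assuming PostgreSQL's internal headers and the patch's RemoteSlot struct (the loop body is elided, not actual patch code):

static bool
check_sync_slot_on_remote(ReplicationSlot *local_slot, List *remote_slots,
                          bool *locally_invalidated)
{
    /* foreach_ptr declares "RemoteSlot *remote_slot" for us */
    foreach_ptr(RemoteSlot, remote_slot, remote_slots)
    {
        if (strcmp(remote_slot->name, NameStr(local_slot->data.name)) == 0)
        {
            /* ... existing matching/invalidation logic unchanged ... */
            return true;
        }
    }

    return false;
}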
Re: Synchronizing slots from primary to standby
On Fri, Jan 12, 2024 at 5:30 PM Masahiko Sawada wrote: > > On Thu, Jan 11, 2024 at 7:53 PM Amit Kapila wrote: > > > > On Tue, Jan 9, 2024 at 6:39 PM Amit Kapila wrote: > > > > > > +static bool > > > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot) > > > { > > > ... > > > + /* Slot ready for sync, so sync it. */ > > > + else > > > + { > > > + /* > > > + * Sanity check: With hot_standby_feedback enabled and > > > + * invalidations handled appropriately as above, this should never > > > + * happen. > > > + */ > > > + if (remote_slot->restart_lsn < slot->data.restart_lsn) > > > + elog(ERROR, > > > + "cannot synchronize local slot \"%s\" LSN(%X/%X)" > > > + " to remote slot's LSN(%X/%X) as synchronization" > > > + " would move it backwards", remote_slot->name, > > > + LSN_FORMAT_ARGS(slot->data.restart_lsn), > > > + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); > > > ... > > > } > > > > > > I was thinking about the above code in the patch and as far as I can > > > think this can only occur if the same name slot is re-created with > > > prior restart_lsn after the existing slot is dropped. Normally, the > > > newly created slot (with the same name) will have higher restart_lsn > > > but one can mimic it by copying some older slot by using > > > pg_copy_logical_replication_slot(). > > > > > > I don't think as mentioned in comments even if hot_standby_feedback is > > > temporarily set to off, the above shouldn't happen. It can only lead > > > to invalidated slots on standby. > > > > > > To close the above race, I could think of the following ways: > > > 1. Drop and re-create the slot. > > > 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves > > > ahead of local_slot's LSN then we can update it; but as mentioned in > > > your previous comment, we need to update all other fields as well. If > > > we follow this then we probably need to have a check for catalog_xmin > > > as well. > > > > > > > The second point as mentioned is slightly misleading, so let me try to > > rephrase it once again: Emit LOG/WARNING in this case and once > > remote_slot's LSN moves ahead of local_slot's LSN then we can update > > it; additionally, we need to update all other fields like two_phase as > > well. If we follow this then we probably need to have a check for > > catalog_xmin as well along remote_slot's restart_lsn. > > > > > Now, related to this the other case which needs some handling is what > > > if the remote_slot's restart_lsn is greater than local_slot's > > > restart_lsn but it is a re-created slot with the same name. In that > > > case, I think the other properties like 'two_phase', 'plugin' could be > > > different. So, is simply copying those sufficient or do we need to do > > > something else as well? > > > > > > > Bertrand, Dilip, Sawada-San, and others, please share your opinion on > > this problem as I think it is important to handle this race condition. > > Is there any good use case of copying a failover slot in the first > place? If it's not a normal use case and we can probably live without > it, why not always disable failover during the copy? FYI we always > disable two_phase on copied slots. It seems to me that copying a > failover slot could lead to problems, as long as we synchronize slots > based on their names. IIUC without the copy, this pass should never > happen. > > Regards, > > -- > Masahiko Sawada > Amazon Web Services: https://aws.amazon.com There are multiple approaches discussed and tried when it comes to starting a slot-sync worker. 
I am summarizing all here: 1) Make slotsync worker as an Auxiliary Process (like checkpointer, walwriter, walreceiver etc). The benefit this approach provides is, it can control begin and stop in a more flexible way as each auxiliary process could have different checks before starting and can have different stop conditions. But it needs code duplication for process management (start, stop, crash handling, signals etc) and currently it does not support db-connection smoothly (none of the auxiliary processes has one so far). We attempted to make the slot-sync worker an auxiliary process and faced some challenges. The slot sync worker needs a db-connection and thus needs InitPostgres(). But AuxiliaryProcessMain() and InitPostgres() are not compatible as both invoke common functions and end up setting many callback functions twice (with different args). Also InitPostgres() does 'MyBackendId' initialization (which further triggers some stuff) which is not needed for an AuxiliaryProcess, and so on. And thus in order to make the slot-sync worker an auxiliary process, we need something similar to InitPostgres (a trimmed-down working version), which needs further detailed analysis. 2) Make slotsync worker as a 'special' process like AutoVacLauncher which is neither an Auxiliary process nor a bgworker one. It allows db-connection and also provides flexibility to have start and stop conditions for a process. But it needs a lot of code-duplication ar
Re: Synchronizing slots from primary to standby
On Sat, Jan 13, 2024 at 12:54 PM Amit Kapila wrote: > > On Fri, Jan 12, 2024 at 5:50 PM shveta malik wrote: > > > > There are multiple approaches discussed and tried when it comes to > > starting a slot-sync worker. I am summarizing all here: > > > > 1) Make slotsync worker as an Auxiliary Process (like checkpointer, > > walwriter, walreceiver etc). The benefit this approach provides is, it > > can control begin and stop in a more flexible way as each auxiliary > > process could have different checks before starting and can have > > different stop conditions. But it needs code duplication for process > > management(start, stop, crash handling, signals etc) and currently it > > does not support db-connection smoothly (none of the auxiliary process > > has one so far) > > > > As slotsync worker needs to perform transactions and access syscache, > we can't make it an auxiliary process as that doesn't initialize the > required stuff like syscache. Also, see the comment "Auxiliary > processes don't run transactions ..." in AuxiliaryProcessMain() which > means this is not an option. > > > > > 2) Make slotsync worker as a 'special' process like AutoVacLauncher > > which is neither an Auxiliary process nor a bgworker one. It allows > > db-connection and also provides flexibility to have start and stop > > conditions for a process. > > > > Yeah, due to these reasons, I think this option is worth considering > and another plus point is that this allows us to make enable_syncslot > a PGC_SIGHUP GUC rather than a PGC_POSTMASTER. > > > > > 3) Make slotysnc worker a bgworker. Here we just need to register our > > process as a bgworker (RegisterBackgroundWorker()) by providing a > > relevant start_time and restart_time and then the process management > > is well taken care of. It does not need any code-duplication and > > allows db-connection smoothly in registered process. The only thing it > > lacks is that it does not provide flexibility of having > > start-condition which then makes us to have 'enable_syncslot' as > > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I > > feel enable_syncslot is something which will not be changed frequently > > and with the benefits provided by bgworker infra, it seems a > > reasonably good option to choose this approach. > > > > I agree but it may be better to make it a PGC_SIGHUP parameter. > > > 4) Another option is to have Logical Replication Launcher(or a new > > process) to launch slot-sync worker. But going by the current design > > where we have only 1 slotsync worker, it may be an overhead to have an > > additional manager process maintained. > > > > I don't see any good reason to have an additional launcher process here. > > > > > Thus weighing pros and cons of all these options, we have currently > > implemented the bgworker approach (approach 3). Any feedback is > > welcome. > > > > I vote to go for (2) unless we face difficulties in doing so but (3) > is also okay especially if others also think so. I am not against any of the approaches but I still feel that when we have a standard way of doing things (bgworker) we should not keep adding code to do things in a special way unless there is a strong reason to do so. Now we need to decide if 'enable_syncslot' being PGC_POSTMASTER is a strong reason to go the non-standard way? If yes, then we should think of option 2 else option 3 seems better in my understanding (which may be limited due to my short experience here), so I am all ears to what others think on this. thanks Shveta
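For readers following along: under approach 3 the registration is only a few lines against the existing bgworker infrastructure, which is why it needs no process-management code of its own. A minimal sketch (the flags, start time, and restart interval here are assumptions for illustration; ReplSlotSyncWorkerMain is the worker entry point referenced later in the thread):

BackgroundWorker bgw;

memset(&bgw, 0, sizeof(bgw));
/* The worker opens a database connection, so it needs both flags. */
bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
/* Start once the standby has reached a consistent state. */
bgw.bgw_start_time = BgWorkerStart_ConsistentState;
bgw.bgw_restart_time = 60;      /* seconds between restarts after a crash */
snprintf(bgw.bgw_library_name, sizeof(bgw.bgw_library_name), "postgres");
snprintf(bgw.bgw_function_name, sizeof(bgw.bgw_function_name),
         "ReplSlotSyncWorkerMain");
snprintf(bgw.bgw_name, sizeof(bgw.bgw_name), "slot sync worker");
snprintf(bgw.bgw_type, sizeof(bgw.bgw_type), "slot sync worker");
bgw.bgw_notify_pid = 0;

RegisterBackgroundWorker(&bgw);

The flip side, as discussed above, is that RegisterBackgroundWorker() runs at postmaster startup, which is what forces the enabling GUC to be PGC_POSTMASTER rather than PGC_SIGHUP.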
Re: Synchronizing slots from primary to standby
On Tue, Jan 16, 2024 at 12:59 PM Masahiko Sawada wrote: > > On Tue, Jan 16, 2024 at 1:07 PM Amit Kapila wrote: > > > > On Tue, Jan 16, 2024 at 9:03 AM shveta malik wrote: > > > > > > On Sat, Jan 13, 2024 at 12:54 PM Amit Kapila > > > wrote: > > > > > > > > On Fri, Jan 12, 2024 at 5:50 PM shveta malik > > > > wrote: > > > > > > > > > > There are multiple approaches discussed and tried when it comes to > > > > > starting a slot-sync worker. I am summarizing all here: > > > > > > > > > > 1) Make slotsync worker as an Auxiliary Process (like checkpointer, > > > > > walwriter, walreceiver etc). The benefit this approach provides is, it > > > > > can control begin and stop in a more flexible way as each auxiliary > > > > > process could have different checks before starting and can have > > > > > different stop conditions. But it needs code duplication for process > > > > > management(start, stop, crash handling, signals etc) and currently it > > > > > does not support db-connection smoothly (none of the auxiliary process > > > > > has one so far) > > > > > > > > > > > > > As slotsync worker needs to perform transactions and access syscache, > > > > we can't make it an auxiliary process as that doesn't initialize the > > > > required stuff like syscache. Also, see the comment "Auxiliary > > > > processes don't run transactions ..." in AuxiliaryProcessMain() which > > > > means this is not an option. > > > > > > > > > > > > > > 2) Make slotsync worker as a 'special' process like AutoVacLauncher > > > > > which is neither an Auxiliary process nor a bgworker one. It allows > > > > > db-connection and also provides flexibility to have start and stop > > > > > conditions for a process. > > > > > > > > > > > > > Yeah, due to these reasons, I think this option is worth considering > > > > and another plus point is that this allows us to make enable_syncslot > > > > a PGC_SIGHUP GUC rather than a PGC_POSTMASTER. > > > > > > > > > > > > > > 3) Make slotysnc worker a bgworker. Here we just need to register our > > > > > process as a bgworker (RegisterBackgroundWorker()) by providing a > > > > > relevant start_time and restart_time and then the process management > > > > > is well taken care of. It does not need any code-duplication and > > > > > allows db-connection smoothly in registered process. The only thing it > > > > > lacks is that it does not provide flexibility of having > > > > > start-condition which then makes us to have 'enable_syncslot' as > > > > > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I > > > > > feel enable_syncslot is something which will not be changed frequently > > > > > and with the benefits provided by bgworker infra, it seems a > > > > > reasonably good option to choose this approach. > > > > > > > > > > > > > I agree but it may be better to make it a PGC_SIGHUP parameter. > > > > > > > > > 4) Another option is to have Logical Replication Launcher(or a new > > > > > process) to launch slot-sync worker. But going by the current design > > > > > where we have only 1 slotsync worker, it may be an overhead to have an > > > > > additional manager process maintained. > > > > > > > > > > > > > I don't see any good reason to have an additional launcher process here. > > > > > > > > > > > > > > Thus weighing pros and cons of all these options, we have currently > > > > > implemented the bgworker approach (approach 3). Any feedback is > > > > > welcome. 
> > > > > > > > > > > > > I vote to go for (2) unless we face difficulties in doing so but (3) > > > > is also okay especially if others also think so. > > > > > > I am not against any of the approaches but I still feel that when we > > > have a standard way of doing things (bgworker) we should not keep > > > adding code to do things in a special way unless there is a strong > > > reason to do so. Now we need to decide if 'enable_syncslot' being > > > PGC_P
Re: Synchronizing slots from primary to standby
On Sat, Jan 13, 2024 at 12:54 PM Amit Kapila wrote: > > On Fri, Jan 12, 2024 at 5:50 PM shveta malik wrote: > > > > There are multiple approaches discussed and tried when it comes to > > starting a slot-sync worker. I am summarizing all here: > > > > 1) Make slotsync worker as an Auxiliary Process (like checkpointer, > > walwriter, walreceiver etc). The benefit this approach provides is, it > > can control begin and stop in a more flexible way as each auxiliary > > process could have different checks before starting and can have > > different stop conditions. But it needs code duplication for process > > management(start, stop, crash handling, signals etc) and currently it > > does not support db-connection smoothly (none of the auxiliary process > > has one so far) > > > > As slotsync worker needs to perform transactions and access syscache, > we can't make it an auxiliary process as that doesn't initialize the > required stuff like syscache. Also, see the comment "Auxiliary > processes don't run transactions ..." in AuxiliaryProcessMain() which > means this is not an option. > > > > > 2) Make slotsync worker as a 'special' process like AutoVacLauncher > > which is neither an Auxiliary process nor a bgworker one. It allows > > db-connection and also provides flexibility to have start and stop > > conditions for a process. > > > > Yeah, due to these reasons, I think this option is worth considering > and another plus point is that this allows us to make enable_syncslot > a PGC_SIGHUP GUC rather than a PGC_POSTMASTER. > > > > > 3) Make slotysnc worker a bgworker. Here we just need to register our > > process as a bgworker (RegisterBackgroundWorker()) by providing a > > relevant start_time and restart_time and then the process management > > is well taken care of. It does not need any code-duplication and > > allows db-connection smoothly in registered process. The only thing it > > lacks is that it does not provide flexibility of having > > start-condition which then makes us to have 'enable_syncslot' as > > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I > > feel enable_syncslot is something which will not be changed frequently > > and with the benefits provided by bgworker infra, it seems a > > reasonably good option to choose this approach. > > > > I agree but it may be better to make it a PGC_SIGHUP parameter. > > > 4) Another option is to have Logical Replication Launcher(or a new > > process) to launch slot-sync worker. But going by the current design > > where we have only 1 slotsync worker, it may be an overhead to have an > > additional manager process maintained. > > > > I don't see any good reason to have an additional launcher process here. > > > > > Thus weighing pros and cons of all these options, we have currently > > implemented the bgworker approach (approach 3). Any feedback is > > welcome. > > > > I vote to go for (2) unless we face difficulties in doing so but (3) > is also okay especially if others also think so. Okay. Attempted approach 2 as a separate patch in v62-0003. Approach 3 (bgworker) is still maintained in v62-002. thanks Shveta
Re: Synchronizing slots from primary to standby
On Wed, Jan 17, 2024 at 6:43 AM Masahiko Sawada wrote: > > On Tue, Jan 16, 2024 at 6:40 PM shveta malik wrote: > > > > On Tue, Jan 16, 2024 at 12:59 PM Masahiko Sawada > > wrote: > > > > > > On Tue, Jan 16, 2024 at 1:07 PM Amit Kapila > > > wrote: > > > > > > > > On Tue, Jan 16, 2024 at 9:03 AM shveta malik > > > > wrote: > > > > > > > > > > On Sat, Jan 13, 2024 at 12:54 PM Amit Kapila > > > > > wrote: > > > > > > > > > > > > On Fri, Jan 12, 2024 at 5:50 PM shveta malik > > > > > > wrote: > > > > > > > > > > > > > > There are multiple approaches discussed and tried when it comes to > > > > > > > starting a slot-sync worker. I am summarizing all here: > > > > > > > > > > > > > > 1) Make slotsync worker as an Auxiliary Process (like > > > > > > > checkpointer, > > > > > > > walwriter, walreceiver etc). The benefit this approach provides > > > > > > > is, it > > > > > > > can control begin and stop in a more flexible way as each > > > > > > > auxiliary > > > > > > > process could have different checks before starting and can have > > > > > > > different stop conditions. But it needs code duplication for > > > > > > > process > > > > > > > management(start, stop, crash handling, signals etc) and > > > > > > > currently it > > > > > > > does not support db-connection smoothly (none of the auxiliary > > > > > > > process > > > > > > > has one so far) > > > > > > > > > > > > > > > > > > > As slotsync worker needs to perform transactions and access > > > > > > syscache, > > > > > > we can't make it an auxiliary process as that doesn't initialize the > > > > > > required stuff like syscache. Also, see the comment "Auxiliary > > > > > > processes don't run transactions ..." in AuxiliaryProcessMain() > > > > > > which > > > > > > means this is not an option. > > > > > > > > > > > > > > > > > > > > 2) Make slotsync worker as a 'special' process like > > > > > > > AutoVacLauncher > > > > > > > which is neither an Auxiliary process nor a bgworker one. It > > > > > > > allows > > > > > > > db-connection and also provides flexibility to have start and stop > > > > > > > conditions for a process. > > > > > > > > > > > > > > > > > > > Yeah, due to these reasons, I think this option is worth considering > > > > > > and another plus point is that this allows us to make > > > > > > enable_syncslot > > > > > > a PGC_SIGHUP GUC rather than a PGC_POSTMASTER. > > > > > > > > > > > > > > > > > > > > 3) Make slotysnc worker a bgworker. Here we just need to register > > > > > > > our > > > > > > > process as a bgworker (RegisterBackgroundWorker()) by providing a > > > > > > > relevant start_time and restart_time and then the process > > > > > > > management > > > > > > > is well taken care of. It does not need any code-duplication and > > > > > > > allows db-connection smoothly in registered process. The only > > > > > > > thing it > > > > > > > lacks is that it does not provide flexibility of having > > > > > > > start-condition which then makes us to have 'enable_syncslot' as > > > > > > > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said > > > > > > > this, I > > > > > > > feel enable_syncslot is something which will not be changed > > > > > > > frequently > > > > > > > and with the benefits provided by bgworker infra, it seems a > > > > > > > reasonably good option to choose this approach. > > > > > > > > > > > > > > > > > > > I agree but it may be better to make it a PGC_SIGHUP parameter. > > > > > > > > >
Re: Synchronizing slots from primary to standby
On Wed, Jan 17, 2024 at 3:08 PM Bertrand Drouvot wrote: > > Hi, > > On Tue, Jan 16, 2024 at 05:27:05PM +0530, shveta malik wrote: > > PFA v62. Details: > > Thanks! > > > v62-003: > > It is a new patch which attempts to implement slot-sync worker as a > > special process which is neither a bgworker nor an Auxiliary process. > > Here we get the benefit of converting enable_syncslot to a PGC_SIGHUP > > Guc rather than PGC_POSTMASTER. We launch the slot-sync worker only if > > it is hot-standby and 'enable_syncslot' is ON. > > The implementation looks reasonable to me (from what I can see some parts is > copy/paste from an already existing "special" process and some parts are > "sync slot" specific) which makes fully sense. Thanks for the feedback. I have addressed the comments in v63 except 5th one. > A few remarks: > > 1 === > +* Was it the slot sycn worker? > > Typo: sycn > > 2 === > +* ones), and no walwriter, autovac launcher or bgwriter or > slot sync > > Instead? "* ones), and no walwriter, autovac launcher, bgwriter or slot sync" > > 3 === > + * restarting slot slyc worker. If stopSignaled is set, the worker will > > Typo: slyc > > 4 === > +/* Flag to tell if we are in an slot sync worker process */ > > s/an/a/ ? > > 5 === (coming from v62-0002) > + Assert(tuplestore_tuple_count(res->tuplestore) == 1); > > Is it even possible for the related query to not return only one row? (I > think the > "count" ensures it). I think you are right. This assertion was added sometime back on the basis of feedback on hackers. Let me review that again. I can consider this comment in the next version. > 6 === > if (conninfo_changed || > primary_slotname_changed || > + old_enable_syncslot != enable_syncslot || > (old_hot_standby_feedback != hot_standby_feedback)) > { > ereport(LOG, > errmsg("slot sync worker will restart because > of" >" a parameter change")); > > I don't think "slot sync worker will restart" is true if one change > enable_syncslot > from on to off. Yes, right. I have changed the log-msg in this specific case. > > IMHO, v62-003 is in good shape and could be merged in v62-002 (that would ease > the review). But let's wait to see if others think differently. > > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
On Thu, Jan 18, 2024 at 10:31 AM Peter Smith wrote: > > I have one question about the new code in v63-0002. > > == > src/backend/replication/logical/slotsync.c > > 1. ReplSlotSyncWorkerMain > > + Assert(SlotSyncWorker->pid == InvalidPid); > + > + /* > + * Startup process signaled the slot sync worker to stop, so if meanwhile > + * postmaster ended up starting the worker again, exit. > + */ > + if (SlotSyncWorker->stopSignaled) > + { > + SpinLockRelease(&SlotSyncWorker->mutex); > + proc_exit(0); > + } > > Can we be sure a worker crash can't occur (in ShutDownSlotSync?) in > such a way that SlotSyncWorker->stopSignaled was already assigned > true, but SlotSyncWorker->pid was not yet reset to InvalidPid; > > e.g. Is the Assert above still OK? We are good with the Assert here. I tried the below cases: 1) When the slotsync worker is killed using 'kill' (i.e., SIGTERM), the slot sync worker invokes 'slotsync_worker_onexit()' before going down and thus sets SlotSyncWorker->pid = InvalidPid. This means that when it is restarted (considering we have put the breakpoints in such a way that the postmaster had already reached do_start_bgworker() before promotion finished), it is able to see stopSignaled set but pid as InvalidPid, and thus we are good. 2) Another case is when we kill the slot sync worker using 'kill -9' (or say we make it crash); in such a case, the postmaster signals each sibling process to quit (including the startup process) and cleans up the shared memory used by each (including SlotSyncWorker). In such a case promotion fails. And if the slot sync worker is started again, it will find pid as InvalidPid. So we are good. thanks Shveta
Re: Synchronizing slots from primary to standby
On Fri, Jan 19, 2024 at 11:23 AM Amit Kapila wrote: > > > > 5 === (coming from v62-0002) > > > + Assert(tuplestore_tuple_count(res->tuplestore) == 1); > > > > > > Is it even possible for the related query to not return only one row? (I > > > think the > > > "count" ensures it). > > > > I think you are right. This assertion was added sometime back on the > > basis of feedback on hackers. Let me review that again. I can consider > > this comment in the next version. > > > > OTOH, can't we keep the assert as it is but remove "= 1" from > "count(*) = 1" in the query. There shouldn't be more than one slot > with same name on the primary. Or, am I missing something? There will be 1 record max, and 0 records if the primary_slot_name is invalid. Keeping 'count(*)=1' gives the benefit that it will straight away give us true/false indicating if we are good or not wrt primary_slot_name. I feel the Assert can be removed and we can simply have: if (!tuplestore_gettupleslot(res->tuplestore, true, false, tupslot)) elog(ERROR, "failed to fetch primary_slot_name tuple"); thanks Shveta
Re: Synchronizing slots from primary to standby
On Fri, Jan 19, 2024 at 10:35 AM Masahiko Sawada wrote: > > > Thank you for updating the patch. I have some comments: > > --- > +latestWalEnd = GetWalRcvLatestWalEnd(); > +if (remote_slot->confirmed_lsn > latestWalEnd) > +{ > +elog(ERROR, "exiting from slot synchronization as the > received slot sync" > + " LSN %X/%X for slot \"%s\" is ahead of the > standby position %X/%X", > + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), > + remote_slot->name, > + LSN_FORMAT_ARGS(latestWalEnd)); > +} > > IIUC GetWalRcvLatestWalEnd () returns walrcv->latestWalEnd, which is > typically the primary server's flush position and doesn't mean the LSN > where the walreceiver received/flushed up to. Yes. I think it makes more sense to use something which actually tells the flushed position. I gave it a try by replacing GetWalRcvLatestWalEnd() with GetWalRcvFlushRecPtr() but I see a problem here. Let's say I have enabled the slot-sync feature in a running standby, in that case we are all good (flushedUpto is the same as actual flush-position indicated by LogstreamResult.Flush). But if I restart standby, then I observed that the startup process sets flushedUpto to some value 'x' (see [1]) while when the wal-receiver starts, it sets 'LogstreamResult.Flush' to another value (see [2]) which is always greater than 'x'. And we do not update flushedUpto with the 'LogstreamResult.Flush' value in walreceiver until we actually do an operation on primary. Performing a data change on primary sends WALs to standby which then hits XLogWalRcvFlush() and updates flushedUpto same as LogstreamResult.Flush. Until then we have a situation where slots received on standby are ahead of flushedUpto and thus the slotsync worker keeps erroring out. I am yet to find out why flushedUpto is set to a lower value than 'LogstreamResult.Flush' at the start of standby. Or maybe I am using the wrong function GetWalRcvFlushRecPtr() and should be using something else instead? [1]: Startup process sets 'flushedUpto' here: ReadPageInternal-->XLogPageRead-->WaitForWALToBecomeAvailable-->RequestXLogStreaming [2]: Walreceiver sets 'LogstreamResult.Flush' here but does not update 'flushedUpto' here: WalReceiverMain(): LogstreamResult.Write = LogstreamResult.Flush = GetXLogReplayRecPtr(NULL) > Does it really happen > that the slot's confirmed_flush_lsn is higher than the primary's flush > lsn? It may happen if we have not configured standby_slot_names on primary. In such a case, slots may get updated w/o confirming that standby has taken the change and thus the slot-sync worker may fetch the slots which have lsns ahead of the latest WAL position on standby. thanks Shveta
Re: Synchronizing slots from primary to standby
On Thu, Jan 18, 2024 at 4:49 PM Amit Kapila wrote: > 2. > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot) > { > ... > + LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); > + xmin_horizon = GetOldestSafeDecodingTransactionId(true); > + SpinLockAcquire(&slot->mutex); > + slot->data.catalog_xmin = xmin_horizon; > + SpinLockRelease(&slot->mutex); > ... > } > > Here, why slot->effective_catalog_xmin is not updated? The same is > required by a later call to ReplicationSlotsComputeRequiredXmin(). I > see that the prior version v60-0002 has the corresponding change but > it is missing in the latest version. Any reason? I think it was a mistake in v61. Added it back in v64. > > 3. > + * Return true either if the slot is marked as RS_PERSISTENT (sync-ready) or > + * is synced periodically (if it was already sync-ready). Return false > + * otherwise. > + */ > +static bool > +update_and_persist_slot(RemoteSlot *remote_slot) > > The second part of the above comment (or is synced periodically (if it > was already sync-ready)) is not clear to me. Does it intend to > describe the case when we try to update the already created temp slot > in the last call. If so, that is not very clear because periodically > sounds like it can be due to repeated sync for sync-ready slot. The comment was as per the old functionality, where this function was doing both persist and save. In v61 the code changed, but the comment was not updated. I have changed it now in v64. > 4. > +update_and_persist_slot(RemoteSlot *remote_slot) > { > ... > + (void) local_slot_update(remote_slot); > ... > } > > Can we write a comment to state the reason why we don't care about the > return value here? Since it is the first time 'local_slot_update' is happening on any slot, the return value must be true, i.e., local_slot_update() should not skip the update. I have thus added an Assert on the return value now (in v64). thanks Shveta
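For item 2 above, the restored sequence mirrors what logical decoding does when establishing a slot's initial catalog_xmin horizon (see CreateInitDecodingContext()). Roughly, assembled from the quoted fragments rather than the literal v64 hunk:

/* Take ProcArrayLock so no transaction can commit in between. */
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
xmin_horizon = GetOldestSafeDecodingTransactionId(true);

SpinLockAcquire(&slot->mutex);
slot->effective_catalog_xmin = xmin_horizon;
slot->data.catalog_xmin = xmin_horizon;
SpinLockRelease(&slot->mutex);

/* Recompute the global xmin; 'true' says ProcArrayLock is already held. */
ReplicationSlotsComputeRequiredXmin(true);
LWLockRelease(ProcArrayLock);

Setting effective_catalog_xmin matters because ReplicationSlotsComputeRequiredXmin() aggregates the effective values, not data.catalog_xmin, so skipping it would leave the horizon computation blind to the new slot.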
Re: Synchronizing slots from primary to standby
On Mon, Jan 22, 2024 at 12:28 PM Amit Kapila wrote: > > On Fri, Jan 19, 2024 at 3:55 PM shveta malik wrote: > > > > On Fri, Jan 19, 2024 at 10:35 AM Masahiko Sawada > > wrote: > > > > > > > > > Thank you for updating the patch. I have some comments: > > > > > > --- > > > +latestWalEnd = GetWalRcvLatestWalEnd(); > > > +if (remote_slot->confirmed_lsn > latestWalEnd) > > > +{ > > > +elog(ERROR, "exiting from slot synchronization as the > > > received slot sync" > > > + " LSN %X/%X for slot \"%s\" is ahead of the > > > standby position %X/%X", > > > + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), > > > + remote_slot->name, > > > + LSN_FORMAT_ARGS(latestWalEnd)); > > > +} > > > > > > IIUC GetWalRcvLatestWalEnd () returns walrcv->latestWalEnd, which is > > > typically the primary server's flush position and doesn't mean the LSN > > > where the walreceiver received/flushed up to. > > > > Yes. I think it makes more sense to use something which actually tells > > the flushed position. I gave it a try by replacing GetWalRcvLatestWalEnd() > > with GetWalRcvFlushRecPtr() but I see a problem here. Let's say I have > > enabled the slot-sync feature in a running standby, in that case we > > are all good (flushedUpto is the same as actual flush-position > > indicated by LogstreamResult.Flush). But if I restart standby, then I > > observed that the startup process sets flushedUpto to some value 'x' > > (see [1]) while when the wal-receiver starts, it sets > > 'LogstreamResult.Flush' to another value (see [2]) which is always > > greater than 'x'. And we do not update flushedUpto with the > > 'LogstreamResult.Flush' value in walreceiver until we actually do an > > operation on primary. Performing a data change on primary sends WALs > > to standby which then hits XLogWalRcvFlush() and updates flushedUpto > > same as LogstreamResult.Flush. Until then we have a situation where > > slots received on standby are ahead of flushedUpto and thus the slotsync > > worker keeps erroring out. I am yet to find out why flushedUpto is > > set to a lower value than 'LogstreamResult.Flush' at the start of > > standby. Or maybe I am using the wrong function > > GetWalRcvFlushRecPtr() and should be using something else instead? > > > > Can we think of using GetStandbyFlushRecPtr()? We probably need to > expose this function, if this works for the required purpose. I think we can. For the record, the problem while using flushedUpto (or GetWalRcvFlushRecPtr()) directly is that it is not set to the latest flushed position immediately after startup. It points to some prior location (perhaps segment or page start) after startup until some data is flushed next which then updates it to the latest flushed position, thus we cannot use it directly. GetStandbyFlushRecPtr() OTOH takes care of it, i.e., it returns the correct flushed location at any point in time. I have changed v65 to use this one. thanks Shveta
Re: Synchronizing slots from primary to standby
On Fri, Jan 19, 2024 at 11:48 AM Peter Smith wrote: > > Here are some review comments for patch v63-0003. Thanks Peter. I have addressed all in v65. > > 4b. > It was a bit different when there were ERRORs but now they are LOGs; > somehow it seems wrong for this function to say what the *caller* will > do. Maybe you can rewrite all the errmsg so they don't say "skipping" > but they just say "bad configuration for slot synchronization" > > If valid is false then you can LOG "skipping" at the caller... I have made this change, but now in the log file we see 3 logs like below; does it seem apt? Was the earlier one better where we get the info in 2 lines? [34416] LOG: bad configuration for slot synchronization [34416] HINT: hot_standby_feedback must be enabled. [34416] LOG: skipping slot synchronization thanks Shveta
Re: Synchronizing slots from primary to standby
On Tue, Jan 23, 2024 at 9:45 AM Peter Smith wrote: > > Here are some review comments for v65-0002 Thanks Peter for the feedback. I have addressed these in v66. > > 4. GetStandbyFlushRecPtr > > /* > - * Returns the latest point in WAL that has been safely flushed to disk, and > - * can be sent to the standby. This should only be called when in recovery, > - * ie. we're streaming to a cascaded standby. > + * Returns the latest point in WAL that has been safely flushed to disk. > + * This should only be called when in recovery. > + * > > Since it says "This should only be called when in recovery", should > there also be a check for that (e.g. RecoveryInProgress) in the added > Assert? Since 'am_cascading_walsender' and 'IsLogicalSlotSyncWorker' make sense only 'in recovery', I think an explicit check for 'RecoveryInProgress' is not needed here. But I can add it if others also think it is needed. thanks Shveta
Re: Synchronizing slots from primary to standby
On Wed, Jan 24, 2024 at 2:38 PM Bertrand Drouvot wrote: > > Hi, > > On Wed, Jan 24, 2024 at 01:51:54PM +0530, Amit Kapila wrote: > > On Wed, Jan 24, 2024 at 11:24 AM Masahiko Sawada > > wrote: > > > > > > On Wed, Jan 24, 2024 at 2:43 PM Amit Kapila > > > wrote: > > > > > > > > > > > > > > +/* GUC variable */ > > > > > +bool enable_syncslot = false; > > > > > > > > > > Is enable_syncslot a really good name? We use "enable" prefix only for > > > > > planner parameters such as enable_seqscan, and it seems to me that > > > > > "slot" is not specific. Other candidates are: > > > > > > > > > > * synchronize_replication_slots = on|off > > > > > * synchronize_failover_slots = on|off > > > > > > > > > > > > > I would prefer the second one. Would it be better to just say > > > > sync_failover_slots? > > > > > > Works for me. But if we want to extend this option for non-failover > > > slots as well in the future, synchronize_replication_slots (or > > > sync_replication_slots) seems better. We can extend it by having an > > > enum later. For example, the values can be on, off, or failover etc. > > > > > > > I see your point. Let us see if others have any suggestions on this. > > I also see Sawada-San's point and I'd vote for "sync_replication_slots". Then > for > the current feature I think "failover" and "on" should be the values to turn > the > feature on (assuming "on" would mean "all kind of supported slots"). Even if others agree and we change this GUC name to "sync_replication_slots", I feel we should keep the values as "on" and "off" currently, where "on" would mean 'sync failover slots' (docs can state that clearly). I do not think we should support sync of "all kinds of supported slots" in the first version. Maybe we can think about it for future versions. thanks Shveta
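To make the extension idea concrete: if the GUC later needs to grow beyond a boolean, it could be declared as an enum GUC. A hypothetical sketch using PostgreSQL's config_enum_entry (the names and values below are illustrative only and are not part of any posted patch):

typedef enum
{
    SYNC_REPLICATION_SLOTS_OFF,
    SYNC_REPLICATION_SLOTS_FAILOVER,    /* sync failover slots only */
    SYNC_REPLICATION_SLOTS_ON           /* all supported kinds of slots */
} SyncReplicationSlotsMode;

static const struct config_enum_entry sync_replication_slots_options[] = {
    {"off", SYNC_REPLICATION_SLOTS_OFF, false},
    {"failover", SYNC_REPLICATION_SLOTS_FAILOVER, false},
    {"on", SYNC_REPLICATION_SLOTS_ON, false},
    {NULL, 0, false}
};

Starting with plain on/off, as suggested above, keeps this door open: "on" can later be reinterpreted or supplemented with additional enum values without breaking existing configurations.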
Re: Synchronizing slots from primary to standby
On Thu, Jan 25, 2024 at 9:13 AM Amit Kapila wrote: > > > 3) Removed syncing 'failover' on standby from remote_slot. The > > 'failover' field will be false for synced slots. Since we do not > > support sync to cascading standbys yet, thus failover=true was > > misleading and unused there. > > > > But what will happen after the standby is promoted? After promotion, > ideally, it should have failover enabled, so that the slots can be > synced. Also, note that corresponding subscriptions still have the > failover flag enabled. I think we should copy the 'failover' option > for the synced slots. Yes, right, missed this point earlier. I will make the change in the next version. thanks Shveta
Re: Synchronizing slots from primary to standby
On Thu, Jan 25, 2024 at 10:39 AM Peter Smith wrote: > 2. synchronize_one_slot > > + /* > + * Sanity check: Make sure that concerned WAL is received and flushed > + * before syncing slot to target lsn received from the primary server. > + * > + * This check should never pass as on the primary server, we have waited > + * for the standby's confirmation before updating the logical slot. > + */ > + latestFlushPtr = GetStandbyFlushRecPtr(NULL); > + if (remote_slot->confirmed_lsn > latestFlushPtr) > + { > + ereport(LOG, > + errmsg("skipping slot synchronization as the received slot sync" > +" LSN %X/%X for slot \"%s\" is ahead of the standby position %X/%X", > +LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), > +remote_slot->name, > +LSN_FORMAT_ARGS(latestFlushPtr))); > + > + return false; > + } > > Previously in v65 this was an elog, but now it is an ereport. But > since this is a sanity check condition that "should never pass" wasn't > the elog the more appropriate choice? We realized that this scenario can be frequently hit when the user has not set standby_slot_names on primary. And thus ereport makes more sense. But I agree that this comment is misleading. We will adjust the comment in the next version. thanks Shveta
Re: Synchronizing slots from primary to standby
On Tue, Jan 30, 2024 at 11:31 AM Amit Kapila wrote: > > In this regard, I feel we don't need to dump/restore the 'FAILOVER' > option non-binary upgrade paths similar to the 'ENABLE' option. For > binary upgrade, if the failover option is enabled, then we can enable > it using Alter Subscription SET (failover=true). Let's add one test > corresponding to this behavior in > postgresql\src\bin\pg_upgrade\t\004_subscription. Changed pg_dump behaviour as suggested and added additional test. > Additionally, we need to update the pg_dump docs for the 'failover' > option. See "When dumping logical replication subscriptions, .." [1]. > I think we also need to update the connect option docs in CREATE > SUBSCRIPTION [2]. Updated docs. > [1] - https://www.postgresql.org/docs/devel/app-pgdump.html > [2] - https://www.postgresql.org/docs/devel/sql-createsubscription.html PFA v73-0001 which addresses the above comments. Other patches will be rebased and posted after pushing this one. Thanks Hou-San for adding pg_upgrade test for failover. thanks Shveta v73-0001-Add-a-failover-option-to-subscriptions.patch Description: Binary data
Re: Synchronizing slots from primary to standby
On Thu, Feb 1, 2024 at 11:21 AM Peter Smith wrote: > > Here are some review comments for v74-0001. Thanks Peter for the feedback. > == > src/sgml/logicaldecoding.sgml > > 1. > + > +Replication Slot Synchronization > + > + A logical replication slot on the primary can be synchronized to the hot > + standby by enabling the failover option during slot > + creation and setting > + linkend="guc-enable-syncslot">enable_syncslot > + on the standby. For the synchronization > + to work, it is mandatory to have a physical replication slot between the > + primary and the standby, and > + linkend="guc-hot-standby-feedback">hot_standby_feedback > + must be enabled on the standby. It is also necessary to specify a valid > + dbname in the > + linkend="guc-primary-conninfo">primary_conninfo > + string, which is used for slot synchronization and is ignored > for streaming. > + > > IMO we don't need to repeat that last part ", which is used for slot > synchronization and is ignored for streaming." because that is a > detail about the primary_conninfo GUC, and the same information is > already described in that GUC section. Modified in v75. > == > > 2. ALTER_REPLICATION_SLOT slot_name ( option [, ...] ) # > > > - If true, the slot is enabled to be synced to the standbys. > + If true, the slot is enabled to be synced to the standbys > + so that logical replication can be resumed after failover. > > > This also should have the sentence "The default is false.", e.g. the > same as the same option in CREATE_REPLICATION_SLOT says. I have not added this. I feel the default-value-related details should be present in the 'CREATE' part; they are not meaningful for the "ALTER" part. ALTER does not have any defaults, it just modifies the options given by the user. > == > synchronize_one_slot > > 3. > + /* > + * Make sure that concerned WAL is received and flushed before syncing > + * slot to target lsn received from the primary server. > + * > + * This check will never pass if on the primary server, user has > + * configured standby_slot_names GUC correctly, otherwise this can hit > + * frequently. > + */ > + latestFlushPtr = GetStandbyFlushRecPtr(NULL); > + if (remote_slot->confirmed_lsn > latestFlushPtr) > > BEFORE > This check will never pass if on the primary server, user has > configured standby_slot_names GUC correctly, otherwise this can hit > frequently. > > SUGGESTION (simpler way to say the same thing?) > This will always be the case unless the standby_slot_names GUC is not > correctly configured on the primary server. It is not true. It will not hit this condition "always", but has a higher chance of hitting it when standby_slot_names is not configured. I think you meant 'unless the standby_slot_names GUC is correctly configured'. I feel the current comment gives clear info (less confusing) and thus I have not changed it for the time being. I can consider it if I get more comments there. > 4. > + /* User created slot with the same name exists, raise ERROR. */ > > /User created/User-created/ Modified. > ~~~ > > 5. synchronize_slots, and also drop_obsolete_slots > > + /* > + * Use shared lock to prevent a conflict with > + * ReplicationSlotsDropDBSlots(), trying to drop the same slot while > + * drop-database operation. > + */ > > (same code comment is in a couple of places) > > SUGGESTION (while -> during, etc.) > > Use a shared lock to prevent conflicting with > ReplicationSlotsDropDBSlots() trying to drop the same slot during a > drop-database operation. Modified. > ~~~ > > 6.
validate_parameters_and_get_dbname > > strcmp() just for the empty string "" might be overkill. > > 6a. > + if (PrimarySlotName == NULL || strcmp(PrimarySlotName, "") == 0) > > SUGGESTION > if (PrimarySlotName == NULL || *PrimarySlotName == '\0') > > ~~ > > 6b. > + if (PrimaryConnInfo == NULL || strcmp(PrimaryConnInfo, "") == 0) > > SUGGESTION > if (PrimaryConnInfo == NULL || *PrimaryConnInfo == '\0') Modified. thanks Shveta
Re: Synchronizing slots from primary to standby
On Wed, Jan 31, 2024 at 2:02 PM Masahiko Sawada wrote: > > --- > +static char * > +wait_for_valid_params_and_get_dbname(void) > +{ > + char *dbname; > + int rc; > + > + /* Sanity check. */ > + Assert(enable_syncslot); > + > + for (;;) > + { > + if (validate_parameters_and_get_dbname(&dbname)) > + break; > + ereport(LOG, errmsg("skipping slot synchronization")); > + > + ProcessSlotSyncInterrupts(NULL); > > When reading this function, I expected that the slotsync worker would > resume working once the parameters became valid, but it was not > correct. For example, if I changed hot_standby_feedback from off to > on, the slotsync worker reads the config file, exits, and then > restarts. Given that the slotsync worker ends up exiting on parameter > changes anyway, why do we want to have it wait for parameters to > become valid? IIUC even if the slotsync worker exits when a parameter > is not valid, it restarts at some intervals. Thanks for the feedback. Changed this functionality in v75. Now we do not exit in wait_for_valid_params_and_get_dbname() on GUC change. We re-validate the new values and, if found valid, carry on with slot syncing; else we continue waiting. > --- > +bool > +SlotSyncWorkerCanRestart(void) > +{ > +#define SLOTSYNC_RESTART_INTERVAL_SEC 10 > + > > IIUC depending on how busy the postmaster is and the timing, the user > could wait for 1 min to re-launch the slotsync worker. But I think the > user might want to re-launch the slotsync worker more quickly for > example when the slotsync worker restarts due to parameter changes. > IIUC SlotSyncWorkerCanRestart() doesn't consider the fact that the > slotsync worker previously exited with 0 or 1. Modified this in v75. As you suggested in [1], we reset last_start_time on GUC change before proc_exit, so that the postmaster restarts the worker immediately without waiting. > --- > + /* We are a normal standby */ > + valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull)); > + Assert(!isnull); > > What do you mean by "normal standby"? > > --- > + appendStringInfo(&cmd, > +"SELECT pg_is_in_recovery(), count(*) = 1" > +" FROM pg_replication_slots" > +" WHERE slot_type='physical' AND slot_name=%s", > +quote_literal_cstr(PrimarySlotName)); > > I think we need to make "pg_replication_slots" schema-qualified. Modified. > --- > + errdetail("The primary server slot \"%s\" specified by" > + " \"%s\" is not valid.", > + PrimarySlotName, "primary_slot_name")); > > and > > + errmsg("slot sync worker will shutdown because" > + " %s is disabled", "enable_syncslot")); > > It's better to write it in one line for better greppability. Modified. [1]: https://www.postgresql.org/message-id/CAD21AoAv6FwZ6UPNTj6%3D7A%2B3O2m4utzfL8ZGS6X1EGexikG66A%40mail.gmail.com thanks Shveta
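Sketching the reworked behaviour described above (based on the quoted v74 code; details may differ from the actual v75 patch): instead of exiting on a parameter change, the worker simply loops, and a SIGHUP processed inside ProcessSlotSyncInterrupts() reloads the config so the next validation pass sees the new values.

static char *
wait_for_valid_params_and_get_dbname(void)
{
    char       *dbname;

    /* Sanity check. */
    Assert(enable_syncslot);

    for (;;)
    {
        if (validate_parameters_and_get_dbname(&dbname))
            break;              /* configuration is valid, start syncing */

        ereport(LOG, errmsg("skipping slot synchronization"));

        /*
         * Waits and handles interrupts; a config reload here lets the
         * next validate_parameters_and_get_dbname() call pick up any
         * changed parameter values without restarting the worker.
         */
        ProcessSlotSyncInterrupts(NULL);
    }

    return dbname;
}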
Re: Synchronizing slots from primary to standby
On Fri, Feb 2, 2024 at 12:25 PM Peter Smith wrote: > > Here are some review comments for v75-0002. Thanks for the feedback, Peter. Addressed all in v76 except one. > (this is a WIP but this is what I found so far...) > I wonder if it is better to log all the problems in one go instead of > making users stumble onto them one at a time after fixing one and then > hitting the next problem. e.g. just set some variable "all_ok = > false;" each time instead of all the "return false;" > > Then at the end of the function just "return all_ok;" If we do it this way, then we need to find a way to combine the messages as well; otherwise the same message will be repeated multiple times. For the concerned functionality (which needs a one-time config effort by the user), I feel the existing way looks okay. We may consider optimizing it if we get more comments here. thanks Shveta
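For completeness, the suggested accumulate-and-report shape would look roughly like this; as noted in the reply, without further message rework the generic errmsg line would then appear once per failed check (the GUC names come from the thread, but the exact set of checks here is illustrative):

static bool
validate_slotsync_parameters(void)
{
    bool        all_ok = true;

    if (!hot_standby_feedback)
    {
        ereport(LOG,
                errmsg("bad configuration for slot synchronization"),
                errhint("hot_standby_feedback must be enabled."));
        all_ok = false;         /* keep checking the remaining GUCs */
    }

    if (wal_level < WAL_LEVEL_LOGICAL)
    {
        ereport(LOG,
                errmsg("bad configuration for slot synchronization"),
                errhint("wal_level must be >= logical."));
        all_ok = false;
    }

    /* ... further checks, each setting all_ok = false on failure ... */

    return all_ok;
}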
Re: Synchronizing slots from primary to standby
On Mon, Feb 5, 2024 at 4:36 PM Amit Kapila wrote: > > I have pushed the first patch. Next, a few comments on 0002 are as follows: Thanks for the feedback Amit. Some of these are addressed in v78. Rest will be addressed in the next version. > 1. > +static bool > +validate_parameters_and_get_dbname(char **dbname, int elevel) > > For 0002, we don't need dbname as out parameter. Also, we can rename > the function to validate_slotsync_params() or something like that. > Also, for 0003, we don't need to get the dbname from > wait_for_valid_params_and_get_dbname(), instead there could be a > common function that can be invoked from validate_slotsync_params() > and caller of wait function that caches the value of dbname. > > The other parameter elevel is also not required for 0002. > > 2. > + /* > + * Make sure that concerned WAL is received and flushed before syncing > + * slot to target lsn received from the primary server. > + */ > + latestFlushPtr = GetStandbyFlushRecPtr(NULL); > + if (remote_slot->confirmed_lsn > latestFlushPtr) > + { > + /* > + * Can get here only if GUC 'standby_slot_names' on the primary server > + * was not configured correctly. > + */ > + ereport(LOG, > + errmsg("skipping slot synchronization as the received slot sync" > +" LSN %X/%X for slot \"%s\" is ahead of the standby position %X/%X", > +LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), > +remote_slot->name, > +LSN_FORMAT_ARGS(latestFlushPtr))); > + > + return false; > > In the case of a function invocation, this should be an ERROR. We can > move the comment related to 'standby_slot_names' to a later patch > where that GUC is introduced. See, if there are other LOGs in the > patch that needs to be converted to ERROR. > > 3. The function pg_sync_replication_slots() should be in file > slotfuncs.c and common functionality between this function and > slotsync worker can be exposed via a function in slotsync.c. > > 4. > /* > + * Using the specified primary server connection, check whether we are > + * cascading standby and validates primary_slot_name for > + * non-cascading-standbys. > + */ > + check_primary_info(wrconn, &am_cascading_standby, > +&primary_slot_invalid, ERROR); > + > + if (am_cascading_standby) > + ereport(ERROR, > + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("cannot synchronize replication slots to a cascading standby")); > > primary_slot_invalid is not used in this patch. I think we can allow > the function can be executed on cascading_standby as well because this > will be used for the planned switchover. > > 5. I don't see any problem with allowing concurrent processes trying > to sync the same slot at the same time as each process will acquire > the slot and only one process can acquire the slot at a time, the > other will get an ERROR. > > -- > With Regards, > Amit Kapila.
Re: Synchronizing slots from primary to standby
On Tue, Feb 6, 2024 at 7:21 PM Zhijie Hou (Fujitsu) wrote: > > > --- > > +/* Slot sync worker objects */ > > +extern PGDLLIMPORT char *PrimaryConnInfo; extern PGDLLIMPORT char > > +*PrimarySlotName; > > > > These two variables are declared also in xlogrecovery.h. Is it intentional? > > If so, I > > think it's better to write comments. > > Will address. Added comments in v79_2. > > > > > --- > > + SELECT r.srsubid AS subid, CONCAT('pg_' || srsubid || > > '_sync_' || srrelid || '_' || ctl.system_identifier) AS slotname > > > > and > > > > + SELECT (CASE WHEN r.srsubstate = 'f' THEN > > pg_replication_origin_progress(CONCAT('pg_' || r.srsubid || '_' || > > r.srrelid), false) > > > > If we use CONCAT function, we can replace '||' with ','. > > Modified in v79_2. thanks Shveta
Re: Synchronizing slots from primary to standby
On Tue, Feb 6, 2024 at 12:25 PM Bertrand Drouvot wrote: > > Hi, > That said, I still think the commit message needs some re-wording, what about? > > = > If a logical slot on the primary is valid but is invalidated on the standby, > then that slot is dropped and can be recreated on the standby in next > pg_sync_replication_slots() call provided the slot still exists on the primary > server. It is okay to recreate such slots as long as these are not consumable > on the standby (which is the case currently). This situation may occur due to > the following reasons: > > - The max_slot_wal_keep_size on the standby is insufficient to retain WAL > records from the restart_lsn of the slot. > - primary_slot_name is temporarily reset to null and the physical slot is > removed. > > Changing the primary wal_level to a level lower than logical is only possible > if the logical slots are removed on the primary, so it's expected to see > the slots being removed on the standby too (and re-created if they are > re-created on the primary). > = Thanks for the feedback. I have incorporated the suggestions in v80. thanks Shveta
Re: Synchronizing slots from primary to standby
On Thu, Feb 8, 2024 at 12:08 PM Peter Smith wrote: > > Here are some review comments for patch v80_2-0001. Thanks for the feedback Peter. Addressed the comments in v81. Attached patch001 for early feedback. Rest of the patches need rebasing and thus will post those later. It also addresses comments by Amit in [1]. [1]: https://www.postgresql.org/message-id/CAA4eK1Ldhh_kf-qG-m5BKY0R1SkdBSx5j%2BEzwpie%2BH9GPWWOYA%40mail.gmail.com thanks Shveta v81-0001-Add-a-slot-synchronization-function.patch Description: Binary data
Re: Synchronizing slots from primary to standby
On Thu, Feb 8, 2024 at 4:03 PM Amit Kapila wrote: > > Few comments on 0001 > === Thanks, Amit. Addressed these in v81. > 1. > + * the slots on the standby and synchronize them. This is done on every call > + * to SQL function pg_sync_replication_slots. > > > > I think the second sentence can be slightly changed to: "This is done > by a call to SQL function pg_sync_replication_slots." or "One can call > SQL function pg_sync_replication_slots to invoke this functionality." Done. > 2. > +update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid) > { > ... > + SpinLockAcquire(&slot->mutex); > + slot->data.plugin = plugin_name; > + slot->data.database = remote_dbid; > + slot->data.two_phase = remote_slot->two_phase; > + slot->data.failover = remote_slot->failover; > + slot->data.restart_lsn = remote_slot->restart_lsn; > + slot->data.confirmed_flush = remote_slot->confirmed_lsn; > + slot->data.catalog_xmin = remote_slot->catalog_xmin; > + slot->effective_catalog_xmin = remote_slot->catalog_xmin; > + SpinLockRelease(&slot->mutex); > + > + if (remote_slot->catalog_xmin != slot->data.catalog_xmin) > + ReplicationSlotsComputeRequiredXmin(false); > + > + if (remote_slot->restart_lsn != slot->data.restart_lsn) > + ReplicationSlotsComputeRequiredLSN(); > ... > } > > How is it possible that after assigning the values from remote_slot > they can differ from local slot values? It was a mistake made while fixing comments in previous versions. Corrected it now. Thanks for catching it. > 3. > + /* > + * Find the oldest existing WAL segment file. > + * > + * Normally, we can determine it by using the last removed segment > + * number. However, if no WAL segment files have been removed by a > + * checkpoint since startup, we need to search for the oldest segment > + * file currently existing in XLOGDIR. > + */ > + oldest_segno = XLogGetLastRemovedSegno() + 1; > + > + if (oldest_segno == 1) > + oldest_segno = XLogGetOldestSegno(0); > > I feel this way isn't there a risk that XLogGetOldestSegno() will get > us the seg number from some previous timeline which won't make sense > to compare segno in reserve_wal_for_local_slot. Shouldn't you need to > fetch the current timeline and send as a parameter to this function as > that is the timeline on which standby is communicating with primary. Yes, modified it (a sketch of the idea is at the end of this mail). > 4. > + if (remote_slot->confirmed_lsn > latestFlushPtr) > + ereport(ERROR, > + errmsg("skipping slot synchronization as the received slot sync" > > I think the internal errors should be reported with elog as you have > done at other places in the patch. Done. > 5. > +synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid) > { > ... > + /* > + * Copy the invalidation cause from remote only if local slot is not > + * invalidated locally, we don't want to overwrite existing one. > + */ > + if (slot->data.invalidated == RS_INVAL_NONE) > + { > + SpinLockAcquire(&slot->mutex); > + slot->data.invalidated = remote_slot->invalidated; > + SpinLockRelease(&slot->mutex); > + > + /* Make sure the invalidated state persists across server restart */ > + ReplicationSlotMarkDirty(); > + ReplicationSlotSave(); > + slot_updated = true; > + } > ... > } > > Do we need to copy the 'invalidated' from remote to local if both are > same? I think this will happen for each slot each time because > normally slots won't be invalidated ones, so there is needless writes. It is not needed every time. Optimized it: now we copy only if the local slot's 'invalidated' value is RS_INVAL_NONE while the remote slot's value != RS_INVAL_NONE. > 6.
> + * Returns TRUE if any of the slots gets updated in this sync-cycle. > + */ > +static bool > +synchronize_slots(WalReceiverConn *wrconn) > ... > ... > > +void > +SyncReplicationSlots(WalReceiverConn *wrconn) > +{ > + PG_TRY(); > + { > + validate_primary_slot_name(wrconn); > + > + (void) synchronize_slots(wrconn); > > For the purpose of 0001, synchronize_slots() doesn't seems to use > return value. So, I suggest to change it accordingly and move the > return value in the required patch. Modified it. Also changed return values of all related internal functions which were returning slot_updated. > 7. > + /* > + * The primary_slot_name is not set yet or WALs not received yet. > + * Synchronization is not possible if the walreceiver is not started. > + */ > + latestWalEnd = GetWalRcvLatestWalEnd(); > + SpinLockAcquire(&WalRcv->mutex); > + if ((WalRcv->slotname[0] == '\0') || > + XLogRecPtrIsInvalid(latestWalEnd)) > + { > + SpinLockRelease(&WalRcv->mutex); > + return false; > > For the purpose of 0001, we should give WARNING here. I will fix it in the next version. Sorry, I somehow missed it this time. thanks Shveta
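For point 3 above, a sketch of the modification in reserve_wal_for_local_slot() (illustrative only; GetStandbyFlushRecPtr() is the existing walsender function already referenced in the patch):

	XLogSegNo	oldest_segno;
	TimeLineID	cur_timeline;

	/*
	 * Use the timeline the standby is currently streaming on, instead of
	 * timeline 0, so that segments from some previous timeline are not
	 * compared against the slot's segno.
	 */
	(void) GetStandbyFlushRecPtr(&cur_timeline);

	oldest_segno = XLogGetLastRemovedSegno() + 1;

	if (oldest_segno == 1)
		oldest_segno = XLogGetOldestSegno(cur_timeline);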
Re: Synchronizing slots from primary to standby
On Thu, Feb 8, 2024 at 4:31 PM shveta malik wrote: > > On Thu, Feb 8, 2024 at 12:08 PM Peter Smith wrote: > > > > Here are some review comments for patch v80_2-0001. > > Thanks for the feedback Peter. Addressed the comments in v81. Missed to mention, Hou-san helped in addressing some of these comments in v81. Thanks, Hou-san. thanks Shveta
Re: Have pg_basebackup write "dbname" in "primary_conninfo"?
On Thu, Mar 14, 2024 at 10:57 AM Amit Kapila wrote: > > On Thu, Mar 14, 2024 at 5:57 AM Masahiko Sawada wrote: > > > > This fact makes me think that the slotsync worker might be able to > > accept the primary_conninfo value even if there is no dbname in the > > value. That is, if there is no dbname in the primary_conninfo, it uses > > the username in accordance with the specs of the connection string. > > Currently, the slotsync worker connects to the local database first > > and then establishes the connection to the primary server. But if we > > can reverse the two steps, it can get the dbname that has actually > > been used to establish the remote connection and use it for the local > > connection too. That way, the primary_conninfo generated by > > pg_basebackup could work even without the patch. For example, if the > > OS user executing pg_basebackup is 'postgres', the slotsync worker > > would connect to the postgres database. Given the 'postgres' database > > is created by default and 'postgres' OS user is used in common, I > > guess it could cover many cases in practice actually. > > > > I think this is worth investigating but I suspect that in most cases > users will end up using a replication connection without specifying > the user name and we may not be able to give a meaningful error > message when slotsync worker won't be able to connect. The same will > be true even when the dbname same as the username would be used. > I attempted the change as suggested by Sawada-san. Attached is the PoC patch. For it to work, I had to expose a new get API in libpq-fe which gets the dbname from the streaming connection. Please have a look.

Without this PoC patch, the errors in the slot-sync worker:

a) If dbname is missing:
[1230932] LOG: slot sync worker started
[1230932] ERROR: slot synchronization requires dbname to be specified in primary_conninfo

b) If the specified db does not exist:
[1230913] LOG: slot sync worker started
[1230913] FATAL: database "postgres1" does not exist

Now with this patch:

a) If the db with the same name as the user does not exist:
[1232473] LOG: slot sync worker started
[1232473] ERROR: could not connect to the primary server: connection to server at "127.0.0.1", port 5433 failed: FATAL: database "bckp_user" does not exist

b) If the user itself is removed from primary_conninfo, libpq by default takes the user who has authenticated the system, and gives an error if a db of the same name does not exist:
ERROR: could not connect to the primary server: connection to server at "127.0.0.1", port 5433 failed: FATAL: database "shveta" does not exist

The errors in the second case look slightly confusing to me. thanks Shveta v1-0001-Use-user-as-dbname-for-slot-sync.patch Description: Binary data
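To illustrate the reversed-order idea with plain libpq (a minimal, self-contained sketch under the assumption of direct libpq access; the PoC patch itself goes through the walreceiver layer and a new getter): connect to the primary first, then read back the database name libpq actually resolved, and reuse it for the local connection.

#include <stdio.h>
#include "libpq-fe.h"

int
main(void)
{
	/* conninfo without an explicit dbname, as pg_basebackup would write it */
	PGconn	   *conn = PQconnectdb("host=127.0.0.1 port=5433 user=bckp_user");

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "could not connect: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	/*
	 * PQdb() reports the database name the connection ended up using
	 * (falling back to the user name when no dbname was given).
	 */
	printf("resolved dbname: %s\n", PQdb(conn));

	PQfinish(conn);
	return 0;
}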
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Thu, Mar 14, 2024 at 7:58 PM Bharath Rupireddy wrote: > > On Thu, Mar 14, 2024 at 12:24 PM Amit Kapila wrote: > > > > On Wed, Mar 13, 2024 at 9:24 PM Bharath Rupireddy > > > > > > Yes, there will be some sort of duplicity if we emit conflict_reason > > > as a text field. However, I still think the better way is to turn > > > conflict_reason text to conflict boolean and set it to true only on > > > rows_removed and wal_level_insufficient invalidations. When conflict > > > boolean is true, one (including all the tests that we've added > > > recently) can look for invalidation_reason text field for the reason. > > > This sounds reasonable to me as opposed to we just mentioning in the > > > docs that "if invalidation_reason is rows_removed or > > > wal_level_insufficient it's the reason for conflict with recovery". +1 on maintaining both conflicting and invalidation_reason. > > Fair point. I think we can go either way. Bertrand, Nathan, and > > others, do you have an opinion on this matter? > > While we wait to hear from others on this, I'm attaching the v9 patch > set implementing the above idea (check 0001 patch). Please have a > look. I'll come back to the other review comments soon. Thanks for the patch. JFYI, patch09 does not apply to HEAD; some recent commit caused the conflict. Some trivial comments on patch001 (yet to review other patches):

1) info.c:
- "%s as caught_up, conflict_reason IS NOT NULL as invalid "
+ "%s as caught_up, invalidation_reason IS NOT NULL as invalid "

Can we revert to 'conflicting as invalid', since it is a query for logical slots only?

2) 040_standby_failover_slots_sync.pl:
- q{SELECT conflict_reason IS NULL AND synced AND NOT temporary FROM pg_replication_slots WHERE slot_name = 'lsub1_slot';}
+ q{SELECT invalidation_reason IS NULL AND synced AND NOT temporary FROM pg_replication_slots WHERE slot_name = 'lsub1_slot';}

Here too, can we have 'NOT conflicting' instead of 'invalidation_reason IS NULL', as it is a logical slot test? thanks Shveta
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Thu, Mar 14, 2024 at 7:58 PM Bharath Rupireddy wrote: > > While we wait to hear from others on this, I'm attaching the v9 patch > set implementing the above idea (check 0001 patch). Please have a > look. I'll come back to the other review comments soon. > patch002:

1) I would like to understand the purpose of 'inactive_count'. Is it only for users, for monitoring purposes? We are not using it anywhere internally. I shut down the instance 5 times and found that 'inactive_count' became 5 for all the slots created on that instance. Is this intentional? I mean, we cannot really use the slots while the instance is down. I felt it should increment inactive_count only if, during the lifetime of the instance, the slots were actually inactive, i.e., no streaming or replication happening through them.

2) slot.c:
+ case RS_INVAL_XID_AGE:
+ {
+ if (TransactionIdIsNormal(s->data.xmin))
+ {
+ ..
+ }
+ if (TransactionIdIsNormal(s->data.catalog_xmin))
+ {
+ ..
+ }
+ }

Can we optimize this code? It has duplicate code for processing s->data.catalog_xmin and s->data.xmin. Can we create a sub-function for this purpose and call it twice here (a sketch follows this mail)? thanks Shveta
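A sketch of the sub-function suggested in (2); the helper name and the max_slot_xid_age variable are illustrative, not necessarily what the patch uses, and next_xid would come from ReadNextTransactionId():

#include "postgres.h"
#include "access/transam.h"

/*
 * Return true if 'xid' is more than 'max_age' transactions behind
 * 'next_xid'.  Wraparound is handled with the same idiom autovacuum uses
 * for its freeze-age computation: skip over the special XIDs when the
 * subtraction wraps.
 */
static bool
SlotXidOlderThanAgeLimit(TransactionId xid, TransactionId next_xid,
						 uint32 max_age)
{
	TransactionId cutoff;

	if (!TransactionIdIsNormal(xid))
		return false;

	cutoff = next_xid - (TransactionId) max_age;
	if (cutoff < FirstNormalTransactionId)
		cutoff -= FirstNormalTransactionId;

	return TransactionIdPrecedes(xid, cutoff);
}

The RS_INVAL_XID_AGE case could then simply call it twice, once per horizon:

	if (SlotXidOlderThanAgeLimit(s->data.xmin, next_xid, max_slot_xid_age) ||
		SlotXidOlderThanAgeLimit(s->data.catalog_xmin, next_xid, max_slot_xid_age))
		/* invalidate the slot */ ;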
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Sun, Mar 24, 2024 at 3:06 PM Bharath Rupireddy wrote: > > I've attached the v18 patch set here. Thanks for the patches. Please find few comments:

patch 001:
1) slot.h:
+ /* The time at which this slot become inactive */
+ TimestampTz last_inactive_time;

become --> became

patch 002:
2) slotsync.c:
ReplicationSlotCreate(remote_slot->name, true, RS_TEMPORARY, remote_slot->two_phase, remote_slot->failover,
- true);
+ true, 0);
+ slot->data.inactive_timeout = remote_slot->inactive_timeout;

Is there a reason we are not passing 'remote_slot->inactive_timeout' to ReplicationSlotCreate() directly?

3) slotfuncs.c:
pg_create_logical_replication_slot():
+ int inactive_timeout = PG_GETARG_INT32(5);

Can we mention here that the timeout is in seconds, either in a comment or by renaming the variable to inactive_timeout_secs? Please do this for create_physical_replication_slot(), create_logical_replication_slot(), pg_create_physical_replication_slot() as well.

4)
+ int inactive_timeout; /* The amount of time in seconds the slot
+ * is allowed to be inactive. */
} LogicalSlotInfo;

Do we need to mention "before getting invalidated" like other places (in the last patch)?

5) Same at these two places; "before getting invalidated" to be added in the last patch, otherwise the info is incomplete.

+
+ /* The amount of time in seconds the slot is allowed to be inactive */
+ int inactive_timeout;
} ReplicationSlotPersistentData;

+ * inactive_timeout: The amount of time in seconds the slot is allowed to be
+ * inactive. */
void ReplicationSlotCreate(const char *name, bool db_specific,

Same here: "before getting invalidated"?

Reviewing more.. thanks Shveta
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Mon, Mar 25, 2024 at 10:33 AM shveta malik wrote: > > On Sun, Mar 24, 2024 at 3:06 PM Bharath Rupireddy > wrote: > > > > I've attached the v18 patch set here. > I have a question. Shouldn't creating a subscription on an existing slot with a non-zero 'inactive_timeout' retain the slot's 'inactive_timeout' even after subscription creation? I tried this:
===
--On publisher, create slot with 120sec inactive_timeout:
SELECT * FROM pg_create_logical_replication_slot('logical_slot1', 'pgoutput', false, true, true, 120);

--On subscriber, create sub using logical_slot1
create subscription mysubnew1_1 connection 'dbname=newdb1 host=localhost user=shveta port=5433' publication mypubnew1_1 WITH (failover = true, create_slot=false, slot_name='logical_slot1');

--Before creating sub, pg_replication_slots output:
   slot_name   | failover | synced | active | temp | conf | lat                              | inactive_timeout
---------------+----------+--------+--------+------+------+----------------------------------+------------------
 logical_slot1 | t        | f      | f      | f    | f    | 2024-03-25 11:11:55.375736+05:30 | 120

--After creating sub, pg_replication_slots output (inactive_timeout is 0 now):
   slot_name   | failover | synced | active | temp | conf | lat | inactive_timeout
---------------+----------+--------+--------+------+------+-----+------------------
 logical_slot1 | t        | f      | t      | f    | f    |     | 0
===
In CreateSubscription, we call 'walrcv_alter_slot()' / 'ReplicationSlotAlter()' when create_slot is false. This call ends up setting inactive_timeout from 120sec to 0. Is it intentional? thanks Shveta
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Mon, Mar 25, 2024 at 11:53 AM shveta malik wrote: > > On Mon, Mar 25, 2024 at 10:33 AM shveta malik wrote: > > > > On Sun, Mar 24, 2024 at 3:06 PM Bharath Rupireddy > > wrote: > > > > > > I've attached the v18 patch set here. I have one concern: for synced slots on the standby, how do we disallow invalidation due to inactive-timeout immediately after promotion? For synced slots, last_inactive_time and inactive_timeout are both set. Let's say I bring down the primary for promotion of the standby and then promote the standby; there are chances that it may end up invalidating synced slots (considering the standby is not brought down during promotion, and thus inactive_timeout may already have elapsed since 'last_inactive_time'). I tried with a smaller inactive_timeout:

--Shutdown primary to prepare for planned promotion.

--On standby, one synced slot with last_inactive_time (lat) as 12:21
   slot_name   | failover | synced | active | temp | conf | res | lat                              | inactive_timeout
---------------+----------+--------+--------+------+------+-----+----------------------------------+------------------
 logical_slot1 | t        | t      | f      | f    | f    |     | 2024-03-25 12:21:09.020757+05:30 | 60

--wait for some time, now the time is 12:24
postgres=# select now();
               now
----------------------------------
 2024-03-25 12:24:17.616716+05:30

--promote immediately:
./pg_ctl -D ../../standbydb/ promote -w

--on promoted standby:
postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f

--synced slot is invalidated immediately on promotion.
   slot_name   | failover | synced | active | temp | conf | res              | lat                              | inactive_timeout
---------------+----------+--------+--------+------+------+------------------+----------------------------------+------------------
 logical_slot1 | t        | t      | f      | f    | f    | inactive_timeout | 2024-03-25 12:21:09.020757+05:30 |

thanks Shveta
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Mon, Mar 25, 2024 at 1:37 PM Bertrand Drouvot wrote: > > Hi, > > On Mon, Mar 25, 2024 at 12:59:52PM +0530, Amit Kapila wrote: > > On Mon, Mar 25, 2024 at 12:43 PM shveta malik > > wrote: > > > > > > On Mon, Mar 25, 2024 at 11:53 AM shveta malik > > > wrote: > > > > > > > > On Mon, Mar 25, 2024 at 10:33 AM shveta malik > > > > wrote: > > > > > > > > > > On Sun, Mar 24, 2024 at 3:06 PM Bharath Rupireddy > > > > > wrote: > > > > > > > > > > > > I've attached the v18 patch set here. > > > > > > I have one concern, for synced slots on standby, how do we disallow > > > invalidation due to inactive-timeout immediately after promotion? > > > > > > For synced slots, last_inactive_time and inactive_timeout are both > > > set. > > Yeah, and I can see last_inactive_time is moving on the standby (while not the > case on the primary), probably due to the sync worker slot acquisition/release > which does not seem right. > > > Let's say I bring down primary for promotion of standby and then > > > promote standby, there are chances that it may end up invalidating > > > synced slots (considering standby is not brought down during promotion > > > and thus inactive_timeout may already be past 'last_inactive_time'). > > > > > > > This raises the question of whether we need to set > > 'last_inactive_time' synced slots on the standby? > > Yeah, I think that last_inactive_time should stay at 0 on synced slots on the > standby because such slots are not usable anyway (until the standby gets > promoted). > > So, I think that last_inactive_time does not make sense if the slot never had > the chance to be active. > > OTOH I think the timeout invalidation (if any) should be synced from primary. Yes, even I feel that last_inactive_time makes sense only when the slot is available to be used. Synced slots are not available to be used until the standby is promoted, and thus setting last_inactive_time can be skipped for synced slots. But once a slot on the primary is invalidated due to inactive-timeout, that invalidation should be synced to the standby (which is happening currently). thanks Shveta
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Mon, Mar 25, 2024 at 1:37 PM Bertrand Drouvot wrote: > > Hi, > > Yeah, and I can see last_inactive_time is moving on the standby (while not the > case on the primary), probably due to the sync worker slot acquisition/release > which does not seem right. > Yes, you are right, last_inactive_time keeps on moving for synced slots on the standby. Once I disabled the slot-sync worker, it stays constant; then it only changes if I call pg_sync_replication_slots(). On a different note, I noticed that we allow altering inactive_timeout for synced slots on the standby, and then overwrite it with the primary's value in the next sync cycle. Steps:

--Check pg_replication_slots for synced slot on standby, inactive_timeout is 120
   slot_name   | failover | synced | active | inactive_timeout
---------------+----------+--------+--------+------------------
 logical_slot1 | t        | t      | f      | 120

--Alter on standby
SELECT 'alter' FROM pg_alter_replication_slot('logical_slot1', 900);

--Check pg_replication_slots:
   slot_name   | failover | synced | active | inactive_timeout
---------------+----------+--------+--------+------------------
 logical_slot1 | t        | t      | f      | 900

--Run sync function
SELECT pg_sync_replication_slots();

--Check again, inactive_timeout is set back to primary's value.
   slot_name   | failover | synced | active | inactive_timeout
---------------+----------+--------+--------+------------------
 logical_slot1 | t        | t      | f      | 120

I feel altering a synced slot's inactive_timeout should be prohibited on the standby (a sketch of such a restriction is at the end of this mail); it should always be in sync with the primary. Thoughts? I am listing the concerns raised by me so far:
1) create-subscription with create_slot=false overwriting inactive_timeout of an existing slot ([1])
2) last_inactive_time set for synced slots may result in invalidation of the slot on promotion ([2])
3) alter replication slot to alter inactive_timeout for synced slots on standby -- should this be allowed?
[1]: https://www.postgresql.org/message-id/CAJpy0uAqBi%2BGbNn2ngJ-A_Z905CD3ss896bqY2ACUjGiF1Gkng%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CAJpy0uCLu%2BmqAwAMum%3DpXE9YYsy0BE7hOSw_Wno5vjwpFY%3D63g%40mail.gmail.com
thanks Shveta
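A sketch of the restriction I have in mind (hypothetical helper and error wording, not the patch's code) that pg_alter_replication_slot() could apply after acquiring the slot:

#include "postgres.h"
#include "access/xlog.h"
#include "replication/slot.h"

/*
 * Reject altering a synced slot while in recovery, so that a standby's
 * synced slots always mirror the primary's values.
 */
static void
errorout_if_synced_slot(ReplicationSlot *slot)
{
	if (RecoveryInProgress() && slot->data.synced)
		ereport(ERROR,
				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
				errmsg("cannot alter replication slot \"%s\"",
					   NameStr(slot->data.name)),
				errdetail("This slot is being synced from the primary server."));
}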
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Mon, Mar 25, 2024 at 3:31 PM Bharath Rupireddy wrote: > > Right. Done that way i.e. not setting the last_inactive_time for slots > both while releasing the slot and restoring from the disk. > > Also, I've added a TAP function to check if the captured times are > sane per Bertrand's review comment. > > Please see the attached v20 patch. Thanks for the patch. The issue of unnecessary invalidation of synced slots on promotion is resolved in this patch. thanks Shveta
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Mon, Mar 25, 2024 at 5:10 PM Amit Kapila wrote: > > I think we should keep pg_alter_replication_slot() as the last > priority among the remaining patches for this release. Let's try to > first finish the primary functionality of inactive_timeout patch. > Otherwise, I agree that the problem reported by you should be fixed. Noted. Will focus on the v18-002 patch now. I was debugging the flow and just noticed that RecoveryInProgress() always returns 'true' during StartupReplicationSlots()-->RestoreSlotFromDisk() (even on the primary), as 'xlogctl->SharedRecoveryState' is always 'RECOVERY_STATE_CRASH' at that time. 'xlogctl->SharedRecoveryState' is changed to 'RECOVERY_STATE_DONE' on the primary and to 'RECOVERY_STATE_ARCHIVE' on the standby at a later stage in StartupXLOG() (after we are done loading slots). The impact of this is that the condition in RestoreSlotFromDisk() in v20-001:

if (!(RecoveryInProgress() && slot->data.synced))
	slot->last_inactive_time = GetCurrentTimestamp();

is merely equivalent to:

if (!slot->data.synced)
	slot->last_inactive_time = GetCurrentTimestamp();

Thus on the primary, after restart, last_inactive_time is set correctly, while on a promoted standby (new primary), last_inactive_time is always NULL after restart for the synced slots. thanks Shveta
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Mon, Mar 25, 2024 at 12:43 PM shveta malik wrote: > > I have one concern, for synced slots on standby, how do we disallow > invalidation due to inactive-timeout immediately after promotion? > > For synced slots, last_inactive_time and inactive_timeout are both > set. Let's say I bring down primary for promotion of standby and then > promote standby, there are chances that it may end up invalidating > synced slots (considering standby is not brought down during promotion > and thus inactive_timeout may already be past 'last_inactive_time'). > On the standby, if we decide to maintain a valid last_inactive_time for synced slots, then invalidation is correctly restricted in InvalidateSlotForInactiveTimeout() for synced slots using the check:

if (RecoveryInProgress() && slot->data.synced)
	return false;

But immediately after promotion, we can not rely on the above check, and thus the possibility of synced-slot invalidation is there. To maintain consistent behavior regarding the setting of last_inactive_time for synced slots, similar to user slots, one potential solution to prevent this invalidation issue is to update the last_inactive_time of all synced slots within the ShutDownSlotSync() function during FinishWalRecovery() (a sketch of such a function is at the end of this mail). This approach ensures that promotion doesn't immediately invalidate slots, and we then possess a correct last_inactive_time as a basis for invalidation going forward. This is equivalent to updating last_inactive_time during restart (but without an actual restart during promotion). The plus point of maintaining last_inactive_time for synced slots could be that it provides data to the user on when the sync was last attempted on that particular slot by the background slot sync worker or the SQL function. Thoughts? thanks Shveta
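A sketch of the proposed update (hypothetical function name; field name as in v20), to be called from ShutDownSlotSync():

#include "postgres.h"
#include "replication/slot.h"
#include "utils/timestamp.h"

/*
 * Stamp the current time on all synced slots at slot sync shutdown, so
 * that the inactive-timeout countdown starts afresh on the standby that
 * is about to be promoted.
 */
static void
update_synced_slots_last_inactive_time(void)
{
	TimestampTz now = GetCurrentTimestamp();

	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);

	for (int i = 0; i < max_replication_slots; i++)
	{
		ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

		if (!s->in_use || !s->data.synced)
			continue;

		SpinLockAcquire(&s->mutex);
		s->last_inactive_time = now;
		SpinLockRelease(&s->mutex);
	}

	LWLockRelease(ReplicationSlotControlLock);
}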
Re: pgsql: Track last_inactive_time in pg_replication_slots.
On Mon, Mar 25, 2024 at 9:54 PM Bertrand Drouvot wrote: > > Hi, > > On Mon, Mar 25, 2024 at 07:32:11PM +0530, Amit Kapila wrote: > > On Mon, Mar 25, 2024 at 6:57 PM Robert Haas wrote: > > > And I'm suspicious that having an exception for slots being synced is > > > a bad idea. That makes too much of a judgement about how the user will > > > use this field. It's usually better to just expose the data, and if > > > the user needs helps to make sense of that data, then give them that > > > help separately. > > > > The reason we didn't set this for sync slots is that they won't be > > usable (one can't use them to decode WAL) unless standby is promoted > > [2]. But I see your point as well. So, I have copied the others > > involved in this discussion to see what they think. > > Yeah I also see Robert's point. If we also sync the "last inactive time" > field then > we would need to take care of the corner case mentioned by Shveta in [1] > during > promotion. I have suggested one potential solution for that in [1]. Please have a look. [1]: https://www.postgresql.org/message-id/CAJpy0uB-yE%2BRiw7JQ4hW0%2BigJxvPc%2Brq%2B9c7WyTa1Jz7%2B2gAiA%40mail.gmail.com thanks Shveta
Re: pgsql: Track last_inactive_time in pg_replication_slots.
On Tue, Mar 26, 2024 at 1:50 AM Bharath Rupireddy wrote: > > On Tue, Mar 26, 2024 at 1:30 AM Nathan Bossart > wrote: > > > > On Mon, Mar 25, 2024 at 04:49:12PM +, Bertrand Drouvot wrote: > > > On Mon, Mar 25, 2024 at 12:25:37PM -0400, Robert Haas wrote: > > >> In the same vein, I think deactivated_at or inactive_since might be > > >> good names to consider. I think they get at the same thing as > > >> released_time, but they avoid introducing a completely new word > > >> (release, as opposed to active/inactive). > > > > > > Yeah, I'd vote for inactive_since then. > > > > Having only skimmed some of the related discussions, I'm inclined to agree > > that inactive_since provides the clearest description for the column. > > I think we all have some agreement on inactive_since. So, I'm > attaching the patch for that change. pg_proc.dat needs to be changed to refer to 'inactive_since' instead of 'last_inactive_time' in the attached patch. thanks Shveta
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Tue, Mar 26, 2024 at 11:36 AM Bertrand Drouvot wrote: > > > > The issue that I can see with your proposal is: what if one synced the slots > > manually (with pg_sync_replication_slots()) but does not use the sync > > worker? > > Then I think ShutDownSlotSync() is not going to help in that case. > > It looks like ShutDownSlotSync() is always called (even if > sync_replication_slots = off), > so that sounds ok to me (I should have checked the code, I was under the > impression > ShutDownSlotSync() was not called if sync_replication_slots = off). Right, it is called irrespective of sync_replication_slots. thanks Shveta
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Tue, Mar 26, 2024 at 11:08 AM Bharath Rupireddy wrote: > > On Tue, Mar 26, 2024 at 9:30 AM shveta malik wrote: > > > > On Mon, Mar 25, 2024 at 12:43 PM shveta malik > > wrote: > > > > > > I have one concern: for synced slots on the standby, how do we disallow > > > invalidation due to inactive-timeout immediately after promotion? > > > > > > For synced slots, last_inactive_time and inactive_timeout are both > > > set. Let's say I bring down the primary for promotion of the standby and then > > > promote the standby; there are chances that it may end up invalidating > > > synced slots (considering the standby is not brought down during promotion, > > > and thus inactive_timeout may already have elapsed since 'last_inactive_time'). > > > > > > > On the standby, if we decide to maintain a valid last_inactive_time for > > synced slots, then invalidation is correctly restricted in > > InvalidateSlotForInactiveTimeout() for synced slots using the check: > > > > if (RecoveryInProgress() && slot->data.synced) > > return false; > > > > But immediately after promotion, we can not rely on the above check, > > and thus the possibility of synced-slot invalidation is there. To > > maintain consistent behavior regarding the setting of > > last_inactive_time for synced slots, similar to user slots, one > > potential solution to prevent this invalidation issue is to update the > > last_inactive_time of all synced slots within the ShutDownSlotSync() > > function during FinishWalRecovery(). This approach ensures that > > promotion doesn't immediately invalidate slots, and we then > > possess a correct last_inactive_time as a basis for invalidation going > > forward. This is equivalent to updating last_inactive_time during > > restart (but without an actual restart during promotion). > > The plus point of maintaining last_inactive_time for synced slots > > could be that it provides data to the user on when the sync > > was last attempted on that particular slot by the background slot sync worker > > or the SQL function. Thoughts? > > Please find the attached v21 patch implementing the above idea. It > also has changes for renaming last_inactive_time to inactive_since. > Thanks for the patch. I have tested this patch alone, and it does what it says. One additional thing I noticed is that it now sets inactive_since for temp slots as well, but that idea looks fine to me. I could not test the 'invalidation on promotion' bug with this change, as that needed rebasing of the rest of the patches. Few trivial things: 1) Commit msg: ensures the value is set to current timestamp during the shutdown to help correctly interpret the time if the standby gets promoted without a restart. shutdown --> shutdown of slot sync worker (as it was not clear if it is instance shutdown or something else) 2) 'The time since the slot has became inactive'. has became --> has become, or just 'became'. Please check it in all the files. There are multiple places. thanks Shveta
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Tue, Mar 26, 2024 at 2:27 PM Bharath Rupireddy wrote: > > > > > 1) > > Commit msg: > > > > ensures the value is set to current timestamp during the > > shutdown to help correctly interpret the time if the standby gets > > promoted without a restart. > > > > shutdown --> shutdown of slot sync worker (as it was not clear if it > > is instance shutdown or something else) > > Changed it to "shutdown of slot sync machinery" to be consistent with > the comments. Thanks for addressing the comments. Just to give more clarity here (so that you can take an informed decision): I am not sure if we actually shut down the slot-sync machinery. We only shut down the slot sync worker; the slot-sync machinery can still be used via the 'pg_sync_replication_slots' SQL function. I can easily reproduce the scenario where the SQL function and reset_synced_slots_info() run in parallel, where the latter hits 'Assert(s->active_pid == 0)' because the parallel SQL sync function is active on that slot. thanks Shveta
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Tue, Mar 26, 2024 at 1:54 PM Bertrand Drouvot wrote: > > Hi, > > On Tue, Mar 26, 2024 at 01:37:21PM +0530, Amit Kapila wrote: > > On Tue, Mar 26, 2024 at 1:15 PM Bertrand Drouvot > > wrote: > > > > > > 2 === > > > > > > It looks like inactive_since is set to the current timestamp on the > > > standby > > > each time the sync worker does a cycle: > > > > > > primary: > > > > > > postgres=# select slot_name,inactive_since from pg_replication_slots > > > where failover = 't'; > > > slot_name |inactive_since > > > -+--- > > > lsub27_slot | 2024-03-26 07:39:19.745517+00 > > > lsub28_slot | 2024-03-26 07:40:24.953826+00 > > > > > > standby: > > > > > > postgres=# select slot_name,inactive_since from pg_replication_slots > > > where failover = 't'; > > > slot_name |inactive_since > > > -+--- > > > lsub27_slot | 2024-03-26 07:43:56.387324+00 > > > lsub28_slot | 2024-03-26 07:43:56.387338+00 > > > > > > I don't think that should be the case. > > > > > > > But why? This is exactly what we discussed in another thread where we > > agreed to update inactive_since even for sync slots. > > Hum, I thought we agreed to "sync" it and to "update it to current time" > only at promotion time. I think there may have been some misunderstanding here. But now if I rethink this, I am fine with 'inactive_since' getting synced from primary to standby. But if we do that, we need to add docs stating "inactive_since" represents the primary's inactivity and not the standby slots' inactivity for synced slots. The reason for this clarification is that the synced slot might be generated much later, yet 'inactive_since' is synced from the primary, potentially indicating a time considerably earlier than when the synced slot was actually created. Another approach could be that "inactive_since" for a synced slot actually gives its own inactivity data rather than giving the primary's slot data. We update inactive_since on the standby only on 3 occasions: 1) at the time of creation of the synced slot. 2) during standby restart. 3) during promotion of the standby. I have attached a sample patch for this idea as a .txt file. I am fine with any of these approaches. One gives data synced from the primary for synced slots, while the other gives actual inactivity data of synced slots. thanks Shveta

From 7dcd0e95299263187eb1f03812f8321b2612ee5c Mon Sep 17 00:00:00 2001
From: Shveta Malik
Date: Tue, 26 Mar 2024 14:42:25 +0530
Subject: [PATCH v1] inactive_since for synced slots.

inactive_since is updated for synced slots:
1) at the time of creation of slot.
2) during server restart.
3) during promotion.
---
 src/backend/replication/logical/slotsync.c |  1 +
 src/backend/replication/slot.c             | 15 ---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index bbf9a2c485..6114895dca 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -628,6 +628,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		SpinLockAcquire(&slot->mutex);
 		slot->effective_catalog_xmin = xmin_horizon;
 		slot->data.catalog_xmin = xmin_horizon;
+		slot->inactive_since = GetCurrentTimestamp();
 		SpinLockRelease(&slot->mutex);
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index d0a2f440ef..f2a57a14ec 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -628,7 +628,10 @@ retry:
 	 * now.
 	 */
 	SpinLockAcquire(&s->mutex);
-	s->inactive_since = 0;
+
+	if (!(RecoveryInProgress() && s->data.synced))
+		s->inactive_since = 0;
+
 	SpinLockRelease(&s->mutex);

 	if (am_walsender)
@@ -704,14 +707,20 @@ ReplicationSlotRelease(void)
 		 */
 		SpinLockAcquire(&slot->mutex);
 		slot->active_pid = 0;
-		slot->inactive_since = now;
+
+		if (!(RecoveryInProgress() && slot->data.synced))
+			slot->inactive_since = now;
+
 		SpinLockRelease(&slot->mutex);
 		ConditionVariableBroadcast(&slot->active_cv);
 	}
 	else
 	{
 		SpinLockAcquire(&slot->mutex);
-		slot->inactive_since = now;
+
+		if (!(RecoveryInProgress() && slot->data.synced))
+			slot->inactive_since = now;
+
 		SpinLockRelease(&slot->mutex);
 	}
--
2.34.1
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Tue, Mar 26, 2024 at 3:50 PM Bertrand Drouvot wrote: > > Hi, > > > I think there may have been some misunderstanding here. > > Indeed ;-) > > > But now if I > > rethink this, I am fine with 'inactive_since' getting synced from > > primary to standby. But if we do that, we need to add docs stating > > "inactive_since" represents primary's inactivity and not standby's > > slots inactivity for synced slots. > > Yeah sure. > > > The reason for this clarification > > is that the synced slot might be generated much later, yet > > 'inactive_since' is synced from the primary, potentially indicating a > > time considerably earlier than when the synced slot was actually > > created. > > Right. > > > Another approach could be that "inactive_since" for synced slot > > actually gives its own inactivity data rather than giving primary's > > slot data. We update inactive_since on standby only at 3 occasions: > > 1) at the time of creation of the synced slot. > > 2) during standby restart. > > 3) during promotion of standby. > > > > I have attached a sample patch for this idea as a .txt file. > > Thanks! > > > I am fine with any of these approaches. One gives data synced from > > primary for synced slots, while another gives actual inactivity data > > of synced slots. > > What about another approach?: inactive_since gives data synced from primary > for > synced slots and another dedicated field (could be added later...) could > represent what you suggest as the other option. Yes, okay with me. I think there is some confusion here as well. In my second approach above, I have not suggested anything related to the sync worker. We can think on that later if we really need another field which gives us the sync time. In my second approach, I have tried to avoid updating inactive_since for synced slots during the sync process. We update that field during creation of the synced slot, so that inactive_since reflects correct info even for synced slots (rather than copying from the primary). Please have a look at my patch and let me know your thoughts. I am fine with copying it from the primary as well and documenting this behaviour. > Another cons of updating inactive_since at the current time during each slot > sync cycle is that calling GetCurrentTimestamp() very frequently > (during each sync cycle of very active slots) could be too costly. Right. thanks Shveta
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Tue, Mar 26, 2024 at 4:35 PM Bharath Rupireddy wrote: > > On Tue, Mar 26, 2024 at 4:18 PM shveta malik wrote: > > > > > What about another approach?: inactive_since gives data synced from > > > primary for > > > synced slots and another dedicated field (could be added later...) could > > > represent what you suggest as the other option. > > > > Yes, okay with me. I think there is some confusion here as well. In my > > second approach above, I have not suggested anything related to > > the sync worker. We can think on that later if we really need another > > field which gives us the sync time. In my second approach, I have tried to > > avoid updating inactive_since for synced slots during the sync process. We > > update that field during creation of the synced slot so that > > inactive_since reflects correct info even for synced slots (rather > > than copying from the primary). Please have a look at my patch and let me > > know your thoughts. I am fine with copying it from the primary as well and > > documenting this behaviour. > > I took a look at your patch. > > --- a/src/backend/replication/logical/slotsync.c > +++ b/src/backend/replication/logical/slotsync.c > @@ -628,6 +628,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid > remote_dbid) > SpinLockAcquire(&slot->mutex); > slot->effective_catalog_xmin = xmin_horizon; > slot->data.catalog_xmin = xmin_horizon; > +slot->inactive_since = GetCurrentTimestamp(); > SpinLockRelease(&slot->mutex); > > If we just sync inactive_since value for synced slots while in > recovery from the primary, so be it. Why do we need to update it to > the current time when the slot is being created? If we update inactive_since at the synced slot's creation or during restart (skipping setting it during sync), then this time reflects the actual 'inactive_since' for that particular synced slot. Isn't that clear info for the user, and in alignment with what the name 'inactive_since' actually suggests? > We don't expose slot > creation time, no? No, we don't. But for a synced slot, that is the time since the slot has been inactive (unless promoted), so we are exposing inactive_since and not creation time. > Aren't we fine if we just sync the value from > primary and document that fact? After the promotion, we can reset it > to the current time so that it gets its own time. Do you see any > issues with it? Yes, we can do that. But curious to know: do we see any additional benefit of reflecting the primary's inactive_since on the standby which I might be missing? thanks Shveta
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Tue, Mar 26, 2024 at 9:59 PM Bharath Rupireddy wrote: > > On Tue, Mar 26, 2024 at 4:35 PM Bharath Rupireddy > wrote: > > > > If we just sync inactive_since value for synced slots while in > > recovery from the primary, so be it. Why do we need to update it to > > the current time when the slot is being created? We don't expose slot > > creation time, no? Aren't we fine if we just sync the value from > > primary and document that fact? After the promotion, we can reset it > > to the current time so that it gets its own time. > > I'm attaching v24 patches. It implements the above idea proposed > upthread for synced slots. I've now separated > s/last_inactive_time/inactive_since and synced slots behaviour. Please > have a look. Thanks for the patches. Few trivial comments for v24-002: 1) slot.c: + * data from the remote slot. We use InRecovery flag instead of + * RecoveryInProgress() as it always returns true even for normal + * server startup. a) Not clear what 'it' refers to. Better to use 'the latter' b) Is it better to mention the primary here: 'as the latter always returns true even on the primary server during startup'. 2) update_local_synced_slot(): - strcmp(remote_slot->plugin, NameStr(slot->data.plugin)) == 0) + strcmp(remote_slot->plugin, NameStr(slot->data.plugin)) == 0 && + remote_slot->inactive_since == slot->inactive_since) When this code was written initially, the intent was to do strcmp at the end (only if absolutely needed). It will be good if we maintain the same and add new checks before strcmp. 3) update_synced_slots_inactive_time(): This assert is removed, is it intentional? Assert(s->active_pid == 0); 4) 040_standby_failover_slots_sync.pl: +# Capture the inactive_since of the slot from the standby the logical failover +# slots are synced/created on the standby. The comment is unclear, something seems missing. thanks Shveta
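For (2) above, a sketch of the intended ordering (field names as in the patch under review; illustrative only): keep the comparatively expensive strcmp() as the last check, after all the cheap scalar comparisons.

	if (remote_slot->restart_lsn == slot->data.restart_lsn &&
		remote_slot->confirmed_lsn == slot->data.confirmed_flush &&
		remote_slot->catalog_xmin == slot->data.catalog_xmin &&
		remote_slot->inactive_since == slot->inactive_since &&
		strcmp(remote_slot->plugin, NameStr(slot->data.plugin)) == 0)
		return false;			/* nothing to update */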
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Wed, Mar 27, 2024 at 10:22 AM Amit Kapila wrote: > > On Wed, Mar 27, 2024 at 10:08 AM Bharath Rupireddy > wrote: > > > > On Tue, Mar 26, 2024 at 11:22 PM Bertrand Drouvot > > wrote: > > > > > 3) > > > update_synced_slots_inactive_time(): > > > > > > This assert is removed, is it intentional? > > > Assert(s->active_pid == 0); > > > > Yes, the slot can get acquired in the corner case when someone runs > > pg_sync_replication_slots concurrently at this time. I'm referring to > > the issue reported upthread. We don't prevent one running > > pg_sync_replication_slots in promotion/ShutDownSlotSync phase right? > > Maybe we should prevent that otherwise some of the slots are synced > > and the standby gets promoted while others are yet-to-be-synced. > > > > We should do something about it but that shouldn't be done in this > patch. We can handle it separately and then add such an assert. Agreed. Once this patch is concluded, I can fix the slot sync shutdown issue and will also add this 'assert' back. thanks Shveta
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Tue, Mar 26, 2024 at 6:05 PM Bertrand Drouvot wrote: > > > > We can think on that later if we really need another > > field which give us sync time. > > I think that calling GetCurrentTimestamp() so frequently could be too costly, > so > I'm not sure we should. Agreed. > > In my second approach, I have tried to > > avoid updating inactive_since for synced slots during sync process. We > > update that field during creation of synced slot so that > > inactive_since reflects correct info even for synced slots (rather > > than copying from primary). > > Yeah, and I think we could create a dedicated field with this information > if we feel the need. Okay. thanks Shveta
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Wed, Mar 27, 2024 at 11:05 AM Bharath Rupireddy wrote: > > Fixed an issue in synchronize_slots where DatumGetLSN is being used in > place of DatumGetTimestampTz. Found this via CF bot member [1], not on > my dev system. > > Please find the attached v6 patch. Thanks for the patch. Few trivial things: -- 1) system-views.sgml: a) "Note that the slots" --> "Note that the slots on the standbys," --it is good to mention "standbys" as synced could be true on primary as well (promoted standby) b) If you plan to add more info which Bertrand suggested, then it will be better to make a section instead of using "Note" 2) commit msg: "The impact of this on a promoted standby inactive_since is always NULL for all synced slots even after server restart. " Sentence looks broken. - Apart from the above trivial things, v26-001 looks good to me. thanks Shveta
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Wed, Mar 27, 2024 at 2:55 PM Bharath Rupireddy wrote: > > On Wed, Mar 27, 2024 at 11:39 AM shveta malik wrote: > > > > Thanks for the patch. Few trivial things: > > Thanks for reviewing. > > > -- > > 1) > > system-views.sgml: > > > > a) "Note that the slots" --> "Note that the slots on the standbys," > > --it is good to mention "standbys" as synced could be true on primary > > as well (promoted standby) > > Done. > > > b) If you plan to add more info which Bertrand suggested, then it will > > be better to make a section instead of using "Note" > > I added the note that Bertrand specified upthread. But, I couldn't > find an instance of adding ... within a table. Hence > with "Note that " statments just like any other notes in the > system-views.sgml. pg_replication_slot in system-vews.sgml renders as > table, so having ... may not be a great idea. > > > 2) > > commit msg: > > > > "The impact of this > > on a promoted standby inactive_since is always NULL for all > > synced slots even after server restart. > > " > > Sentence looks broken. > > - > > Reworded. > > > Apart from the above trivial things, v26-001 looks good to me. > > Please check the attached v27 patch which also has Bertrand's comment > on deduplicating the TAP function. I've now moved it to Cluster.pm. > Thanks for the patch. Regarding doc, I have few comments. +Note that the slots on the standbys that are being synced from a +primary server (whose synced field is +true), will get the +inactive_since value from the +corresponding remote slot on the primary. Also, note that for the +synced slots on the standby, after the standby starts up (i.e. after +the slots are loaded from the disk), the inactive_since will remain +zero until the next slot sync cycle. a) "inactive_since will remain zero" Since it is user exposed info and the user finds it NULL in pg_replication_slots, shall we mention NULL instead of 0? b) Since we are referring to the sync cycle here, I feel it will be good to give a link to that page. +zero until the next slot sync cycle (see + for +slot synchronization details). thanks Shveta
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Wed, Mar 27, 2024 at 9:00 PM Bharath Rupireddy wrote: > > Thanks. I'm attaching v29 patches. 0001 managing inactive_since on the > standby for sync slots. 0002 implementing inactive timeout GUC based > invalidation mechanism. > > Please have a look. Thanks for the patches. v29-001 looks good to me. thanks Shveta
Re: Synchronizing slots from primary to standby
On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) wrote: > > Attach a new version patch which fixed an un-initialized variable issue and > added some comments. Also, temporarily enable DEBUG2 for the 040 tap-test so > that > we can analyze the possible CFbot failures easily. As suggested by Amit in [1], for the fix being discussed where we need to advance the synced slot on the standby, we need to skip the dbid check in fast_forward mode in CreateDecodingContext(). We tried a few tests to make sure that there was no table access done during fast-forward mode:

1) Initially we tried avoiding the database-id check in CreateDecodingContext() only when called by pg_logical_replication_slot_advance(). 'make check-world' passed on HEAD for the same.

2) But the more generic solution was to skip the database check if "fast_forward" is true. It was tried and 'make check-world' passed on HEAD for that as well.

3) Another thing tried by Hou-san was to run pgbench after skipping the db check in the fast_forward logical decoding case. pgbench was run to generate some changes and then the logical slot was advanced to the latest position in another database. A LOG was added in relation_open to catch table access. It was found that there was no table access in fast-forward logical decoding, i.e., no logs for table open were generated during the test. Steps given at [2].

[1]: https://www.postgresql.org/message-id/CAA4eK1KMiKangJa4NH_K1oFc87Y01n3rnpuwYagT59Y%3DADW8Dw%40mail.gmail.com

[2]:
--
1. Apply the DEBUG patch (attached as .txt) which will log the relation open and table cache access.

2. Create a slot:
SELECT 'init' FROM pg_create_logical_replication_slot('logicalslot', 'test_decoding', false, false, true);

3. Run pgbench to generate some data.
pgbench -i postgres
pgbench --aggregate-interval=5 --time=5 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 postgres

4. Start a fresh session in a different db and advance the slot to the latest position. There should be no relation open or CatCache log between the LOG "starting logical decoding for slot .." and LOG "decoding over".
SELECT pg_replication_slot_advance('logicalslot', pg_current_wal_lsn());
--

thanks Shveta

From 5386894faa14c0de9854e0eee9679f8eea775f65 Mon Sep 17 00:00:00 2001
From: Hou Zhijie
Date: Fri, 29 Mar 2024 11:46:36 +0800
Subject: [PATCH] debug log

---
 src/backend/access/common/relation.c | 2 ++
 src/backend/replication/slotfuncs.c  | 1 +
 src/backend/utils/cache/catcache.c   | 1 +
 3 files changed, 4 insertions(+)

diff --git a/src/backend/access/common/relation.c b/src/backend/access/common/relation.c
index d8a313a2c9..40718fc47e 100644
--- a/src/backend/access/common/relation.c
+++ b/src/backend/access/common/relation.c
@@ -50,6 +50,7 @@ relation_open(Oid relationId, LOCKMODE lockmode)
 	Assert(lockmode >= NoLock && lockmode < MAX_LOCKMODES);

+	elog(LOG, "relation_open");
 	/* Get the lock before trying to open the relcache entry */
 	if (lockmode != NoLock)
 		LockRelationOid(relationId, lockmode);
@@ -91,6 +92,7 @@ try_relation_open(Oid relationId, LOCKMODE lockmode)
 	Assert(lockmode >= NoLock && lockmode < MAX_LOCKMODES);

+	elog(LOG, "try_relation_open");
 	/* Get the lock first */
 	if (lockmode != NoLock)
 		LockRelationOid(relationId, lockmode);
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index ef5081784c..564b36fc45 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -608,6 +608,7 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto,
 	/* free context, call shutdown callback */
 	FreeDecodingContext(ctx);
+	elog(LOG, "decoding over");

 	InvalidateSystemCaches();
 }
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 569f51cb33..e19c586697 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1328,6 +1328,7 @@ SearchCatCacheInternal(CatCache *cache,
 	Assert(cache->cc_nkeys == nkeys);

+	elog(LOG, "SearchCatCacheInternal");
 	/*
 	 * one-time startup overhead for each cache
 	 */
--
2.31.1
Re: Synchronizing slots from primary to standby
On Mon, Apr 1, 2024 at 10:40 AM Amit Kapila wrote: > > On Mon, Apr 1, 2024 at 10:01 AM Bharath Rupireddy > wrote: > > > > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) > > wrote: > > > > > > [2] The steps to reproduce the data miss issue on a primary->standby > > > setup: > > > > I'm trying to reproduce the problem with [1], but I can see the > > changes after the standby is promoted. Am I missing anything here? > > > > ubuntu:~/postgres/pg17/bin$ ./psql -d postgres -p 5433 -c "select * > > from pg_logical_slot_get_changes('lrep_sync_slot', NULL, NULL);" > > lsn| xid |data > > ---+-+- > > 0/3B0 | 738 | BEGIN 738 > > 0/3017FC8 | 738 | COMMIT 738 > > 0/3017FF8 | 739 | BEGIN 739 > > 0/3019A38 | 739 | COMMIT 739 > > 0/3019A38 | 740 | BEGIN 740 > > 0/3019A38 | 740 | table public.dummy1: INSERT: a[integer]:999 > > 0/3019AA8 | 740 | COMMIT 740 > > (7 rows) > > > > [1] > > -#define LOG_SNAPSHOT_INTERVAL_MS 15000 > > +#define LOG_SNAPSHOT_INTERVAL_MS 150 > > > > ./initdb -D db17 > > echo "archive_mode = on > > archive_command='cp %p /home/ubuntu/postgres/pg17/bin/archived_wal/%f' > > wal_level='logical' > > autovacuum = off > > checkpoint_timeout='1h'" | tee -a db17/postgresql.conf > > > > ./pg_ctl -D db17 -l logfile17 start > > > > rm -rf sbdata logfilesbdata > > ./pg_basebackup -D sbdata > > > > ./psql -d postgres -p 5432 -c "SELECT > > pg_create_logical_replication_slot('lrep_sync_slot', 'test_decoding', > > false, false, true);" > > ./psql -d postgres -p 5432 -c "SELECT > > pg_create_physical_replication_slot('phy_repl_slot', true, false);" > > > > echo "port=5433 > > primary_conninfo='host=localhost port=5432 dbname=postgres user=ubuntu' > > primary_slot_name='phy_repl_slot' > > restore_command='cp /home/ubuntu/postgres/pg17/bin/archived_wal/%f %p' > > hot_standby_feedback=on > > sync_replication_slots=on" | tee -a sbdata/postgresql.conf > > > > touch sbdata/standby.signal > > > > ./pg_ctl -D sbdata -l logfilesbdata start > > ./psql -d postgres -p 5433 -c "SELECT pg_is_in_recovery();" > > > > ./psql -d postgres > > > > SESSION1, TXN1 > > BEGIN; > > create table dummy1(a int); > > > > SESSION2, TXN2 > > SELECT pg_log_standby_snapshot(); > > > > SESSION1, TXN1 > > COMMIT; > > > > SESSION1, TXN1 > > BEGIN; > > create table dummy2(a int); > > > > SESSION2, TXN2 > > SELECT pg_log_standby_snapshot(); > > > > SESSION1, TXN1 > > COMMIT; > > > > ./psql -d postgres -p 5432 -c "SELECT > > pg_replication_slot_advance('lrep_sync_slot', pg_current_wal_lsn());" > > > > After this step and before the next, did you ensure that the slot sync > has synced the latest confirmed_flush/restart LSNs? You can query: > "select slot_name,restart_lsn, confirmed_flush_lsn from > pg_replication_slots;" to ensure the same on both the primary and > standby. +1. To ensure last sync, one can run this manually on standby just before promotion : SELECT pg_sync_replication_slots(); thanks Shveta
Re: Synchronizing slots from primary to standby
On Mon, Apr 1, 2024 at 5:05 PM Amit Kapila wrote: > > > 2 === > > > > + { > > + if (SnapBuildSnapshotExists(remote_slot->restart_lsn)) > > + { > > > > That could call SnapBuildSnapshotExists() multiple times for the same > > "restart_lsn" (for example in case of multiple remote slots to sync). > > > > What if the sync worker records the last lsn it asks for serialization (and > > serialized ? Then we could check that value first before deciding to call > > (or not) > > SnapBuildSnapshotExists() on it? > > > > It's not ideal because it would record "only the last one" but that would be > > simple enough for now (currently there is only one sync worker so that > > scenario > > is likely to happen). > > > > Yeah, we could do that but I am not sure how much it can help. I guess > we could do some tests to see if it helps.

I had a look at the test results collected by Nisha and did not find any repetitive restart_lsn from the primary being synced to the standby for that particular 100-slot test. Unless we have some concrete test in mind (one that produces repetitive restart_lsn values), I do not think the given tests can establish the benefit of the suggested optimization. Attached are the log files of the all-slots test for reference.

thanks Shveta

 slot_name | database | xmin | cxmin | restart_lsn |  flushlsn  | wal_status | conf | failover | synced | temp
-----------+----------+------+-------+-------------+------------+------------+------+----------+--------+------
 slot_1    | newdb1   |      |   772 | 0/AD0       | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_10   | newdb2   |      |   772 | 0/A0002C8   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_100  | newdb20  |      |   772 | 0/A005F90   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_11   | newdb3   |      |   772 | 0/A000300   | 0/E43E9240 | reserved   | f    | t        | f      | f
 [the dump continues in the same pattern through slot_37, each slot with its own restart_lsn and identical values in the remaining columns; the attachment is truncated in the archive]
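For anyone wanting to repeat this check on a live setup rather than scanning log files, a query along these lines would surface any restart_lsn shared by multiple synced slots, i.e., exactly the case in which the proposed caching of the last serialized LSN would help (a sketch; it assumes the 'synced' column added by this feature):

  -- Sketch: find restart_lsn values shared by more than one synced slot.
  SELECT restart_lsn, count(*) AS slot_count
  FROM pg_replication_slots
  WHERE synced
  GROUP BY restart_lsn
  HAVING count(*) > 1;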
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Tue, Apr 2, 2024 at 11:58 AM Bertrand Drouvot wrote: > > Hi, > > On Tue, Apr 02, 2024 at 12:07:54PM +0900, Masahiko Sawada wrote: > > On Mon, Apr 1, 2024 at 12:18 PM Bharath Rupireddy > > > > FWIW, coming to this thread late, I think that the inactive_since > > should not be synchronized from the primary. The wall clocks are > > different on the primary and the standby so having the primary's > > timestamp on the standby can confuse users, especially when there is a > > big clock drift. Also, as Amit mentioned, inactive_since seems not to > > be consistent with other parameters we copy. The > > replication_slot_inactive_timeout feature should work on the standby > > independent from the primary, like other slot invalidation mechanisms, > > and it should be based on its own local clock. > > Thanks for sharing your thoughts! So, it looks like that most of us agree to > not > sync inactive_since from the primary, I'm fine with that. +1 on not syncing inactive_since from the primary. > > If we want to invalidate the synced slots due to the timeout, I think > > we need to define what is "inactive" for synced slots. > > > > Suppose that the slotsync worker updates the local (synced) slot's > > inactive_since whenever releasing the slot, irrespective of the actual > > LSNs (or other slot parameters) having been updated. I think that this > > idea cannot handle a slot that is not acquired on the primary. In this > > case, the remote slot is inactive but the local slot is regarded as > > active. WAL files are piled up on the standby (and on the primary) as > > the slot's LSNs don't move forward. I think we want to regard such a > > slot as "inactive" also on the standby and invalidate it because of > > the timeout. > > I think that makes sense to somehow link inactive_since on the standby to > the actual LSNs (or other slot parameters) being updated or not. > > > > > Now, the other concern is that calling GetCurrentTimestamp() > > > > could be costly when the values for the slot are not going to be > > > > updated but if that happens we can optimize such that before acquiring > > > > the slot we can have some minimal pre-checks to ensure whether we need > > > > to update the slot or not. > > > > If we use such pre-checks, another problem might happen; it cannot > > handle a case where the slot is acquired on the primary but its LSNs > > don't move forward. Imagine a logical replication conflict happened on > > the subscriber, and the logical replication enters the retry loop. In > > this case, the remote slot's inactive_since gets updated for every > > retry, but it looks inactive from the standby since the slot LSNs > > don't change. Therefore, only the local slot could be invalidated due > > to the timeout but probably we don't want to regard such a slot as > > "inactive". > > > > Another idea I came up with is that the slotsync worker updates the > > local slot's inactive_since to the local timestamp only when the > > remote slot might have got inactive. If the remote slot is acquired by > > someone, the local slot's inactive_since is also NULL. If the remote > > slot gets inactive, the slotsync worker sets the local timestamp to > > the local slot's inactive_since. Since the remote slot could be > > acquired and released before the slotsync worker gets the remote slot > > data again, if the remote slot's inactive_since > the local slot's > > inactive_since, the slotsync worker updates the local one. > > Then I think we would need to be careful about time zone comparison. Yes.
Also, we need to consider the case when a user is relying on pg_sync_replication_slots() and has not enabled the slot-sync worker. In such a case, if the synced slot's inactive_since is derived from the inactivity of the remote slot, it might not be updated very frequently (it depends on when the user actually runs the SQL function) and thus may be misleading. OTOH, if inactive_since of a synced slot represents its own inactivity, then it will give correct info even when the SQL function is run after a long time and the slot-sync worker is disabled. > > IOW, we > > detect whether the remote slot was acquired and released since the > > last synchronization, by checking the remote slot's inactive_since. > > This idea seems to handle these cases I mentioned unless I'm missing > > something, but it requires for the slotsync worker to update > > inactive_since in a different way than other parameters. > > > > Or a simple solution is that the slotsync worker updates > > inactive_since as it does for non-synced slots, and disables > > timeout-based slot invalidation for synced slots. I like this idea better; it also takes care of the case when the user is relying on the sync function rather than the worker and does not want the slots to get invalidated between two sync function calls. > Yeah, I think the main question to help us decide is: do we want to invalidate > "inactive" synced slots locally (in addition to synchronizing the invalidation > from the primary)?
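To make the comparison concrete, the column can be inspected on the standby with something like the following (a sketch; it assumes the 'inactive_since' column added by the 0001 patch under discussion):

  -- Sketch: observe inactive_since for synced slots on the standby.
  SELECT slot_name, synced, active, inactive_since
  FROM pg_replication_slots
  WHERE synced
  ORDER BY slot_name;

With the slot-sync worker enabled, the value would be refreshed on every sync cycle; with only the SQL function, it moves when the function is actually run, which is the gap described above.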
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Wed, Apr 3, 2024 at 11:17 AM Bharath Rupireddy wrote: > > On Wed, Apr 3, 2024 at 8:38 AM shveta malik wrote: > > > > > > Or a simple solution is that the slotsync worker updates > > > > inactive_since as it does for non-synced slots, and disables > > > > timeout-based slot invalidation for synced slots. > > > > I like this idea better; it also takes care of the case when the > > user is relying on the sync function rather than the worker and does > > not want the slots to get invalidated between two sync function calls. > > Please find the attached v31 patches implementing the above idea: > Thanks for the patches. Please find a few comments:

v31-001:
1) system-views.sgml: value will get updated after every synchronization from the corresponding remote slot on the primary. --This is confusing. It will be good to rephrase it.
2) update_synced_slots_inactive_since() --Maybe we should mention in the header that this function is called only during promotion.
3) 040_standby_failover_slots_sync.pl: We capture inactive_since_on_primary when we do this for the first time at #175 ("ALTER SUBSCRIPTION regress_mysub1 DISABLE"). But we again recreate the sub and disable it at line #280. Do you think we shall get inactive_since_on_primary again here, to be compared with inactive_since_on_new_primary later?

v31-002: (I had reviewed v29-002 but missed posting the comments; I think these are still applicable)
1) I think replication_slot_inactivity_timeout was recommended here (instead of replication_slot_inactive_timeout), so please give it a thought: https://www.postgresql.org/message-id/202403260739.udlp7lxixktx%40alvherre.pgsql
2) Commit msg:
a) "It is often easy for developers to set a timeout of say 1 or 2 or 3 days at slot level, after which the inactive slots get dropped." Shall we say invalidated rather than dropped?
b) "To achieve the above, postgres introduces a GUC allowing users set inactive timeout and then a slot stays inactive for this much amount of time it invalidates the slot." Broken sentence.

thanks Shveta
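For context, the intended usage per the quoted commit message would look something like this (a sketch; note that the GUC name itself is still under discussion in comment 1 above):

  -- Sketch: invalidate slots that stay inactive beyond the timeout.
  ALTER SYSTEM SET replication_slot_inactive_timeout = '2d';
  SELECT pg_reload_conf();

Slots that remain inactive beyond this period would then be invalidated (not dropped), as per comment 2a.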
Re: Synchronizing slots from primary to standby
On Wed, Apr 3, 2024 at 3:36 PM Amit Kapila wrote: > > On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila wrote: > > > > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy > > wrote: > > > > > I quickly looked at v8, and have a nit, rest all looks good. > > > > > > +if (DecodingContextReady(ctx) && found_consistent_snapshot) > > > +*found_consistent_snapshot = true; > > > > > > Can the found_consistent_snapshot be checked first to help avoid the > > > function call DecodingContextReady() for pg_replication_slot_advance > > > callers? > > > > > > > Okay, changed. Additionally, I have updated the comments and commit > > message. I'll push this patch after some more testing. > > > > Pushed!

There is an intermittent BF failure observed at [1] after this commit (2ec005b).

Analysis: We see in the BF logs that during the first call of the sync function, the restart_lsn of the synced slot is advanced to an LSN greater than the remote slot's restart_lsn. And when the sync call is done a second time without any further change on the primary, it hits the error:

ERROR: cannot synchronize local slot "lsub1_slot" LSN(0/360) to remote slot's LSN(0/328) as synchronization would move it backwards

Relevant BF logs are given at [2]. This reproduces intermittently, depending upon whether bgwriter logs a running-transactions record while the test is running. We were able to mimic the test case to reproduce the failure. Please see the attached bf-test.txt for the steps.

Issue: The issue is that we have a wrong sanity check based on 'restart_lsn' in synchronize_one_slot():

if (remote_slot->restart_lsn < slot->data.restart_lsn) elog(ERROR, ...);

Prior to commit 2ec005b, this check was okay, as we did not expect the restart_lsn of the synced slot to be ahead of the remote one since we were directly copying the LSNs. But now that we use 'advance' to do logical decoding on the standby, the restart_lsn of the synced slot can be ahead of the remote slot's if running-transactions records are found after reaching the consistent point while consuming WAL from restart_lsn till confirmed_lsn. In such a case, slot-sync's advance may end up serializing snapshots and setting restart_lsn to the serialized snapshot point, ahead of the remote one.

Fix: The sanity check needs to be corrected. Attached is a patch to address the issue. a) The sanity check is corrected to compare confirmed_lsn rather than restart_lsn. Additional changes: b) A log message has been added after LogicalSlotAdvanceAndCheckSnapState() for the case when the local and remote slots' confirmed_lsn are not found to be the same after sync (if that ever happens). c) update_local_synced_slot() now attempts a sync if any of confirmed_lsn, restart_lsn, or catalog_xmin of the remote slot is ahead of the local slot, instead of them just being unequal.

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2024-04-03%2017%3A57%3A28

[2]:
2024-04-03 18:00:41.896 UTC [3239617][client backend][0/2:0] LOG: statement: SELECT pg_sync_replication_slots();
LOG: starting logical decoding for slot "lsub1_slot"
DETAIL: Streaming transactions committing after 0/0, reading WAL from 0/328.
LOG: logical decoding found consistent point at 0/328
DEBUG: serializing snapshot to pg_logical/snapshots/0-360.snap
DEBUG: got new restart lsn 0/360 at 0/360
LOG: newly created slot "lsub1_slot" is sync-ready now
2024-04-03 18:00:45.218 UTC [3243487][client backend][2/2:0] LOG: statement: SELECT pg_sync_replication_slots();
ERROR: cannot synchronize local slot "lsub1_slot" LSN(0/360) to remote slot's LSN(0/328) as synchronization would move it backwards

thanks Shveta

bf-test.txt (steps to reproduce):
Keep sync_replication_slots as disabled on standby.
--primary:
SELECT 'init' FROM pg_create_logical_replication_slot('logicalslot', 'test_decoding', false, false, true);
SELECT pg_log_standby_snapshot();
SELECT pg_log_standby_snapshot();
SELECT pg_log_standby_snapshot();
SELECT pg_log_standby_snapshot();
select pg_sleep(2);
SELECT pg_replication_slot_advance('logicalslot', pg_current_wal_lsn());
select slot_name,database,xmin,catalog_xmin cxmin,restart_lsn,confirmed_flush_lsn flushLsn,wal_status,conflicting conf, failover,synced,temporary temp, active from pg_replication_slots order by slot_name;
--standby:
select pg_sync_replication_slots();
select pg_sync_replication_slots(); -- ERROR here

v1-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch Description: Binary data
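Once the fix is in, a quick way to see that the second sync call no longer errors out is to compare the slot's LSNs on both nodes (a sketch; 'lsub1_slot' is the slot name from the BF logs above):

  -- Sketch: run on both primary and standby after the second sync call.
  SELECT slot_name, restart_lsn, confirmed_flush_lsn
  FROM pg_replication_slots
  WHERE slot_name = 'lsub1_slot';

The synced slot's restart_lsn may legitimately be ahead of the remote one, which is exactly why the check now compares confirmed_lsn instead.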
promotion related handling in pg_sync_replication_slots()
Hello hackers, Currently, promotion-related handling is missing in the slot sync SQL function pg_sync_replication_slots(). Here is the background on how it is done in the slot sync worker: During promotion, the startup process, in order to shut down the slot sync worker, sets the 'stopSignaled' flag, sends the shutdown signal, and waits for the slot sync worker to exit. Meanwhile, if the postmaster has not noticed the promotion yet, it may end up restarting the slot sync worker. In such a case, the worker exits if 'stopSignaled' is set.

Since there is a chance that the user (or any of their scripts/tools) may execute the SQL function pg_sync_replication_slots() in parallel to promotion, such handling is needed in this SQL function as well. The attached patch attempts to implement the same. The changes are:
1) If pg_sync_replication_slots() is already running when the promotion is triggered, ShutDownSlotSync() checks the 'SlotSyncCtx->syncing' flag as well and waits for it to become false, i.e., waits till the parallel-running SQL function is finished.
2) If pg_sync_replication_slots() is invoked when promotion is already in progress, pg_sync_replication_slots() respects the 'stopSignaled' flag set by the startup process and becomes a no-op.

thanks Shveta v1-0001-Handle-stopSignaled-during-sync-function-call.patch Description: Binary data
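To make the race concrete, here is a sketch of the two sessions involved (illustrative only; both functions exist in core):

  -- Session 1 on the standby: a sync is already in progress.
  SELECT pg_sync_replication_slots();

  -- Session 2 on the same standby, concurrently: promotion is triggered.
  SELECT pg_promote();

With the patch, case 1 makes ShutDownSlotSync() wait for the in-flight sync to finish, and case 2 makes a sync started after 'stopSignaled' is set return as a no-op.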
Re: Synchronizing slots from primary to standby
On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote: > > > Prior to commit 2ec005b, this check was okay, as we did not expect the > restart_lsn of the synced slot to be ahead of the remote one since we were > directly copying the LSNs. But now that we use 'advance' to do logical > decoding on the standby, the restart_lsn of the synced slot can be ahead of > the remote slot's if running-transactions records are found after reaching > the consistent point while consuming WAL from restart_lsn till confirmed_lsn. > In such a case, slot-sync's advance may end up serializing snapshots and > setting restart_lsn to the serialized snapshot point, ahead of the remote one. > > Fix: > The sanity check needs to be corrected. Attached is a patch to address the issue. Please find v2, which has a detailed commit message and some more comments in the code. thanks Shveta v2-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch Description: Binary data
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Thu, Apr 4, 2024 at 5:53 PM Bharath Rupireddy wrote: > > On Thu, Apr 4, 2024 at 4:35 PM Amit Kapila wrote: > > > > > Thanks for the changes. v34-0001 LGTM. > > > > I was doing a final review before pushing 0001 and found that > > 'inactive_since' could be set twice during startup after promotion, > > once while restoring slots and then via ShutDownSlotSync(). The reason > > is that ShutDownSlotSync() will be invoked in normal startup on > > primary though it won't do anything apart from setting inactive_since > > if we have synced slots. I think you need to check 'StandbyMode' in > > update_synced_slots_inactive_since() and return if the same is not > > set. We can't use 'InRecovery' flag as that will be set even during > > crash recovery. > > > > Can you please test this once unless you don't agree with the above theory? > > Nice catch. I've verified that update_synced_slots_inactive_since is > called even for normal server startups/crash recovery. I've added a > check to exit if the StandbyMode isn't set. > > Please find the attached v35 patch.

Thanks for the patch. Tested it; works well. A few cosmetic changes are needed in the 040 test file:
1) # Capture the inactive_since of the slot from the primary. Note that the slot # will be inactive since the corresponding subscription is disabled.. There are two dots at the end; replace them with one.
2) # Synced slot on the standby must get its own inactive_since. The trailing '.' is not needed in a single-line comment (to be consistent with neighbouring comments).
3) update_synced_slots_inactive_since(): if (!StandbyMode) return; It would be good to add a comment here.

thanks Shveta
Re: Synchronizing slots from primary to standby
On Fri, Apr 5, 2024 at 9:22 AM Bertrand Drouvot wrote: > > Hi, > > On Thu, Apr 04, 2024 at 05:31:45PM +0530, shveta malik wrote: > > On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote: > > > > > > > > > Prior to commit 2ec005b, this check was okay, as we did not expect the > > > restart_lsn of the synced slot to be ahead of the remote one since we > > > were directly copying the LSNs. But now that we use 'advance' to do > > > logical decoding on the standby, the restart_lsn of the synced slot can > > > be ahead of the remote slot's if running-transactions records are found > > > after reaching the consistent point while consuming WAL from restart_lsn > > > till confirmed_lsn. In such a case, slot-sync's advance may end up > > > serializing snapshots and setting restart_lsn to the serialized snapshot > > > point, ahead of the remote one. > > > > > > Fix: > > > The sanity check needs to be corrected. Attached is a patch to address > > > the issue. > > Thanks for reporting, explaining the issue and providing a patch. > > Regarding the patch: > > 1 === > > +* Attempt to sync lsns and xmins only if remote slot is ahead of > local > > s/lsns/LSNs/? > > 2 === > > + if (slot->data.confirmed_flush != > remote_slot->confirmed_lsn) > + elog(LOG, > +"could not synchronize local slot > \"%s\" LSN(%X/%X)" > +" to remote slot's LSN(%X/%X) ", > +remote_slot->name, > + > LSN_FORMAT_ARGS(slot->data.confirmed_flush), > + > LSN_FORMAT_ARGS(remote_slot->confirmed_lsn)); > > I don't think that the message is correct here. Unless I am missing something > there is nothing in the following code path that would prevent the slot to be > sync during this cycle. This is a sanity check; I will put a comment to indicate the same. We want to ensure that if anything changes in the future, we get correct logs to indicate that. If required, the LOG msg can be changed. Kindly suggest if you have anything better in mind. thanks Shveta
Re: promotion related handling in pg_sync_replication_slots()
On Thu, Apr 4, 2024 at 5:17 PM Amit Kapila wrote: > > On Thu, Apr 4, 2024 at 5:05 PM shveta malik wrote: > > > > Hello hackers, > > > > Currently, promotion-related handling is missing in the slot sync SQL > > function pg_sync_replication_slots(). Here is the background on how > > it is done in the slot sync worker: > > During promotion, the startup process, in order to shut down the > > slot sync worker, sets the 'stopSignaled' flag, sends the shutdown > > signal, and waits for the slot sync worker to exit. Meanwhile, if the > > postmaster has not noticed the promotion yet, it may end up restarting > > the slot sync worker. In such a case, the worker exits if 'stopSignaled' > > is set. > > > > Since there is a chance that the user (or any of their scripts/tools) > > may execute the SQL function pg_sync_replication_slots() in parallel to > > promotion, such handling is needed in this SQL function as well. The > > attached patch attempts to implement the same. > > > > Thanks for the report and patch. I'll look into it. > Please find v2. The changes are: 1) Rebased the patch as there was a conflict due to the recent commit 6f132ed. 2) Added an Assert in update_synced_slots_inactive_since() to ensure that the slot does not have an active_pid. 3) Improved the commit msg and comments. thanks Shveta v2-0001-Handle-stopSignaled-during-sync-function-call.patch Description: Binary data
Re: Synchronizing slots from primary to standby
On Fri, Apr 5, 2024 at 10:09 AM Bertrand Drouvot wrote: > > What about something like? > > ereport(LOG, > errmsg("synchronized confirmed_flush_lsn for slot \"%s\" differs from > remote slot", > remote_slot->name), > errdetail("Remote slot has LSN %X/%X but local slot has LSN %X/%X.", > LSN_FORMAT_ARGS(remote_slot->restart_lsn), > LSN_FORMAT_ARGS(slot->data.restart_lsn)); > > Regards, +1. Better than earlier. I will update and post the patch. thanks Shveta
Re: Synchronizing slots from primary to standby
On Fri, Apr 5, 2024 at 4:31 PM Bertrand Drouvot wrote: > > BTW, I just realized that the LSN I used in my example in the > LSN_FORMAT_ARGS() > are not the right ones. Noted. Thanks. Please find v3 with the comments addressed. thanks Shveta v3-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch Description: Binary data
Re: Synchronizing slots from primary to standby
On Tue, Feb 13, 2024 at 6:45 AM Zhijie Hou (Fujitsu) wrote: > > On Monday, February 12, 2024 5:40 PM Amit Kapila > wrote: > > Thanks for the comments, I have addressed them. > > Here is the new version patch which addressed above and > most of Bertrand's comments.

Thanks for the patch. I am trying to run valgrind on patch001. I followed the steps given in [1]. It ended up generating 850 log files. Is there a way to figure out that we have a memory-related problem w/o going through each log file manually? I also tried running the steps with '--leak-check=summary' (in the first run, it was '--leak-check=no' as suggested in the wiki) with and without the patch and tried comparing those manually for a few of them. I found the output more or less the same. But this is a mammoth task if we have to do it manually for 850 files. So any pointers here?

For reference:

Sample log file with '--leak-check=no':
==00:00:08:44.321 250746== HEAP SUMMARY:
==00:00:08:44.321 250746== in use at exit: 1,298,274 bytes in 290 blocks
==00:00:08:44.321 250746== total heap usage: 11,958 allocs, 7,005 frees, 8,175,630 bytes allocated
==00:00:08:44.321 250746==
==00:00:08:44.321 250746== For a detailed leak analysis, rerun with: --leak-check=full
==00:00:08:44.321 250746==
==00:00:08:44.321 250746== For lists of detected and suppressed errors, rerun with: -s
==00:00:08:44.321 250746== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Sample log file with '--leak-check=summary':
==00:00:00:27.300 265785== HEAP SUMMARY:
==00:00:00:27.300 265785== in use at exit: 1,929,907 bytes in 310 blocks
==00:00:00:27.300 265785== total heap usage: 71,677 allocs, 7,754 frees, 95,750,897 bytes allocated
==00:00:00:27.300 265785==
==00:00:00:27.394 265785== LEAK SUMMARY:
==00:00:00:27.394 265785== definitely lost: 20,507 bytes in 171 blocks
==00:00:00:27.394 265785== indirectly lost: 16,419 bytes in 61 blocks
==00:00:00:27.394 265785== possibly lost: 354,670 bytes in 905 blocks
==00:00:00:27.394 265785== still reachable: 592,586 bytes in 1,473 blocks
==00:00:00:27.394 265785== suppressed: 0 bytes in 0 blocks
==00:00:00:27.394 265785== Rerun with --leak-check=full to see details of leaked memory
==00:00:00:27.394 265785==
==00:00:00:27.394 265785== For lists of detected and suppressed errors, rerun with: -s
==00:00:00:27.394 265785== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

[1]: https://wiki.postgresql.org/wiki/Valgrind

thanks Shveta
Re: Synchronizing slots from primary to standby
On Fri, Feb 9, 2024 at 10:04 AM Amit Kapila wrote: > > +reserve_wal_for_local_slot(XLogRecPtr restart_lsn) > { > ... > + /* > + * Find the oldest existing WAL segment file. > + * > + * Normally, we can determine it by using the last removed segment > + * number. However, if no WAL segment files have been removed by a > + * checkpoint since startup, we need to search for the oldest segment > + * file currently existing in XLOGDIR. > + */ > + oldest_segno = XLogGetLastRemovedSegno() + 1; > + > + if (oldest_segno == 1) > + { > + TimeLineID cur_timeline; > + > + GetWalRcvFlushRecPtr(NULL, &cur_timeline); > + oldest_segno = XLogGetOldestSegno(cur_timeline); > ... > ... > > This means that if the restart_lsn of the slot is from the prior > timeline then the standby needs to wait for longer times to sync the > slot. Ideally, it should be okay because I don't think even if > restart_lsn of the slot may be from some prior timeline than the > current flush timeline on standby, how often that case can happen?

I tested this behaviour on the v85 patch; it is working as expected, i.e., if the remote slot's LSN belongs to a prior timeline, then on executing the pg_sync_replication_slots() function, it creates a temporary slot and waits for the primary to catch up. And once the primary catches up, the next execution of the SQL function persists the slot and syncs it.

Setup: primary-->standby1-->standby2

Steps:
1) Insert data on primary. It gets replicated to both standbys.
2) Create a logical slot on primary and execute pg_sync_replication_slots() on standby1. The slot gets synced and persisted on standby1.
3) Shutdown standby2.
4) Insert data on primary. It gets replicated to standby1.
5) Shutdown primary and promote standby1.
6) Insert some data on standby1/new primary directly.
7) Start standby2: It now needs to catch up the old data of timeline 1 (from step 4) + the new data of timeline 2 (from step 6). It does that. On reaching the end of the old timeline, walreceiver gets restarted and starts streaming using the new timeline.
8) Execute pg_sync_replication_slots() on standby2 to sync the slot. Now the remote slot's LSN belongs to a prior timeline on standby2. In my test run, the remote slot's LSN belonged to segno=4 on standby2, while the oldest segno of the current timeline (2) was 6. Thus it created the slot locally with an LSN belonging to the oldest segno 6 of the current timeline (2) but did not persist it, as the remote slot's LSN was behind.
9) Now on standby1/new-primary, advance the logical slot by calling pg_replication_slot_advance().
10) Execute pg_sync_replication_slots() again on standby2; now the local temporary slot gets persisted, as the restart_lsn of the primary has caught up.

thanks Shveta
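A simple way to observe the temporary-to-persistent transition between steps 8 and 10 is to check the slot's flags on standby2 (a sketch; the slot name 'logicalslot' is illustrative):

  -- Sketch: after step 8 this should show temporary = true; after step 10,
  -- temporary = false, i.e., the slot has been persisted.
  SELECT slot_name, synced, temporary, restart_lsn
  FROM pg_replication_slots
  WHERE slot_name = 'logicalslot';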
Re: About a recently-added message
On Thu, Feb 15, 2024 at 8:26 AM Amit Kapila wrote: > > On Wed, Feb 14, 2024 at 7:51 PM Euler Taveira wrote: > > > > On Wed, Feb 14, 2024, at 8:45 AM, Amit Kapila wrote: > > > > Now, I am less clear about whether to quote "logical" or not in the > > above message. Do you have any suggestions? > > > > > > The possible confusion comes from the fact that the sentence contains "must > > be" > > in the middle of a comparison expression. For pg_createsubscriber, we are > > using > > > > publisher requires wal_level >= logical > > > > I suggest to use something like > > > > slot synchronization requires wal_level >= logical > > > > This sounds like a better idea but shouldn't we directly use this in > 'errmsg' instead of a separate 'errhint'? Also, if change this, then > we should make a similar change for other messages in the same > function. +1 on changing the msg(s) the suggested way. Please find the patch for the same. It also removes the double quotes around the variable names. thanks Shveta v1-0001-Fix-quotation-of-variable-names.patch Description: Binary data
Re: Synchronizing slots from primary to standby
On Sun, Feb 18, 2024 at 7:40 PM Zhijie Hou (Fujitsu) wrote: > > On Friday, February 16, 2024 6:41 PM shveta malik > wrote: > Thanks for the patch. Here are few comments: Thanks for the comments. > > 2. > > +static bool > +validate_remote_info(WalReceiverConn *wrconn, int elevel) > ... > + > + return (!remote_in_recovery && primary_slot_valid); > > The primary_slot_valid could be uninitialized in this function. In "return (!remote_in_recovery && primary_slot_valid);", if remote_in_recovery is true, it will not even read primary_slot_valid. It will read primary_slot_valid only if remote_in_recovery is false, and in that case primary_slot_valid will always be initialized in the else block above. Let me know if you still feel we should initialize it to some default. thanks Shveta
Re: A new message seems missing a punctuation
On Mon, Feb 19, 2024 at 10:31 AM Robert Haas wrote: > > On Mon, Feb 19, 2024 at 10:10 AM Kyotaro Horiguchi > wrote: > > A recent commit (7a424ece48) added the following message: > > > > > could not sync slot information as remote slot precedes local slot: > > > remote slot "%s": LSN (%X/%X), catalog xmin (%u) local slot: LSN > > > (%X/%X), catalog xmin (%u) > > > > Since it is a bit overloaded but doesn't have a separator between > > "catalog xmin (%u)" and "local slot:", it is somewhat cluttered. Don't > > we need something, for example a semicolon there to improve > > readability and reduce clutter? > > I think maybe we should even revise the message even more. In most > places we do not just print out a whole bunch of values, but use a > sentence to connect them. I have tried to reword the msg; please have a look. thanks Shveta v1-0001-Reword-LOG-msg-for-slot-sync.patch Description: Binary data
Re: About a recently-added message
On Mon, Feb 19, 2024 at 11:10 AM Amit Kapila wrote: > > On Thu, Feb 15, 2024 at 11:49 AM Kyotaro Horiguchi > wrote: > > > > At Thu, 15 Feb 2024 09:22:23 +0530, shveta malik > > wrote in > > > > > > +1 on changing the msg(s) the suggested way. Please find the patch for the > > > same. It also removes the double quotes around the variable names. > > > > Thanks for the discussion. > > > > With a translator hat on, I would be happy if I could determine > > whether a word requires translation with minimal background > > information. In this case, a translator needs to know which values > > wal_level can take. It's relatively easy in this case, but I'm not > > sure if this is always the case. Therefore, I would be slightly > > happier if "logical" were double-quoted. > > > > I see that we use "logical" in double quotes in various error > messages. For example: "wal_level must be set to \"replica\" or > \"logical\" at server start". So following that we can use the double > quotes here as well.

Okay. Now that we will have double quotes around "logical", do you prefer the existing way of giving the error msg or the changed one?

Existing:
errmsg("bad configuration for slot synchronization"), errhint("wal_level must be >= logical."));
errmsg("bad configuration for slot synchronization"), errhint("%s must be defined.", "primary_conninfo"));

The changed one:
errmsg("slot synchronization requires wal_level >= logical"));
errmsg("slot synchronization requires %s to be defined", "primary_conninfo"));

thanks Shveta
Re: Synchronizing slots from primary to standby
On Mon, Feb 19, 2024 at 5:32 PM Amit Kapila wrote: > > Few comments on 0001 Thanks for the feedback. > > 1. I think it is better to error out when the valid GUC or option is > not set in ensure_valid_slotsync_params() and > ensure_valid_remote_info() instead of waiting. And we shouldn't start > the worker in the first place if not all required GUCs are set. This > will additionally simplify the code a bit. Sure, removed 'ensure' functions. Moved the validation checks to the postmaster before starting the slot sync worker. > 2. > +typedef struct SlotSyncWorkerCtxStruct > { > - /* prevents concurrent slot syncs to avoid slot overwrites */ > + pid_t pid; > + bool stopSignaled; > bool syncing; > + time_t last_start_time; > slock_t mutex; > -} SlotSyncCtxStruct; > +} SlotSyncWorkerCtxStruct; > > I think we don't need to change the name of this struct as this can be > used both by the worker and the backend. We can probably add the > comment to indicate that all the fields except 'syncing' are used by > slotsync worker. Modified. > 3. Similar to above, function names like SlotSyncShmemInit() shouldn't > be changed to SlotSyncWorkerShmemInit(). Modified. > 4. > +ReplSlotSyncWorkerMain(int argc, char *argv[]) > { > ... > + on_shmem_exit(slotsync_worker_onexit, (Datum) 0); > ... > + before_shmem_exit(slotsync_failure_callback, PointerGetDatum(wrconn)); > ... > } > > Do we need two separate callbacks? Can't we have just one (say > slotsync_failure_callback) that cleans additional things in case of > slotsync worker? And, if we need both those callbacks then please add > some comments for both and why one is before_shmem_exit() and the > other is on_shmem_exit(). I think we can merge these now. Earlier 'on_shmem_exit' was needed to avoid race-condition between startup and slot sync worker process to drop 'i' slots on promotion. Now we do not have any such scenario. But I need some time to analyze it well. Will do it in the next version. > In addition to the above, I have made a few changes in the comments > and code (cosmetic code changes). Please include those in the next > version if you find them okay. You need to rename .txt file to remove > .txt and apply atop v90-0001*. Sure, included these. Please find the patch001 attached. I will rebase the rest of the patches and post them tomorrow. thanks Shveta v91-0001-Add-a-new-slotsync-worker.patch Description: Binary data
Re: Synchronizing slots from primary to standby
On Tue, Feb 20, 2024 at 8:25 AM Masahiko Sawada wrote: > > > I've reviewed the v91 patch. Here are random comments: Thanks for the comments. > --- > /* > * Checks the remote server info. > * > - * We ensure that the 'primary_slot_name' exists on the remote server and the > - * remote server is not a standby node. > + * Check whether we are a cascading standby. For non-cascading standbys, it > + * also ensures that the 'primary_slot_name' exists on the remote server. > */ > > IIUC what the validate_remote_info() does doesn't not change by this > patch, so the previous comment seems to be clearer to me. > > --- > if (remote_in_recovery) > + { > + /* > +* If we are a cascading standby, no need to check further for > +* 'primary_slot_name'. > +*/ > ereport(ERROR, > errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("cannot synchronize replication slots from a > standby server")); > + } > + else > + { > + boolprimary_slot_valid; > > - primary_slot_valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull)); > - Assert(!isnull); > + primary_slot_valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull)); > + Assert(!isnull); > > - if (!primary_slot_valid) > - ereport(ERROR, > - errcode(ERRCODE_INVALID_PARAMETER_VALUE), > - errmsg("bad configuration for slot synchronization"), > - /* translator: second %s is a GUC variable name */ > - errdetail("The replication slot \"%s\" specified by > \"%s\" does not exist on the primary server.", > - PrimarySlotName, "primary_slot_name")); > + if (!primary_slot_valid) > + ereport(ERROR, > + errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("bad configuration for slot synchronization"), > + /* translator: second %s is a GUC variable name */ > + errdetail("The replication slot \"%s\" specified > by \"%s\" does not exist on the primary server.", > + PrimarySlotName, "primary_slot_name")); > + } > > I think it's a refactoring rather than changes required by the > slotsync worker. We can do that in a separate patch but why do we need > this change in the first place? In v90, this refactoring was made due to the fact that validate_remote_info() was supposed to behave differently for SQL function and slot-sync worker. SQL-function was supposed to ERROR out while the worker was supposed to LOG and become no-op. And thus the change was needed. In v91, we made this functionality same i.e. both sql function and worker will error out but missed to remove this refactoring. Thanks for catching this, I will revert it in the next version. To match the refactoring, I made the comment change too, will revert that as well. > --- > +ValidateSlotSyncParams(ERROR); > + > /* Load the libpq-specific functions */ > load_file("libpqwalreceiver", false); > > -ValidateSlotSyncParams(); > +(void) CheckDbnameInConninfo(); > > Is there any reason why we move where to check the parameters? Earlier DBname verification was done inside ValidateSlotSyncParams() and thus it was needed to load 'libpqwalreceiver' before we call this function. Now we have moved dbname verification in a separate call and thus the above order got changed. ValidateSlotSyncParams() is a common function used by SQL function and worker. Earlier slot sync worker was checking all the GUCs after starting up and was exiting each time any GUC was invalid. It was suggested that it would be better to check the GUCs before starting the slot sync worker in the postmaster itself, making the ValidateSlotSyncParams() move to postmaster (see SlotSyncWorkerAllowed). 
But it was not a good idea to load libpq in the postmaster, and thus we moved the libpq-related verification out of ValidateSlotSyncParams(). This resulted in the above change. I hope this answers your query. thanks Shveta
Re: About a recently-added message
On Tue, Feb 20, 2024 at 2:13 PM Amit Kapila wrote: > > I would prefer the changed ones as those clearly explain the problem > without additional information. Okay, attached is the v2 patch with the changed error msgs and double quotes around "logical". thanks Shveta v2-0001-Fix-quotation-of-variable-names.patch Description: Binary data
Re: A new message seems missing a punctuation
On Tue, Feb 20, 2024 at 5:01 PM Amit Kapila wrote: > > > We do expose the required information (restart_lsn, catalog_xmin, > synced, temporary, etc) via pg_replication_slots. So, I feel the LOG > message here is sufficient to DEBUG (or know the details) when the > slot sync doesn't succeed. > Please find the patch with the suggested changes. thanks Shveta v2-0001-Reword-LOG-msg-for-slot-sync.patch Description: Binary data
'Shutdown <= SmartShutdown' check while launching processes in postmaster.
Hi hackers, I would like to know why we have the 'Shutdown <= SmartShutdown' check before launching a few processes (WalReceiver, WalSummarizer, AutoVacuum worker) while the rest of the processes (BGWriter, WalWriter, Checkpointer, Archiver, etc.) do not have any such check. If I have to launch a new process, what should be the criteria to decide whether I need this check? Looking forward to your expert advice. thanks Shveta
Re: Synchronizing slots from primary to standby
On Tue, Feb 20, 2024 at 6:19 PM Masahiko Sawada wrote: > > On Tue, Feb 20, 2024 at 12:33 PM shveta malik wrote: > Thank you for the explanation. It makes sense to me to move the check. > > > 2. If the wal_level is not logical, the server will need to restart > anyway to change the wal_level and have the slotsync worker work. Does > it make sense to have the postmaster exit if the wal_level is not > logical and sync_replication_slots is enabled? For instance, we have > similar checks in PostmsaterMain(): > > if (summarize_wal && wal_level == WAL_LEVEL_MINIMAL) > ereport(ERROR, > (errmsg("WAL cannot be summarized when wal_level is > \"minimal\""))); Thanks for the feedback. I have addressed it in v93. thanks Shveta
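A minimal illustration of the combination that would now make the server fail at startup, per the suggestion above (a sketch; the exact message is whatever v93 emits):

  ALTER SYSTEM SET sync_replication_slots = on;
  ALTER SYSTEM SET wal_level = 'replica';
  -- On the next restart, startup would fail along the lines of:
  -- FATAL:  replication slot synchronization requires wal_level >= "logical"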
Re: 'Shutdown <= SmartShutdown' check while launching processes in postmaster.
On Wed, Feb 21, 2024 at 10:01 AM Tom Lane wrote: > > shveta malik writes: > > I would like to know that why we have 'Shutdown <= SmartShutdown' > > check before launching few processes (WalReceiver, WalSummarizer, > > AutoVacuum worker) while rest of the processes (BGWriter, WalWriter, > > Checkpointer, Archiver etc) do not have any such check. If I have to > > launch a new process, what shall be the criteria to decide if I need > > this check? > > Children that are stopped by the "if (pmState == PM_STOP_BACKENDS)" > stanza in PostmasterStateMachine should not be allowed to start > again later if we are trying to shut down. (But "smart" shutdown > doesn't enforce that, since it's a very weak state that only > prohibits new client sessions.) The processes that are allowed > to continue beyond that point are ones that are needed to perform > the shutdown checkpoint, or useful to make it finish faster. Thank you for providing the details. It clarifies the situation. Do you think it would be beneficial to include this as a code comment in postmaster.c to simplify understanding for future readers? thanks Shveta
Re: Synchronizing slots from primary to standby
On Wed, Feb 21, 2024 at 5:19 PM Amit Kapila wrote: > > A few minor comments: Thanks for the feedback. > = > 1. > +/* > + * Is stopSignaled set in SlotSyncCtx? > + */ > +bool > +IsStopSignaledSet(void) > +{ > + bool signaled; > + > + SpinLockAcquire(&SlotSyncCtx->mutex); > + signaled = SlotSyncCtx->stopSignaled; > + SpinLockRelease(&SlotSyncCtx->mutex); > + > + return signaled; > +} > + > +/* > + * Reset stopSignaled in SlotSyncCtx. > + */ > +void > +ResetStopSignaled(void) > +{ > + SpinLockAcquire(&SlotSyncCtx->mutex); > + SlotSyncCtx->stopSignaled = false; > + SpinLockRelease(&SlotSyncCtx->mutex); > +} > > I think these newly introduced functions don't need spinlock to be > acquired as these are just one-byte read-and-write. Additionally, when > IsStopSignaledSet() is invoked, there shouldn't be any concurrent > process to update that value. What do you think? Yes, we can avoid taking spinlock here. These functions are invoked after checking that pmState is PM_RUN. And in that state we do not expect any other process writing this flag. > 2. > +REPL_SLOTSYNC_MAIN "Waiting in main loop of slot sync worker." > +REPL_SLOTSYNC_SHUTDOWN "Waiting for slot sync worker to shut down." > > Let's use REPLICATION instead of REPL. I see other wait events using > REPLICATION in their names. Modified. > 3. > - * In standalone mode and in autovacuum worker processes, we use a fixed > - * ID, otherwise we figure it out from the authenticated user name. > + * In standalone mode, autovacuum worker processes and slot sync worker > + * process, we use a fixed ID, otherwise we figure it out from the > + * authenticated user name. > */ > - if (bootstrap || IsAutoVacuumWorkerProcess()) > + if (bootstrap || IsAutoVacuumWorkerProcess() || IsLogicalSlotSyncWorker()) > { > InitializeSessionUserIdStandalone(); > am_superuser = true; > > IIRC, we discussed this previously and it is safe to make the local > connection as superuser as we don't consult any user tables, so we can > probably add a comment where we invoke InitPostgres in slotsync.c Added comment. Thanks Hou-San for the analysis here and providing comment. > 4. > $publisher->safe_psql('postgres', > - "CREATE PUBLICATION regress_mypub FOR ALL TABLES;"); > + "CREATE PUBLICATION regress_mypub FOR ALL TABLES;" > +); > > Why this change is required in the patch? Not needed, removed it. > 5. > +# Confirm that restart_lsn and of confirmed_flush_lsn lsub1_slot slot > are synced > +# to the standby > > /and of/; looks like a typo Modified. > 6. > +# Confirm that restart_lsn and of confirmed_flush_lsn lsub1_slot slot > are synced > +# to the standby > +ok( $standby1->poll_query_until( > + 'postgres', > + "SELECT '$primary_restart_lsn' = restart_lsn AND > '$primary_flush_lsn' = confirmed_flush_lsn from pg_replication_slots > WHERE slot_name = 'lsub1_slot';"), > + 'restart_lsn and confirmed_flush_lsn of slot lsub1_slot synced to standby'); > + > ... > ... > +# Confirm the synced slot 'lsub1_slot' is retained on the new primary > +is($standby1->safe_psql('postgres', > + q{SELECT slot_name FROM pg_replication_slots WHERE slot_name = > 'lsub1_slot';}), > + 'lsub1_slot', > + 'synced slot retained on the new primary'); > > In both these checks, we should additionally check the 'synced' and > 'temporary' flags to ensure that they are marked appropriately. Modified. Please find patch001 attached. There is a CFBot failure in patch002. The test added there needs some adjustment. We will rebase and post rest of the patches once we fix that issue. 
thanks Shveta v94-0001-Add-a-new-slotsync-worker.patch Description: Binary data
Re: Synchronizing slots from primary to standby
On Thu, Feb 22, 2024 at 10:31 AM shveta malik wrote: > > Please find patch001 attached. There is a CFBot failure in patch002. > The test added there needs some adjustment. We will rebase and post > rest of the patches once we fix that issue. > There was a recent commit 801792e to improve error messaging in slotsync.c, which resulted in a conflict. Thus, I have rebased the patch. There is no new change in the attached patch. thanks Shveta v94_2-0001-Add-a-new-slotsync-worker.patch Description: Binary data
Re: Synchronizing slots from primary to standby
On Thu, Feb 22, 2024 at 3:44 PM Bertrand Drouvot wrote: > > Hi, > > Thanks! > > Some random comments about v92_001 (Sorry if it has already been discussed > up-thread): Thanks for the feedback. The patch was pushed 15 minutes ago. I will prepare a top-up patch for your comments. > 1 === > > +* We do not update the 'synced' column from true to false here > > Worth to mention from which system view the 'synced' column belongs to? Sure, I will change it. > 2 === (Nit) > > +#define MIN_WORKER_NAPTIME_MS 200 > +#define MAX_WORKER_NAPTIME_MS 3 /* 30s */ > > [MIN|MAX]_SLOTSYNC_WORKER_NAPTIME_MS instead? It is used only in slotsync.c so > more a Nit. Okay, will change it. > 3 === > > res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow); > - > if (res->status != WALRCV_OK_TUPLES) > > Line removal intended? I feel the current style is better, where we do not have a blank line between the function call and the return-value check. > 4 === > > + if (wal_level < WAL_LEVEL_LOGICAL) > + { > + ereport(ERROR, > + errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("slot synchronization requires > wal_level >= \"logical\"")); > + return false; > + } > > I think the return is not needed here as it won't be reached due to the > "ERROR". > Or should we use "elevel" instead of "ERROR"? It was suggested to raise ERROR for wal_level validation; please see [1]. But yes, I will remove the return value. Thanks for catching this. > 5 === > > +* operate as a superuser. This is safe because the slot sync worker > does > +* not interact with user tables, eliminating the risk of executing > +* arbitrary code within triggers. > > Right. I did not check but if we are using operators in our remote SPI calls > then it would be worth to ensure they are coming from the pg_catalog schema? > Using something like "OPERATOR(pg_catalog.=)" using "=" as an example. Can you please elaborate on this one? I am not sure I understood it. [1]: https://www.postgresql.org/message-id/CAD21AoB2ipSzQb5-o5pEYKie4oTPJTsYR1ip9_wRVrF6HbBWDQ%40mail.gmail.com thanks Shveta
Re: Synchronizing slots from primary to standby
On Thu, Feb 22, 2024 at 4:35 PM Bertrand Drouvot wrote: > > Suppose that in synchronize_slots() the query would be: > > const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn," > " restart_lsn, catalog_xmin, two_phase, failover," > " database, conflict_reason" > " FROM pg_catalog.pg_replication_slots" > " WHERE failover and NOT temporary and 1 = 1"; > > Then my comment is to rewrite it to: > > const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn," > " restart_lsn, catalog_xmin, two_phase, failover," > " database, conflict_reason" > " FROM pg_catalog.pg_replication_slots" > " WHERE failover and NOT temporary and 1 OPERATOR(pg_catalog.=) 1"; > > to ensure the operator "=" is coming from the pg_catalog schema. > Thanks for the details, but slot sync does not use SPI calls; it uses libpqrcv calls. So is this change needed? thanks Shveta
Re: Synchronizing slots from primary to standby
On Fri, Feb 23, 2024 at 8:35 AM shveta malik wrote: > > On Thu, Feb 22, 2024 at 4:35 PM Bertrand Drouvot > wrote: > > > > Suppose that in synchronize_slots() the query would be: > > > > const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn," > > " restart_lsn, catalog_xmin, two_phase, failover," > > " database, conflict_reason" > > " FROM pg_catalog.pg_replication_slots" > > " WHERE failover and NOT temporary and 1 = 1"; > > > > Then my comment is to rewrite it to: > > > > const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn," > > " restart_lsn, catalog_xmin, two_phase, failover," > > " database, conflict_reason" > > " FROM pg_catalog.pg_replication_slots" > > " WHERE failover and NOT temporary and 1 OPERATOR(pg_catalog.=) 1"; > > > > to ensure the operator "=" is coming from the pg_catalog schema. > > > > Thanks for the details, but slot-sync does not use SPI calls, it uses > libpqrcv calls. So is this change needed? Additionally, I would like to have a better understanding of why it's necessary and whether it addresses any potential security risks. thanks Shveta
Re: Synchronizing slots from primary to standby
On Fri, Feb 23, 2024 at 10:06 AM Zhijie Hou (Fujitsu) wrote: > > > I noticed one CFbot failure[1] which is because the tap-test doesn't wait for > the > standby to catch up before promoting, thus the data inserted after promotion > could not be replicated to the subscriber. Add a wait_for_replay_catchup to > fix it. > > Apart from this, I also adjusted some variable names in the tap-test to be > consistent. And added back a mis-removed ProcessConfigFile call. > > [1] https://cirrus-ci.com/task/6126787437002752?logs=check_world#L312 > Thanks for the patches. Had a quick look at v95_2; here are some trivial comments:

slot.h:
1) extern List *GetStandbySlotList(bool copy); extern void WaitForStandbyConfirmation(XLogRecPtr wait_for_lsn); extern void FilterStandbySlots(XLogRecPtr wait_for_lsn, List **standby_slots); The order is different from the one in slot.c.

slot.c:
2) warningfmt = _("replication slot \"%s\" specified in parameter \"%s\" does not exist, ignoring"); GUC names should not have double quotes. The same applies to each warningfmt in this function.
3) errmsg("replication slot \"%s\" specified in parameter \"%s\" does not have active_pid", Same here; the double quotes around standby_slot_names should be removed.

walsender.c:
4) * Used by logical decoding SQL functions that acquired slot with failover * enabled. To be consistent with other such comments in previous patches: slot with failover enabled --> failover enabled slot
5) Wake up the logical walsender processes with failover-enabled slots: failover-enabled slots --> failover enabled slots

postgresql.conf.sample:
6) streaming replication standby server slot names that logical walsender processes will wait for Is it better to say it like this? (I leave this to your preference): streaming replication standby server slot names for which logical walsender processes will wait.

thanks Shveta
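For context, the GUC under review here is set on the primary, e.g. (a sketch; the slot name is illustrative):

  -- Sketch: make logical walsenders wait for this physical standby's slot.
  ALTER SYSTEM SET standby_slot_names = 'physical_standby1';
  SELECT pg_reload_conf();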
Re: Synchronizing slots from primary to standby
On Fri, Feb 23, 2024 at 1:28 PM Bertrand Drouvot wrote: > > Hi, > > Because one could create say the "=" OPERATOR in their own schema, attach a > function to it doing undesired stuff and change the search_path for the > database > the sync slot worker connects to. > > Then this new "=" operator would be used (instead of the pg_catalog.= one), > triggering the "undesired" function as superuser. Thanks for the details. I understand it now. We do not use '=' in our main slots-fetch query, but we do use '=' in the remote-validation query; see validate_remote_info(). Do you think that, instead of doing the above, we can override search_path with an empty string in the slot-sync case, similar to the logical apply worker and autovacuum worker cases (see InitializeLogRepWorker() and AutoVacWorkerMain())? thanks Shveta
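For reference, here is a sketch of the kind of hijack being discussed (the names are illustrative and mirror the test added later in this thread; the function body is a harmless stand-in for arbitrary code):

  CREATE SCHEMA myschema;
  CREATE FUNCTION myschema.myintne(bigint, int) RETURNS boolean
    LANGUAGE sql AS 'SELECT true';  -- stand-in; could run anything as superuser
  CREATE OPERATOR myschema.= (
    LEFTARG = bigint, RIGHTARG = int, FUNCTION = myschema.myintne);
  ALTER DATABASE postgres SET search_path TO myschema, pg_catalog;

With such a search_path, an unqualified bigint = int comparison in the slot-sync queries would resolve to myschema.= rather than pg_catalog.=, which is why either schema-qualifying the operators or forcing a secure search_path is needed.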
Re: Synchronizing slots from primary to standby
On Fri, Feb 23, 2024 at 7:41 PM Bertrand Drouvot wrote: > > Hi, > > I think to set secure search path for remote connection, the standard > > approach > > could be to extend the code in libpqrcv_connect[1], so that we don't need > > to schema > > qualify all the operators in the queries. > > > > And for local connection, I agree it's also needed to add a > > SetConfigOption("search_path", "" call in the slotsync worker. > > > > [1] > > libpqrcv_connect > > ... > > if (logical) > > ... > > res = libpqrcv_PQexec(conn->streamConn, > > ALWAYS_SECURE_SEARCH_PATH_SQL); > > > > Agree, something like in the attached? (it's .txt to not disturb the CF bot).

Thanks for the patch; the changes look good. I have incorporated it in the patch which addresses the rest of your comments in [1]. I have attached the patch as .txt

[1]: https://www.postgresql.org/message-id/ZdcejBDCr%2BwlVGnO%40ip-10-97-1-34.eu-west-3.compute.internal

thanks Shveta

From f1489f783748e85749a576cf21921e37137efd2a Mon Sep 17 00:00:00 2001
From: Shveta Malik
Date: Thu, 22 Feb 2024 17:11:25 +0530
Subject: [PATCH v1] Top up Patch for commit 93db6cbda0
---
 src/backend/access/transam/xlogrecovery.c     | 15 ---
 .../libpqwalreceiver/libpqwalreceiver.c       |  2 +-
 src/backend/replication/logical/slotsync.c    | 19 +++
 3 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index d73a49b3e8..9d907bf0e4 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1472,13 +1472,14 @@ FinishWalRecovery(void)
 	 * Shutdown the slot sync worker to drop any temporary slots acquired by
 	 * it and to prevent it from keep trying to fetch the failover slots.
 	 *
-	 * We do not update the 'synced' column from true to false here, as any
-	 * failed update could leave 'synced' column false for some slots. This
-	 * could cause issues during slot sync after restarting the server as a
-	 * standby. While updating the 'synced' column after switching to the new
-	 * timeline is an option, it does not simplify the handling for the
-	 * 'synced' column. Therefore, we retain the 'synced' column as true after
-	 * promotion as it may provide useful information about the slot origin.
+	 * We do not update the 'synced' column in 'pg_replication_slots' system
+	 * view from true to false here, as any failed update could leave 'synced'
+	 * column false for some slots. This could cause issues during slot sync
+	 * after restarting the server as a standby. While updating the 'synced'
+	 * column after switching to the new timeline is an option, it does not
+	 * simplify the handling for the 'synced' column. Therefore, we retain the
+	 * 'synced' column as true after promotion as it may provide useful
+	 * information about the slot origin.
 	 */
 	ShutDownSlotSync();

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 04271ee703..a30528a5f6 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -271,7 +271,7 @@ libpqrcv_connect(const char *conninfo, bool replication, bool logical,
 			 errhint("Target server's authentication method must be changed, or set password_required=false in the subscription parameters.")));
 	}

-	if (logical)
+	if (logical || !replication)
 	{
 		PGresult   *res;

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 36773cfe73..5bb9e5d8b3 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -105,10 +105,10 @@ bool		sync_replication_slots = false;
  * (within a MIN/MAX range) according to slot activity. See
  * wait_for_slot_activity() for details.
  */
-#define MIN_WORKER_NAPTIME_MS 200
-#define MAX_WORKER_NAPTIME_MS 30000	/* 30s */
+#define MIN_SLOTSYNC_WORKER_NAPTIME_MS 200
+#define MAX_SLOTSYNC_WORKER_NAPTIME_MS 30000	/* 30s */

-static long sleep_ms = MIN_WORKER_NAPTIME_MS;
+static long sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;

 /* The restart interval for slot sync work used by postmaster */
 #define SLOTSYNC_RESTART_INTERVAL_SEC 10

@@ -924,12 +924,9 @@ ValidateSlotSyncParams(int elevel)
 * in this case regardl
[the rest of the patch is truncated in the archive]
Regarding RecentFlushPtr in WalSndWaitForWal()
Hi hackers, I would like to understand why we have code [1] that retrieves RecentFlushPtr in WalSndWaitForWal() outside of the loop. We utilize RecentFlushPtr later within the loop, but prior to that, we already have [2]. Wouldn't [2] alone be sufficient? Just to check the impact, I ran 'make check-world' after removing [1] and did not see any issue exposed by the tests, at least. Any thoughts?

[1]:
/* Get a more recent flush pointer. */
if (!RecoveryInProgress())
    RecentFlushPtr = GetFlushRecPtr(NULL);
else
    RecentFlushPtr = GetXLogReplayRecPtr(NULL);

[2]:
/* Update our idea of the currently flushed position. */
else if (!RecoveryInProgress())
    RecentFlushPtr = GetFlushRecPtr(NULL);
else
    RecentFlushPtr = GetXLogReplayRecPtr(NULL);

thanks Shveta
Re: Synchronizing slots from primary to standby
On Mon, Feb 26, 2024 at 5:18 PM Amit Kapila wrote: > > > > > + if (!ok) > > > > + GUC_check_errdetail("List syntax is invalid."); > > > > + > > > > + /* > > > > +* If there is a syntax error in the name or if the replication > > > > slots' > > > > +* data is not initialized yet (i.e., we are in the startup > > > > process), skip > > > > +* the slot verification. > > > > +*/ > > > > + if (!ok || !ReplicationSlotCtl) > > > > + { > > > > + pfree(rawname); > > > > + list_free(elemlist); > > > > + return ok; > > > > + } > > > > > > > > we are testing the "ok" value twice, what about using if...else if... > > > > instead and > > > > test it once? If so, it might be worth to put the: > > > > > > > > " > > > > + pfree(rawname); > > > > + list_free(elemlist); > > > > + return ok; > > > > " > > > > > > > > in a "goto". > > > > > > There were comments to remove the 'goto' statement and avoid > > > duplicate free code, so I prefer the current style. > > > > The duplicate free code would come from the if...else if... rewrite but then > > the "goto" would remove it, so I'm not sure to understand your point. > > > > I think Hou-San wants to say that there was previously a comment to > remove goto and now you are saying to introduce it. But, I think we > can avoid both code duplication and goto, if the first thing we check > in the function is ReplicationSlotCtl and return false if the same is > not set. Won't that be better?

I think we cannot do that, as we need to at least check the syntax before we return due to a NULL ReplicationSlotCtl. We get a NULL ReplicationSlotCtl during instance startup in check_standby_slot_names(), as the postmaster first loads the GUC table and then initializes the shared memory for replication slots. See the calls of InitializeGUCOptions() and CreateSharedMemoryAndSemaphores() in PostmasterMain(). FWIW, I do not have any issue with the current code either, but if we have to change it, is [1] any better?

[1]:
check_standby_slot_names()
{
    if (!ok)
    {
        GUC_check_errdetail("List syntax is invalid.");
    }
    else if (ReplicationSlotCtl)
    {
        foreach-loop for slot validation
    }
    pfree(rawname);
    list_free(elemlist);
    return ok;
}

thanks Shveta
Re: Synchronizing slots from primary to standby
On Wed, Feb 28, 2024 at 8:49 AM Amit Kapila wrote:
>
> Few comments:

Thanks for the feedback.

> ===
> 1.
> - if (logical)
> + if (logical || !replication)
> {
>
> Can we add a comment about connection types that require
> ALWAYS_SECURE_SEARCH_PATH_SQL?
>
> 2.
> Can we add a test case to demonstrate that the '=' operator can be
> hijacked to do different things when the slotsync worker didn't use
> ALWAYS_SECURE_SEARCH_PATH_SQL?
>

Here is the patch with the new test added and improved comments.

thanks
Shveta

v2-0001-Fixups-for-commit-93db6cbda0.patch
Description: Binary data
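For reference, ALWAYS_SECURE_SEARCH_PATH_SQL (from fe_utils/connect.h) boils
down to pinning an empty search_path on the connection before any other query
runs. Below is a standalone libpq sketch of that hardening pattern; it is
illustrative only, not the patch itself, and the connection string is a
placeholder:

	#include <stdio.h>
	#include <stdlib.h>
	#include "libpq-fe.h"

	int
	main(void)
	{
		PGconn	   *conn = PQconnectdb("dbname=postgres");
		PGresult   *res;

		if (PQstatus(conn) != CONNECTION_OK)
		{
			fprintf(stderr, "%s", PQerrorMessage(conn));
			exit(1);
		}

		/*
		 * Pin an empty search_path so unqualified names (including
		 * operators resolved through it) cannot be hijacked by objects
		 * placed in other schemas.
		 */
		res = PQexec(conn,
					 "SELECT pg_catalog.set_config('search_path', '', false)");
		if (PQresultStatus(res) != PGRES_TUPLES_OK)
			fprintf(stderr, "%s", PQerrorMessage(conn));
		PQclear(res);

		/* ... application queries from here on ... */
		PQfinish(conn);
		return 0;
	}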
Re: Synchronizing slots from primary to standby
On Wed, Feb 28, 2024 at 1:33 PM Bertrand Drouvot wrote:
>
> Hi,
> A few comments:

Thanks for reviewing.

>
> 1 ===
>
> +* used to run normal SQL queries
>
> s/run normal SQL/run SQL/ ?
>
> As mentioned up-thread I don't like that much the idea of creating such a test
> but if we do then here are my comments:
>
> 2 ===
>
> +CREATE FUNCTION myschema.myintne(bigint, int)
>
> Should we explain why 'bigint, int' is important here (instead of
> 'int, int')?
>
> 3 ===
>
> +# stage of syncing newly created slots. If the worker was not prepared
> +# to handle such attacks, it would have failed during
>
> Worth to mention the underlying check / function that would get an "unexpected"
> result?
>
> Except for the above (nit) comments the patch looks good to me.

Here is the patch which addresses the above comments. I have also optimized
the test a little: we now use the pg_sync_replication_slots() function
instead of the worker to test the operator redirection via search_path. This
has been done to simplify the test case and reduce the added test time.

thanks
Shveta

v3-0001-Fixups-for-commit-93db6cbda0.patch
Description: Binary data
Re: Synchronizing slots from primary to standby
On Thu, Feb 29, 2024 at 7:04 AM Zhijie Hou (Fujitsu) wrote:
>
> Here is the v101 patch set which addressed above comments.
>

Thanks for the patch. Few comments:

1) Shall we mention in the doc that shutdown will wait for the standbys in
standby_slot_names to confirm receiving WAL? Suggestion for
logicaldecoding.sgml:

When standby_slot_names is utilized, the primary server will not completely
shut down until the corresponding standbys, associated with the physical
replication slots specified in standby_slot_names, have confirmed receiving
the WAL up to the latest flushed position on the primary server.

2) slot.c:

	/*
	 * If a logical slot name is provided in standby_slot_names, report
	 * a message and skip it. Although logical slots are disallowed in
	 * the GUC check_hook(validate_standby_slots), it is still
	 * possible for a user to drop an existing physical slot and
	 * recreate a logical slot with the same name.
	 */

This is not completely true; we can still specify a logical slot during
instance start and it will be accepted. Suggestion:

	/*
	 * If a logical slot name is provided in standby_slot_names, report
	 * a message and skip it. It is possible for a user to specify a
	 * logical slot name in standby_slot_names just before the server
	 * startup. The GUC check_hook(validate_standby_slots) cannot
	 * validate such a slot during startup, as the ReplicationSlotCtl
	 * shared memory is not initialized by that time. It is also
	 * possible for a user to drop an existing physical slot and
	 * recreate a logical slot with the same name.
	 */

3) "Wait for physical standby to confirm receiving the given lsn"

standby --> standbys

4) In StandbyConfirmedFlush(), is it better to have the below errdetail in
all problematic cases?

	Logical replication is waiting on the standby associated with \"%s\"

We have it only for the inactive-pid case, but we are waiting in all cases.

thanks
Shveta
Re: Synchronizing slots from primary to standby
On Wed, Feb 28, 2024 at 4:51 PM Amit Kapila wrote:
>
> On Wed, Feb 28, 2024 at 3:26 PM shveta malik wrote:
> >
> >
> > Here is the patch which addresses the above comments. Also optimized
> > the test a little bit. Now we use pg_sync_replication_slots() function
> > instead of worker to test the operator-redirection using search_path.
> > This has been done to simplify the test case and reduce the added
> > time.
> >
>
> I have slightly adjusted the comments in the attached, otherwise, LGTM.

This patch was pushed (commit: b3f6b14) and it resulted in a BF failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-02-29%2012%3A49%3A27

The concerned log on standby1:

2024-02-29 14:23:16.738 UTC [3908:4] 040_standby_failover_slots_sync.pl LOG:  statement: SELECT pg_sync_replication_slots();
The system cannot find the file specified.
2024-02-29 14:23:16.971 UTC [3908:5] 040_standby_failover_slots_sync.pl ERROR:  could not connect to the primary server: connection to server at "127.0.0.1", port 65352 failed: FATAL:  SSPI authentication failed for user "repl_role"
2024-02-29 14:23:16.971 UTC [3908:6] 040_standby_failover_slots_sync.pl STATEMENT:  SELECT pg_sync_replication_slots();

It seems authentication is failing for the newly added role. We also see
method=sspi used in the publisher log. We are analysing it further and will
share the findings.

thanks
Shveta
Add comment to specify timeout unit in ConditionVariableTimedSleep()
Hi hackers,

ConditionVariableTimedSleep() accepts a timeout parameter, but it doesn't
explicitly state the unit of the timeout anywhere. To determine it, one needs
to look into the details of the function and find it in the comments of the
internally called function WaitLatch(). It would be beneficial to include a
comment in the header of ConditionVariableTimedSleep() specifying that the
timeout is in milliseconds, similar to what we have for other non-static
functions like WaitLatch() and WaitEventSetWait().

Attached is a patch for the same.

thanks
Shveta

v1-0001-Specify-timeout-unit-in-ConditionVariableTimedSle.patch
Description: Binary data
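Not the patch itself, but roughly the kind of header note being proposed. The
signature below is the function's actual one; the comment wording is
illustrative:

	/*
	 * ConditionVariableTimedSleep
	 *
	 * As ConditionVariableSleep(), but the wait is bounded by 'timeout',
	 * which is given in milliseconds.  Returns true if the timeout expired
	 * before the condition variable was signaled, false otherwise.
	 */
	bool
	ConditionVariableTimedSleep(ConditionVariable *cv, long timeout,
								uint32 wait_event_info)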
Re: Synchronizing slots from primary to standby
On Tue, Mar 5, 2024 at 9:15 AM Amit Kapila wrote:
>
> On Tue, Mar 5, 2024 at 6:10 AM Peter Smith wrote:
> >
> > ==
> > src/backend/replication/walsender.c
> >
> > 5. NeedToWaitForWal
> >
> > + /*
> > +  * Check if the standby slots have caught up to the flushed position. It
> > +  * is good to wait up to the flushed position and then let the WalSender
> > +  * send the changes to logical subscribers one by one which are already
> > +  * covered by the flushed position without needing to wait on every change
> > +  * for standby confirmation.
> > +  */
> > + if (NeedToWaitForStandbys(flushed_lsn, wait_event))
> > +     return true;
> > +
> > + *wait_event = 0;
> > + return false;
> > +}
> > +
> >
> > 5a.
> > The comment (or part of it?) seems misplaced because it is talking
> > WalSender sending changes, but that is not happening in this function.
> >
>
> I don't think so. This is invoked only by walsender and a static
> function. I don't see any other better place to mention this.
>
> > Also, isn't what this is saying already described by the other comment
> > in the caller? e.g.:
> >
>
> Oh no, here we are explaining the wait order.

I think there is scope for improvement here. The comment inside
NeedToWaitForWal(), which states that we need to wait here for standbys on
the flush position (and not on each change), should be outside of this
function; it is too deeply embedded. And the comment which states the order
of waiting (first flush, then standby confirmation) should be outside the
for-loop in WalSndWaitForWal(). But yes, we do need both comments.

Attached is a patch (.txt) with the comment improvements; please merge if
appropriate.

thanks
Shveta

From e03332839dd804f3bc38937e677ba87ac10f981b Mon Sep 17 00:00:00 2001
From: Shveta Malik
Date: Tue, 5 Mar 2024 11:29:52 +0530
Subject: [PATCH v2] Comments improvement.

---
 src/backend/replication/walsender.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 96f44ba85d..6a082b42eb 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1799,13 +1799,6 @@ NeedToWaitForWal(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn,
 		return true;
 	}

-	/*
-	 * Check if the standby slots have caught up to the flushed position. It
-	 * is good to wait up to the flushed position and then let the WalSender
-	 * send the changes to logical subscribers one by one which are already
-	 * covered by the flushed position without needing to wait on every change
-	 * for standby confirmation.
-	 */
 	if (NeedToWaitForStandbys(flushed_lsn, wait_event))
 		return true;

@@ -1818,7 +1811,10 @@ NeedToWaitForWal(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn,
  *
  * If the walsender holds a logical slot that has enabled failover, we also
  * wait for all the specified streaming replication standby servers to
- * confirm receipt of WAL up to RecentFlushPtr.
+ * confirm receipt of WAL up to RecentFlushPtr. It's beneficial to wait
+ * here for the confirmation from standbys up to RecentFlushPtr rather
+ * than waiting in ReorderBufferCommit() before transmitting each
+ * change to logical subscribers, which is already covered in RecentFlushPtr.
  *
  * Returns end LSN of flushed WAL.  Normally this will be >= loc, but if we
  * detect a shutdown request (either from postmaster or client) we will return
@@ -1847,6 +1843,12 @@ WalSndWaitForWal(XLogRecPtr loc)
 	else
 		RecentFlushPtr = GetXLogReplayRecPtr(NULL);

+	/*
+	 * In the loop, we wait for the necessary WALs to be flushed to disk
+	 * initially, and then we wait for standbys to catch up. Upon receiving
+	 * the shutdown signal, we immediately transition to waiting for standbys
+	 * to catch up.
+	 */
 	for (;;)
 	{
 		bool		wait_for_standby_at_stop = false;
@@ -1877,12 +1879,10 @@ WalSndWaitForWal(XLogRecPtr loc)
 			XLogBackgroundFlush();

 		/*
-		 * Within the loop, we wait for the necessary WALs to be flushed to
-		 * disk first, followed by waiting for standbys to catch up if there
-		 * are enough WALs or upon receiving the shutdown signal. To avoid the
-		 * scenario where standbys need to catch up to a newer WAL location in
-		 * each iteration, we update our idea of the currently flushed
-		 * position only if we are not waiting for standbys to catch up.
+		 * Update our idea of the currently flushed position, but do it only
+		 * if we haven'
Re: Missing LWLock protection in pgstat_reset_replslot()
On Tue, Mar 5, 2024 at 1:25 PM Heikki Linnakangas wrote:
> SearchNamedReplicationSlot() will also acquire the lock in LW_SHARED
> mode, when you pass need_lock=true. So that at least should be changed
> to false.
>

Also, don't we need to release the lock when we return here?

	/*
	 * Nothing to do for physical slots as we collect stats only for logical
	 * slots.
	 */
	if (SlotIsPhysical(slot))
		return;

thanks
Shveta
Re: Missing LWLock protection in pgstat_reset_replslot()
On Tue, Mar 5, 2024 at 6:52 PM Bertrand Drouvot wrote:
>
> > /*
> >  * Nothing to do for physical slots as we collect stats only for logical
> >  * slots.
> >  */
> > if (SlotIsPhysical(slot))
> >     return;
>
> D'oh! Thanks! Fixed in v2 shared up-thread.

Thanks. Can we try to get rid of the multiple LWLockRelease calls in
pgstat_reset_replslot()? Is this any better?

 	/*
-	 * Nothing to do for physical slots as we collect stats only for logical
-	 * slots.
+	 * Reset stats if it is a logical slot. Nothing to do for physical slots
+	 * as we collect stats only for logical slots.
 	 */
-	if (SlotIsPhysical(slot))
-	{
-		LWLockRelease(ReplicationSlotControlLock);
-		return;
-	}
-
-	/* reset this one entry */
-	pgstat_reset(PGSTAT_KIND_REPLSLOT, InvalidOid,
-				 ReplicationSlotIndex(slot));
+	if (SlotIsLogical(slot))
+		pgstat_reset(PGSTAT_KIND_REPLSLOT, InvalidOid,
+					 ReplicationSlotIndex(slot));

 	LWLockRelease(ReplicationSlotControlLock);

Something similar in pgstat_fetch_replslot() perhaps?

thanks
Shveta
Re: Synchronizing slots from primary to standby
On Wed, Mar 6, 2024 at 6:54 PM Zhijie Hou (Fujitsu) wrote:
>
> On Wednesday, March 6, 2024 9:13 PM Zhijie Hou (Fujitsu) wrote:
> >
> > On Wednesday, March 6, 2024 11:04 AM Zhijie Hou (Fujitsu) wrote:
> > >
> > > On Wednesday, March 6, 2024 9:30 AM Masahiko Sawada wrote:
> > >
> > > Hi,
> > >
> > > > On Fri, Mar 1, 2024 at 4:21 PM Zhijie Hou (Fujitsu) wrote:
> > > > >
> > > > > On Friday, March 1, 2024 2:11 PM Masahiko Sawada wrote:
> > > > > >
> > > > > > ---
> > > > > > +void
> > > > > > +assign_standby_slot_names(const char *newval, void *extra)
> > > > > > +{
> > > > > > +    List       *standby_slots;
> > > > > > +    MemoryContext oldcxt;
> > > > > > +    char       *standby_slot_names_cpy = extra;
> > > > > > +
> > > > > >
> > > > > > Given that the newval and extra have the same data
> > > > > > (standby_slot_names value), why do we not use newval instead? I
> > > > > > think that if we use newval, we don't need to guc_strdup() in
> > > > > > check_standby_slot_names(), we might need to do list_copy_deep()
> > > > > > instead, though. It's not clear to me as there is no comment.
> > > > >
> > > > > I think SplitIdentifierString will modify the passed in string, so
> > > > > we'd better not pass the newval to it, otherwise the stored guc
> > > > > string(standby_slot_names) will be changed. I can see we are doing
> > > > > similar thing in other GUC check/assign function as well.
> > > > > (check_wal_consistency_checking/ assign_wal_consistency_checking,
> > > > > check_createrole_self_grant/ assign_createrole_self_grant ...).
> > > >
> > > > Why does it have to be a List in the first place?
> > >
> > > I thought the List type is convenient to use here, as we have existing
> > > list build function(SplitIdentifierString), and have convenient list
> > > macro to loop the list(foreach_ptr) which can save some codes.
> > >
> > > > In earlier version patches, we
> > > > used to copy the list and delete the element until it became empty,
> > > > while waiting for physical wal senders. But we now just refer to
> > > > each slot name in the list. The current code assumes that
> > > > standby_slot_names_cpy is allocated in GUCMemoryContext but once it
> > > > changes, it will silently get broken. I think we can check and
> > > > assign standby_slot_names in a similar way to
> > > > check/assign_temp_tablespaces and
> > > > check/assign_synchronous_standby_names.
> > >
> > > Yes, we could follow it by allocating an array and copying each slot
> > > name into it, but it also requires some code to build and scan the
> > > array. So, is it possible to expose the GucMemorycontext or have an
> > > API like guc_copy_list instead?
> > > If we don't want to touch the guc api, I am ok with using an array as
> > > well.
> >
> > I rethink about this and realize that it's not good to do the memory
> > allocation in assign hook function. As the "src/backend/utils/misc/README"
> > said, we'd better do that in check hook function and pass it via extra to
> > assign hook function. And thus array is a good choice in this case rather
> > than a List which cannot be passed to *extra.
> >
> > Here is the V107 patch set which parse and cache the standby slot names
> > in an array instead of a List.
>
> The patch needs to be rebased due to recent commit.
>
> Attach the V107_2 patch set. There are no code changes in this version.

The patch needed to be rebased due to a recent commit. Attached is v107_3;
there are no code changes in this version.

thanks
Shveta

v107_3-0002-Document-the-steps-to-check-if-the-standby-is.patch
Description: Binary data

v107_3-0001-Allow-logical-walsenders-to-wait-for-the-phys.patch
Description: Binary data
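For readers following the check-hook vs. assign-hook point above, here is a
rough sketch of the pattern the GUC README prescribes: parse and allocate in
the check hook, then hand the result to the assign hook through *extra. The
StandbySlotNamesData type and the variable names below are hypothetical, not
the committed code, and slot-existence validation is elided:

	/* Hypothetical flat representation that can be passed via *extra. */
	typedef struct StandbySlotNamesData
	{
		int		nslotnames;
		char	slot_names[FLEXIBLE_ARRAY_MEMBER];	/* NUL-separated names */
	} StandbySlotNamesData;

	static StandbySlotNamesData *standby_slot_names_config = NULL;

	static bool
	check_standby_slot_names(char **newval, void **extra, GucSource source)
	{
		char	   *rawname = pstrdup(*newval);
		List	   *elemlist = NIL;
		bool		ok = SplitIdentifierString(rawname, ',', &elemlist);

		if (!ok)
			GUC_check_errdetail("List syntax is invalid.");
		else
		{
			Size		size = offsetof(StandbySlotNamesData, slot_names);
			StandbySlotNamesData *config;
			char	   *ptr;
			ListCell   *lc;

			foreach(lc, elemlist)
				size += strlen((char *) lfirst(lc)) + 1;

			/* guc_malloc(), not palloc(): *extra must outlive this call */
			config = (StandbySlotNamesData *) guc_malloc(ERROR, size);
			config->nslotnames = list_length(elemlist);
			ptr = config->slot_names;
			foreach(lc, elemlist)
			{
				strcpy(ptr, (char *) lfirst(lc));
				ptr += strlen(ptr) + 1;
			}
			*extra = config;
		}

		pfree(rawname);
		list_free(elemlist);
		return ok;
	}

	static void
	assign_standby_slot_names(const char *newval, void *extra)
	{
		/* no allocation here; just adopt what the check hook prepared */
		standby_slot_names_config = (StandbySlotNamesData *) extra;
	}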
Re: Missing LWLock protection in pgstat_reset_replslot()
On Wed, Mar 6, 2024 at 2:36 PM Bertrand Drouvot wrote:
>
> Hi,
>
> On Wed, Mar 06, 2024 at 10:24:46AM +0530, shveta malik wrote:
> > On Tue, Mar 5, 2024 at 6:52 PM Bertrand Drouvot wrote:
> > Thanks. Can we try to get rid of the multiple LWLockRelease calls in
> > pgstat_reset_replslot(). Is this any better?
> >
> >  	/*
> > -	 * Nothing to do for physical slots as we collect stats only for logical
> > -	 * slots.
> > +	 * Reset stats if it is a logical slot. Nothing to do for physical slots
> > +	 * as we collect stats only for logical slots.
> >  	 */
> > -	if (SlotIsPhysical(slot))
> > -	{
> > -		LWLockRelease(ReplicationSlotControlLock);
> > -		return;
> > -	}
> > -
> > -	/* reset this one entry */
> > -	pgstat_reset(PGSTAT_KIND_REPLSLOT, InvalidOid,
> > -				 ReplicationSlotIndex(slot));
> > +	if (SlotIsLogical(slot))
> > +		pgstat_reset(PGSTAT_KIND_REPLSLOT, InvalidOid,
> > +					 ReplicationSlotIndex(slot));
> >
> >  	LWLockRelease(ReplicationSlotControlLock);
>
> Yeah, it's easier to read and probably reduce the pgstat_replslot.o object
> file size a bit for non optimized build.
>
> > Something similar in pgstat_fetch_replslot() perhaps?
>
> Yeah, all of the above done in v3 attached.
>

Thanks for the patch. For the fix in pgstat_fetch_replslot(), even with the
lock in the fetch call, there are chances that the concerned slot can be
dropped and recreated:

- It can happen in a small window in pg_stat_get_replication_slot() when we
  are consuming the return values of pgstat_fetch_replslot() (using slotent).

- It can also happen at a later stage, when we have switched to fetching the
  next slot (obtained from 'pg_replication_slots' through the view
  'pg_stat_replication_slots'); the previous one can be dropped.

Ultimately, the results of the system view 'pg_stat_replication_slots' can
still give us non-existing or re-created slots. But yes, I do not deny that
it gives us better consistency protection.

Do you feel that the lock in pgstat_fetch_replslot() should be moved to its
caller, to be released once we are done copying the results of that slot?
This would improve the protection slightly.

thanks
Shveta
Re: Missing LWLock protection in pgstat_reset_replslot()
On Thu, Mar 7, 2024 at 11:12 AM Michael Paquier wrote:
>
> Yeah, it is possible that what you retrieve from
> pgstat_fetch_replslot() does not refer exactly to the slot's content
> under concurrent activity, but you cannot protect a single scan of
> pg_stat_replication_slots as of an effect of its design:
> pg_stat_get_replication_slot() has to be called multiple times. The
> patch at least makes sure that the copy of the slot's stats retrieved
> by pgstat_fetch_entry() is slightly more consistent

Right, I understand that.

> but you cannot do
> better than that except if the data retrieved from
> pg_replication_slots and its stats are fetched in the same context
> function call, holding the replslot LWLock for the whole scan
> duration.

Yes, agreed.

> > Do you feel that the lock in pgstat_fetch_replslot() should be moved
> > to its caller when we are done copying the results of that slot? This
> > will improve the protection slightly.
>
> I don't see what extra protection this would offer, as
> pg_stat_get_replication_slot() is called once for each slot. Feel
> free to correct me if I'm missing something.

It slightly improves the chances. pgstat_fetch_replslot() is only called from
pg_stat_get_replication_slot(), so I thought it might be better to acquire
the lock before we call pgstat_fetch_replslot() and release it once we are
done copying the results for that particular slot. But I also understand that
it will not prevent someone from dropping that slot at a later stage, when
the rest of the calls of pg_stat_get_replication_slot() are still pending. So
I am okay with the current one.

thanks
Shveta
Re: Synchronizing slots from primary to standby
On Fri, Mar 8, 2024 at 9:56 AM Ajin Cherian wrote:
>
> > Pushed with minor modifications. I'll keep an eye on BF.
> >
> > BTW, one thing that we should try to evaluate a bit more is the
> > traversal of slots in StandbySlotsHaveCaughtup() where we verify if
> > all the slots mentioned in standby_slot_names have received the
> > required WAL. Even if the standby_slot_names list is short the total
> > number of slots can be much larger which can lead to an increase in
> > CPU usage during traversal. There is an optimization that allows to
> > cache ss_oldest_flush_lsn and ensures that we don't need to traverse
> > the slots each time so it may not hit frequently but still there is a
> > chance. I see it is possible to further optimize this area by caching
> > the position of each slot mentioned in standby_slot_names in
> > replication_slots array but not sure whether it is worth.
> >
>
> I tried to test this by configuring a large number of logical slots while
> making sure the standby slots are at the end of the array and checking if
> there was any performance hit in logical replication from these searches.

Thanks, Ajin and Nisha. We also plan:

1) Redoing the XLogSendLogical time-log related test with
   'sync_replication_slots' enabled.
2) A pg_recvlogical test to monitor lag in StandbySlotsHaveCaughtup() for a
   large number of slots.
3) Profiling to see if StandbySlotsHaveCaughtup() is noticeable in the report
   when there are a large number of slots to traverse.

thanks
Shveta
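To make the position-caching idea concrete, here is a hypothetical sketch;
the names are illustrative, per-slot locking is elided, and cache
invalidation on slot drop/recreate (the hard part) is deliberately ignored.
This is not a proposal patch:

	/*
	 * Cache the index of each slot listed in standby_slot_names within
	 * ReplicationSlotCtl->replication_slots, so the caught-up check touches
	 * only those entries instead of scanning all max_replication_slots
	 * slots.
	 */
	static int *standby_slot_positions = NULL;	/* one index per listed slot */
	static int	standby_slot_count = 0;

	static bool
	standby_slots_have_caught_up(XLogRecPtr wait_for_lsn)
	{
		for (int i = 0; i < standby_slot_count; i++)
		{
			ReplicationSlot *slot =
				&ReplicationSlotCtl->replication_slots[standby_slot_positions[i]];

			/* a physical slot has caught up once its restart_lsn passes the target */
			if (slot->data.restart_lsn < wait_for_lsn)
				return false;
		}
		return true;
	}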