On Mon, Jun 24, 2024 at 4:01 PM Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote:
> Hi, > > On Mon, Jun 17, 2024 at 5:55 PM Bharath Rupireddy > <bharath.rupireddyforpostg...@gmail.com> wrote: > > > > Here are my thoughts on when to do the XID age invalidation. In all > > the patches sent so far, the XID age invalidation happens in two > > places - one during the slot acquisition, and another during the > > checkpoint. As the suggestion is to do it during the vacuum (manual > > and auto), so that even if the checkpoint isn't happening in the > > database for whatever reasons, a vacuum command or autovacuum can > > invalidate the slots whose XID is aged. > > > > An idea is to check for XID age based invalidation for all the slots > > in ComputeXidHorizons() before it reads replication_slot_xmin and > > replication_slot_catalog_xmin, and obviously before the proc array > > lock is acquired. A potential problem with this approach is that the > > invalidation check can become too aggressive as XID horizons are > > computed from many places. > > > > Another idea is to check for XID age based invalidation for all the > > slots in higher levels than ComputeXidHorizons(), for example in > > vacuum() which is an entry point for both vacuum command and > > autovacuum. This approach seems similar to vacuum_failsafe_age GUC > > which checks each relation for the failsafe age before vacuum gets > > triggered on it. > > I am attaching the patches implementing the idea of invalidating > replication slots during vacuum when current slot xmin limits > (procArray->replication_slot_xmin and > procArray->replication_slot_catalog_xmin) are aged as per the new XID > age GUC. When either of these limits are aged, there must be at least > one replication slot that is aged, because the xmin limits, after all, > are the minimum of xmin or catalog_xmin of all replication slots. In > this approach, the new XID age GUC will help vacuum when needed, > because the current slot xmin limits are recalculated after > invalidating replication slots that are holding xmins for longer than > the age. The code is placed in vacuum() which is common for both > vacuum command and autovacuum, and gets executed only once every > vacuum cycle to not be too aggressive in invalidating. > > However, there might be some concerns with this approach like the > following: > 1) Adding more code to vacuum might not be acceptable > 2) What if invalidation of replication slots emits an error, will it > block vacuum forever? Currently, InvalidateObsoleteReplicationSlots() > is also called as part of the checkpoint, and emitting ERRORs from > within is avoided already. Therefore, there is no concern here for > now. > 3) What if there are more replication slots to be invalidated, will it > delay the vacuum? If yes, by how much? <<TODO>> > 4) Will the invalidation based on just current replication slot xmin > limits suffice irrespective of vacuum cutoffs? IOW, if the replication > slots are invalidated but vacuum isn't going to do any work because > vacuum cutoffs are not yet met? Is the invalidation work wasteful > here? > 5) Is it okay to take just one more time the proc array lock to get > current replication slot xmin limits via > ProcArrayGetReplicationSlotXmin() once every vacuum cycle? <<TODO>> > 6) Vacuum command can't be run on the standby in recovery. So, to help > invalidate replication slots on the standby, I have for now let the > checkpointer also do the XID age based invalidation. I know > invalidating both in checkpointer and vacuum may not be a great idea, > but I'm open to thoughts. > > Following are some of the alternative approaches which IMHO don't help > vacuum when needed: > a) Let the checkpointer do the XID age based invalidation, and call it > out in the documentation that if the checkpoint doesn't happen, the > new GUC doesn't help even if the vacuum is run. This has been the > approach until v40 patch. > b) Checkpointer and/or other backends add an autovacuum work item via > AutoVacuumRequestWork(), and autovacuum when it gets to it will > invalidate the replication slots. But, what to do for the vacuum > command here? > > Please find the attached v41 patches implementing the idea of vacuum > doing the invalidation. > > Thoughts? > > Thanks to Sawada-san for a detailed off-list discussion. > The patch no longer applies on HEAD, please rebase. regards, Ajin Cherian Fujitsu Australia