Hi, Thanks for looking into this.
On Mon, Aug 26, 2024 at 4:35 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > Few comments on 0001: > 1. > @@ -651,6 +651,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid > > + /* > + * Skip the sync if the local slot is already invalidated. We do this > + * beforehand to avoid slot acquire and release. > + */ > > I was wondering why you have added this new check as part of this > patch. If you see the following comments in the related code, you will > know why we haven't done this previously. Removed. Can deal with optimization separately. > 2. > + */ > + InvalidateObsoleteReplicationSlots(RS_INVAL_INACTIVE_TIMEOUT, > + 0, InvalidOid, InvalidTransactionId); > > Why do we want to call this for shutdown case (when is_shutdown is > true)? I understand trying to invalidate slots during regular > checkpoint but not sure if we need it at the time of shutdown. Changed it to invalidate only for non-shutdown checkpoints. inactive_timeout invalidation isn't critical for shutdown unlike wal_removed which can help shutdown by freeing up some disk space. > The > other point is can we try to check the performance impact with 100s of > slots as mentioned in the code comments? I first checked how much does the wal_removed invalidation check add to the checkpoint (see 2nd and 3rd column). I then checked how much inactive_timeout invalidation check adds to the checkpoint (see 4th column), it is not more than wal_remove invalidation check. I then checked how much the wal_removed invalidation check adds for replication slots that have already been invalidated due to inactive_timeout (see 5th column), looks like not much. | # of slots | HEAD (no invalidation) ms | HEAD (wal_removed) ms | PATCHED (inactive_timeout) ms | PATCHED (inactive_timeout+wal_removed) ms | |------------|----------------------------|-----------------------|-------------------------------|------------------------------------------| | 100 | 18.591 | 370.586 | 359.299 | 373.882 | | 1000 | 15.722 | 4834.901 | 5081.751 | 5072.128 | | 10000 | 19.261 | 59801.062 | 61270.406 | 60270.099 | Having said that, I'm okay to implement the optimization specified. Thoughts? + /* + * NB: We will make another pass over replication slots for + * invalidation checks to keep the code simple. Testing shows that + * there is no noticeable overhead (when compared with wal_removed + * invalidation) even if we were to do inactive_timeout invalidation + * of thousands of replication slots here. If it is ever proven that + * this assumption is wrong, we will have to perform the invalidation + * checks in the above for loop with the following changes: + * + * - Acquire ControlLock lock once before the loop. + * + * - Call InvalidatePossiblyObsoleteSlot for each slot. + * + * - Handle the cases in which ControlLock gets released just like + * InvalidateObsoleteReplicationSlots does. + * + * - Avoid saving slot info to disk two times for each invalidated + * slot. Please see the attached v43 patches addressing the above review comments. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
v43-0002-Add-XID-age-based-replication-slot-invalidation.patch
Description: Binary data
v43-0001-Add-inactive_timeout-based-replication-slot-inva.patch
Description: Binary data