On Wed, Feb 5, 2025 at 2:42 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Wed, Feb 5, 2025 at 10:30 AM vignesh C <vignes...@gmail.com> wrote:
> >
> > On Tue, 4 Feb 2025 at 19:56, Nisha Moond <nisha.moond...@gmail.com> wrote:
> > >
> > > Here is v69 patch set addressing above and Kuroda-san's comments in [1].
> >
> > Few minor suggestions:
> > 1) In the slot invalidation reporting below:
> > +                       case RS_INVAL_IDLE_TIMEOUT:
> > +                               Assert(inactive_since > 0);
> > +
> > +                               /* translator: second %s is a GUC variable name */
> > +                               appendStringInfo(&err_detail, _("The slot's idle time %s exceeds the configured \"%s\" duration."),
> > +                                                timestamptz_to_str(inactive_since),
> > +                                                "idle_replication_slot_timeout");
> > +                               /* translator: %s is a GUC variable name */
> > +                               appendStringInfo(&err_hint, _("You might need to increase \"%s\"."),
> > +                                                "idle_replication_slot_timeout");
> >
> > It is logged like:
> > 2025-02-05 10:04:11.616 IST [330567] DETAIL: The slot's idle time
> > 2025-02-05 10:02:49.131631+05:30 exceeds the configured
> > "idle_replication_slot_timeout" duration.
> >
> > Here even though we tell idle time, we are logging the inactive_since
> > value which kind of gives a wrong meaning.
> >
> > How about we change it to:
> > The slot has been inactive since 2025-02-05 10:02:49.131631+05:30,
> > which exceeds the configured "idle_replication_slot_timeout" duration.
> >
>
> Would it address your concern if we write the actual idle duration
> (now - inactive_since) instead of directly using inactive_since in the
> above message?
>

Simply using the raw timestamp difference (now - inactive_since) would
look odd. We should convert it into a user-friendly format. Since
idle_replication_slot_timeout is in minutes, we can express the
difference in minutes and seconds in the log. For example:

DETAIL: The slot's idle time of 1 minute and 7 seconds exceeds the
configured "idle_replication_slot_timeout" duration.

This has been implemented in v70. Thoughts?

> A few other comments:
> 1.
> + * 4. The slot is not being synced from the primary while the server
> + *    is in recovery
> + *
> + * Note that the idle timeout invalidation mechanism is not
> + * applicable for slots on the standby server that are being synced
> + * from the primary server (i.e., standby slots having 'synced' field 'true').
> + * Synced slots are always considered to be inactive because they don't
> + * perform logical decoding to produce changes.
>
> The 4th point in the above comment and the rest of the comment is
> mostly saying the same thing.
>

Done. I've merged the additional info into the 4th point.

> 2.
> + * Flush all replication slots to disk. Also, invalidate obsolete slots during
> + * non-shutdown checkpoint.
>   *
>   * It is convenient to flush dirty replication slots at the time of checkpoint.
>   * Additionally, in case of a shutdown checkpoint, we also identify the slots
> @@ -1924,6 +2007,45 @@ CheckPointReplicationSlots(bool is_shutdown)
>
> Can we try and see how the patch looks if we try to invalidate the
> slot due to idle time at the same time when we are trying to
> invalidate due to WAL?
>

I'll consider the suggested change in the next version.

~~~~

Here are the v70 patches, addressing the above and the other comments
in [1], [2] and [3].
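
As an illustration of the minutes-and-seconds conversion described
above, here is a minimal standalone C sketch. It is not the code from
the v70 patch: the helper name format_idle_duration and the output
wording are hypothetical, and it assumes both timestamps are 64-bit
microsecond counts (as PostgreSQL's TimestampTz is). Unit names are
abbreviated to sidestep pluralization and translation concerns.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define USECS_PER_SEC   INT64_C(1000000)
#define SECS_PER_MINUTE 60

/*
 * Hypothetical helper (not taken from the patch): write the idle duration
 * between two microsecond timestamps into buf as "<M> min <S> sec".
 * Sub-second remainders are truncated, and a negative difference (e.g. from
 * clock skew) is clamped to zero.
 */
static void
format_idle_duration(int64_t now_usecs, int64_t inactive_since_usecs,
                     char *buf, size_t buflen)
{
    int64_t elapsed_secs = (now_usecs - inactive_since_usecs) / USECS_PER_SEC;
    int64_t minutes;
    int64_t seconds;

    if (elapsed_secs < 0)
        elapsed_secs = 0;

    minutes = elapsed_secs / SECS_PER_MINUTE;
    seconds = elapsed_secs % SECS_PER_MINUTE;

    snprintf(buf, buflen, "%" PRId64 " min %" PRId64 " sec",
             minutes, seconds);
}
```

With a 67-second gap between the two timestamps, this yields
"1 min 7 sec", which corresponds to the "1 minute and 7 seconds" in the
example DETAIL line.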

[1] https://www.postgresql.org/message-id/CAHut%2BPvW3pr3P3hXwBskXrDmJYKedmqRaPZcL4iLRQ51%3DXxOBw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CALDaNm0X_vgAxKPT%2Bc14yqKcgE5-x4XBdXsCAVqD6_aa-QYUvg%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAHut%2BPtCpOnifF9wnhJ%3Djo7KLmtT%3DMikuYnM9GGPTVA80rq7OA%40mail.gmail.com

--
Thanks,
Nisha

v70-0001-Introduce-inactive_timeout-based-replication-slo.patch
Description: Binary data
v70-0002-Add-TAP-test-for-slot-invalidation-based-on-inac.patch
Description: Binary data