On Wed, Feb 5, 2025 at 2:42 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Wed, Feb 5, 2025 at 10:30 AM vignesh C <vignes...@gmail.com> wrote:
> >
> > On Tue, 4 Feb 2025 at 19:56, Nisha Moond <nisha.moond...@gmail.com> wrote:
> > >
> > > Here is v69 patch set addressing above and Kuroda-san's comments in [1].
> >
> > Few minor suggestions:
> > 1) In the slot invalidation reporting below:
> > +               case RS_INVAL_IDLE_TIMEOUT:
> > +                       Assert(inactive_since > 0);
> > +
> > +                       /* translator: second %s is a GUC variable name */
> > +                       appendStringInfo(&err_detail, _("The slot's idle time %s exceeds the configured \"%s\" duration."),
> > +                                        timestamptz_to_str(inactive_since),
> > +                                        "idle_replication_slot_timeout");
> > +                       /* translator: %s is a GUC variable name */
> > +                       appendStringInfo(&err_hint, _("You might need to increase \"%s\"."),
> > +                                        "idle_replication_slot_timeout");
> >
> > It is logged like:
> > 2025-02-05 10:04:11.616 IST [330567] DETAIL:  The slot's idle time
> > 2025-02-05 10:02:49.131631+05:30 exceeds the configured
> > "idle_replication_slot_timeout" duration.
> >
> > Here even though we tell idle time, we are logging the inactive_since
> > value which kind of gives a wrong meaning.
> >
> > How about we change it to:
> > The slot has been inactive since 2025-02-05 10:02:49.131631+05:30,
> > which exceeds the configured "idle_replication_slot_timeout" duration.
> >
>
> Would it address your concern if we write the actual idle duration
> (now - inactive_since) instead of directly using inactive_since in the
> above message?
>

Simply using the raw timestamp difference (now - inactive_since) would
look odd. We should convert it into a user-friendly format. Since
idle_replication_slot_timeout is in minutes, we can express the
difference in minutes and seconds in the log.
For example:
DETAIL: The slot's idle time of 1 minute and 7 seconds exceeds the
configured "idle_replication_slot_timeout" duration.
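
As a rough illustration of the conversion described above (a standalone
sketch, not the code from the v70 patch): the helper name
format_idle_duration and the constants below are hypothetical, and the
input is assumed to be the idle duration (now - inactive_since) in
microseconds.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed constants for this sketch; defined locally to stay standalone. */
#define USECS_PER_SEC   INT64_C(1000000)
#define SECS_PER_MINUTE 60

/*
 * Hypothetical helper (not taken from the patch): render an idle duration,
 * given in microseconds, as "<M> minute(s) and <S> second(s)".
 * Sub-second precision is dropped and negative input is clamped to zero.
 * Returns whatever snprintf() returns.
 */
static int
format_idle_duration(int64_t idle_usecs, char *buf, size_t buflen)
{
    int64_t total_secs;
    int64_t minutes;
    int     seconds;

    assert(buf != NULL && buflen > 0);

    if (idle_usecs < 0)
        idle_usecs = 0;

    total_secs = idle_usecs / USECS_PER_SEC;
    minutes = total_secs / SECS_PER_MINUTE;
    seconds = (int) (total_secs % SECS_PER_MINUTE);

    return snprintf(buf, buflen, "%" PRId64 " %s and %d %s",
                    minutes, minutes == 1 ? "minute" : "minutes",
                    seconds, seconds == 1 ? "second" : "seconds");
}
```

Passing 67000000 (67 seconds) would yield "1 minute and 7 seconds". In a
real server message the singular/plural wording would likely need to go
through the translation machinery rather than being assembled from
fragments like this.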

This has been implemented in v70.
Thoughts?

> A few other comments:
> 1.
> + * 4. The slot is not being synced from the primary while the server
> + *    is in recovery
> + *
> + * Note that the idle timeout invalidation mechanism is not
> + * applicable for slots on the standby server that are being synced
> + * from the primary server (i.e., standby slots having 'synced' field 'true').
> + * Synced slots are always considered to be inactive because they don't
> + * perform logical decoding to produce changes.
>
> The 4th point in the above comment and the rest of the comment is
> mostly saying the same thing.
>

Done. I've merged the additional information into the 4th point.

> 2.
> + * Flush all replication slots to disk. Also, invalidate obsolete slots during
> + * non-shutdown checkpoint.
>   *
>   * It is convenient to flush dirty replication slots at the time of checkpoint.
>   * Additionally, in case of a shutdown checkpoint, we also identify the slots
> @@ -1924,6 +2007,45 @@ CheckPointReplicationSlots(bool is_shutdown)
>
> Can we try and see how the patch looks if we try to invalidate the
> slot due to idle time at the same time when we are trying to
> invalidate due to WAL?
>

I'll consider the suggested change in the next version.
~~~~

Here are the v70 patches, which address the comments above and the
other comments in [1], [2] and [3].

[1] https://www.postgresql.org/message-id/CAHut%2BPvW3pr3P3hXwBskXrDmJYKedmqRaPZcL4iLRQ51%3DXxOBw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CALDaNm0X_vgAxKPT%2Bc14yqKcgE5-x4XBdXsCAVqD6_aa-QYUvg%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAHut%2BPtCpOnifF9wnhJ%3Djo7KLmtT%3DMikuYnM9GGPTVA80rq7OA%40mail.gmail.com

--
Thanks,
Nisha

Attachment: v70-0001-Introduce-inactive_timeout-based-replication-slo.patch

Attachment: v70-0002-Add-TAP-test-for-slot-invalidation-based-on-inac.patch
