On Thu, Nov 28, 2024 at 1:29 PM Hayato Kuroda (Fujitsu) <kuroda.hay...@fujitsu.com> wrote: > > Dear Nisha, > > > > > Attached v51 patch-set addressing all comments in [1] and [2]. > > > > Thanks for working on the feature! I've stated to review the patch. > Here are my comments - sorry if there are something which have already been > discussed. > The thread is too long to follow correctly. > > Comments for 0001 > ============= > > 01. binary_upgrade_logical_slot_has_caught_up > > ISTM that error_if_invalid is set to true when the slot can be moved forward, > otherwise > it is set to false. Regarding the binary_upgrade_logical_slot_has_caught_up, > however, > only valid slots will be passed to the funciton (see pg_upgrade/info.c) so I > feel > it is OK to set to true. Thought? >
Right, corrected the call with error_if_invalid as true. > Comments for 0002 > ============= > > 03. check_replication_slot_inactive_timeout > > Can we overwrite replication_slot_inactive_timeout to zero when pg_uprade > (and also > pg_createsubscriber?) starts a server process? Several parameters have > already been > specified via -c option at that time. This can avoid an error while the > upgrading. > Note that this part is still needed even if you accept the comment. Users can > manually boot with upgrade mode. > Done. > 06. found bug > > While testing the patch, I found that slots can be invalidated too early when > when > the GUC is quite large. I think because an overflow is caused in > InvalidatePossiblyObsoleteSlot(). > > - Reproducer > > I set the replication_slot_inactive_timeout to INT_MAX and executed below > commands, > and found that the slot is invalidated. > > ``` > postgres=# SHOW replication_slot_inactive_timeout; > replication_slot_inactive_timeout > ----------------------------------- > 2147483647s > (1 row) > postgres=# SELECT * FROM pg_create_logical_replication_slot('test', > 'test_decoding'); > slot_name | lsn > -----------+----------- > test | 0/18B7F38 > (1 row) > postgres=# CHECKPOINT ; > CHECKPOINT > postgres=# SELECT slot_name, inactive_since, invalidation_reason FROM > pg_replication_slots ; > slot_name | inactive_since | invalidation_reason > -----------+-------------------------------+--------------------- > test | 2024-11-28 07:50:25.927594+00 | inactive_timeout > (1 row) > ``` > > - analysis > > In InvalidatePossiblyObsoleteSlot(), replication_slot_inactive_timeout_sec * > 1000 > is passed to the third argument of TimestampDifferenceExceeds(), which is > also the > integer datatype. This causes an overflow and parameter is handled as the > small > value. > > - solution > > I think there are two possible solutions. You can choose one of them: > > a. Make the maximum INT_MAX/1000, or > b. Change the unit to millisecond. > Fixed. It is reasonable to align with other timeout parameters by using milliseconds as the unit. -- Thanks, Nisha