> On Jan 8, 2026, at 04:18, Masahiko Sawada <[email protected]> wrote:
> 
> On Tue, Jan 6, 2026 at 12:09 PM Masahiko Sawada <[email protected]> wrote:
>> 
>> On Mon, Jan 5, 2026 at 6:55 PM Chao Li <[email protected]> wrote:
>>> 
>>> 
>>> 
>>>> On Jan 6, 2026, at 02:02, Masahiko Sawada <[email protected]> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> Commit 29d0a77fa6 improved pg_upgrade to allow migrating logical
>>>> slots. Currently, to check if the slots are ready to be migrated, we
>>>> call binary_upgrade_logical_slot_has_caught_up() for every single
>>>> slot. This checks if there are any unconsumed WAL records. However, we
>>>> noticed a performance issue. If there are many slots (e.g., 100 or
>>>> more) or if there is a WAL backlog, checking all slots one by one
>>>> takes a long time.
>>>> 
>>>> Here are some test results from my environment:
>>>> With an empty cluster: 1.55s
>>>> With 200 slots and 30MB backlog: 15.51s
>>>> 
>>>> Commit 6d3d2e8e5 introduced parallel checks per database, but a single
>>>> job might still have to check too many slots, causing delays.
>>>> 
>>>> Since binary_upgrade_logical_slot_has_caught_up() essentially checks
>>>> if any decodable record exists in the database, IIUC it is not
>>>> necessary to check every slot. We can optimize this by checking only
>>>> the slot with the minimum confirmed_flush_lsn. If that slot is caught
>>>> up, we can assume others are too. The attached patch implements this
>>>> optimization. With the patch, the test with 200 slots finished in
>>>> 2.512s. The execution time is now stable regardless of the number of
>>>> slots.
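>>>> 
>>>> To illustrate the idea (this is a sketch, not the exact query in the
>>>> patch), the per-database check boils down to:
>>>> 
>>>> ```
>>>> -- Pick only the slot with the oldest confirmed_flush_lsn in this
>>>> -- database; if it has caught up, the remaining slots must have too.
>>>> SELECT slot_name,
>>>>        pg_catalog.binary_upgrade_logical_slot_has_caught_up(slot_name)
>>>>          AS caught_up
>>>> FROM pg_catalog.pg_replication_slots
>>>> WHERE slot_type = 'logical' AND database = current_database()
>>>> ORDER BY confirmed_flush_lsn ASC
>>>> LIMIT 1;
>>>> ```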
>>>> 
>>>> One thing to note is that DecodeTXNNeedSkip() also considers
>>>> replication origin filters. Theoretically, a plugin could filter out
>>>> specific origins, which might lead to different results. However, this
>>>> is a very rare case. Even if it happens, it would just result in a
>>>> false positive (the upgrade fails safely), so the impact is minimal.
>>>> Therefore, the patch simplifies the check to be per-database instead
>>>> of per-slot.
>>>> 
>>>> Feedback is very welcome.
>>>> 
>>>> Regards,
>>>> 
>>>> --
>>>> Masahiko Sawada
>>>> Amazon Web Services: https://aws.amazon.com
>>>> <v1-0001-pg_upgrade-Optimize-replication-slot-caught-up-ch.patch>
>>> 
>>> Hi Masahiko-san,
>>> 
>>> Basically, the optimization idea makes sense to me. Reducing the check 
>>> from O(slots × WAL) to O(databases × WAL) sounds like a win. Just a few 
>>> small comments:
>> 
>> Thank you for reviewing the patch!
>> 
>>> 
>>> 1
>>> ```
>>> +static void process_old_cluter_logical_slot_catchup_infos(DbInfo *dbinfo, 
>>> PGresult *res, void *arg);
>>> ```
>>> 
>>> Typo in the function name: cluter->cluster
>>> 
>>> 2
>>> ```
>>> -               logical_slot_infos_query = 
>>> get_old_cluster_logical_slot_infos_query();
>>> +               const char *slot_info_query = "SELECT slot_name, plugin, 
>>> two_phase, failover, "
>>> ```
>>> 
>>> logical_slot_infos_query is no longer used and should be removed.
>> 
>> Fixed.
>> 
>>> 
>>> 3
>>> ```
>>> +                               "  ORDER BY confirmed_flush_lsn ASC "
>>> ```
>>> 
>>> I am wondering whether confirmed_flush_lsn can be NULL. If it can, we may 
>>> want to add "NULLS LAST".
>> 
>> Given that the query is not executed during a live check, I think
>> confirmed_flush cannot be NULL. temporary should also not be true, but
>> the condition is kept for consistency with slot_info_query.
>> 
>>> 
>>> 4
>>> ```
>>> +       Assert(num_slots == dbinfo->slot_arr.nslots);
>>> +
>>> +       for (int i = 0; i < num_slots; i++)
>>> +       {
>>> +               char       *slotname;
>>> +               bool            caught_up;
>>> +
>>> +               slotname = PQgetvalue(res, i, PQfnumber(res, "slot_name"));
>>> +               caught_up = (strcmp(PQgetvalue(res, i, PQfnumber(res, 
>>> "caught_up")), "t") == 0);
>>> +
>>> +               for (int slotnum = 0; slotnum < dbinfo->slot_arr.nslots; 
>>> slotnum++)
>>> +               {
>>> ```
>>> 
>>> Since num_slots == dbinfo->slot_arr.nslots, this is still an 
>>> O(num_slots^2) check. Given that this patch's goal is to eliminate the 
>>> burden of a large number of slots, maybe we could build a hash table to 
>>> make the lookup faster.
>>> 
>> 
>> Or, if we sort both query results in the same order, we can simply
>> update the slots sequentially.
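>> 
>> For example (just a sketch), both queries could end with the same
>> deterministic sort key:
>> 
>> ```
>> -- Both the slot-info query and the catch-up query order by the same
>> -- key, so row i of one result matches row i of the other and the
>> -- caller can update the slots in a single O(num_slots) pass.
>> ORDER BY slot_name
>> ```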
>> 
>> One problem I can see with this approach is that we could wrongly
>> report slots as invalid in invalid_logical_slots.txt. Even if the slot
>> with the oldest confirmed_flush_lsn has unconsumed decodable records,
>> other slots might have already been caught up. I'll think about how to
>> deal with this problem.
> 
> I came up with a simple idea; instead of
> binary_upgrade_logical_slot_has_caught_up() returning true/false, we
> can have it return the last decodable WAL record's LSN, and consider
> logical slots as caught up if their confirmed_flush_lsn has already
> passed that LSN. This way, we can keep scanning at most one logical
> slot per database while reporting the same contents as today. The
> function now needs to scan the entire WAL backlog, but it's still much
> better than the current method.
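> 
> Conceptually, the new per-database check would look something like
> this (the function name below is illustrative, not necessarily what
> the v2 patch uses):
> 
> ```
> -- Scan WAL once from the oldest slot and return the LSN of the last
> -- decodable record (binary_upgrade_slot_last_decodable_lsn() is a
> -- hypothetical name for this), then compare every slot against it.
> WITH last_lsn AS (
>     SELECT binary_upgrade_slot_last_decodable_lsn(slot_name) AS lsn
>     FROM pg_catalog.pg_replication_slots
>     WHERE slot_type = 'logical' AND database = current_database()
>     ORDER BY confirmed_flush_lsn ASC
>     LIMIT 1
> )
> SELECT s.slot_name,
>        s.confirmed_flush_lsn >= l.lsn AS caught_up
> FROM pg_catalog.pg_replication_slots s, last_lsn l
> WHERE s.slot_type = 'logical'
>   AND s.database = current_database();
> ```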
> 

Sounds better, but I need more time to look into the details. I will spend 
time on this tomorrow.

> I've implemented this idea in the v2 patch with some updates:
> 
> - incorporated the review comments.
> - added logic to support PG18 or older.
> - added regression tests.
> 
> Feedback is very welcome.
> 
> Regards,
> 
> -- 
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
> <v2-0001-pg_upgrade-Optimize-replication-slot-caught-up-ch.patch>

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/