> On May 1, 2026, at 01:53, Masahiko Sawada <[email protected]> wrote:
>
> On Wed, Apr 29, 2026 at 8:11 PM Chao Li <[email protected]> wrote:
>>
>>
>>
>>> On Apr 29, 2026, at 09:28, Chao Li <[email protected]> wrote:
>>>
>>>
>>>
>>>> On Apr 29, 2026, at 05:15, Masahiko Sawada <[email protected]> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I found a race condition issue between XLogLogicalInfo and ProcSignal
>>>> initialization while reviewing another issue[1]. I'm starting a
>>>> separate thread for the subject as it's not related to the issue
>>>> reported on that thread.
>>>>
>>>> The issue is that child processes could miss the
>>>> PROCSIGNAL_BARRIER_UPDATE_XLOG_LOGICAL_INFO signal during the
>>>> initialization and end up in an inconsistent state because
>>>> InitializeProcessXLogLogicalInfo() is called (in BaseInit()) before
>>>> ProcSignalInit(). If the startup process emits the signal to a process
>>>> that is between the two steps, the process would not reflect the latest
>>>> XLogLogicalInfo state. I think we should move
>>>> InitializeProcessXLogLogicalInfo() after ProcSignalInit() like we do
>>>> for InitLocalDataChecksumState().
>>>
>>> I think this is correct.
>>>
>>> After moving InitializeProcessXLogLogicalInfo() out of BaseInit(),
>>> background worker processes (BackgroundWorkerMain) will no longer hold a
>>> valid value of XLogLogicalInfo, but I guess that is fine as those processes
>>> don’t call ProcSignalInit() anyway.
>
> No, even after moving InitializeProcessXLogLogicalInfo(),
> bgworkers that connect to a database will call InitPostgres(),
> which initializes both the proc signals and XLogLogicalInfo.
>
>>
>> I met Zhijie Hou at HOW 2026 a few days ago. When we talked about a feature
>> requirement I recently heard from a DBA, Zhijie pointed me to 67c20979ce
>> (Toggle logical decoding dynamically based on logical slot presence).
>>
>> The requirement is that storage is expensive today, and users are sensitive
>> to the total size of WAL. In some deployments, users may only want to
>> replicate a small set of tables intermittently, but to enable logical
>> replication, they still have to set wal_level to logical, which
>> significantly increases the total WAL volume. I believe this feature could
>> help address that concern, so I reviewed the code and played a bit with it.
>>
>> I found an issue related to this patch, so I am sharing my findings here,
>> although the problem also exists before this patch.
>>
>> In InitPostgres(), in the standalone backend path, StartupXLOG() is called,
>> but XLogLogicalInfo is not updated. As a result, if we switch to standalone
>> mode for some emergency maintenance, make data changes, and then switch back
>> to normal mode, changes made during standalone mode would not include
>> logical replication metadata, which may potentially break future logical
>> replication.
>>
>> To verify that, I did a test like:
>>
>> * Start a new instance with wal_level = replica
>> * Create a table, insert some data, then create a logical replication slot
>> ```
>> evantest=# CREATE TABLE t1(id int);
>> CREATE TABLE
>> evantest=# INSERT INTO t1 VALUES (1), (2);
>> INSERT 0 2
>> evantest=# SELECT * FROM pg_create_logical_replication_slot('s1',
>> 'test_decoding');
>> slot_name | lsn
>> -----------+------------
>> s1 | 0/01D6E6D0
>> (1 row)
>> ```
>>
>> * Stop the server, start it in standalone mode, and truncate the table:
>> ```
>> % postgres --single -F -D . evantest
>>
>> PostgreSQL stand-alone backend 19devel
>> backend> show effective_wal_level;
>> 1: effective_wal_level (typeid = 25, len = -1, typmod = -1, byval =
>> f)
>> ----
>> 1: effective_wal_level = "replica" (typeid = 25, len = -1,
>> typmod = -1, byval = f)
>> ----
>> backend> truncate t1;
>> backend> 2026-04-29 21:13:24.625 CST [68316] LOG: checkpoint starting:
>> shutdown fast
>> ```
>>
>> * Start the server normally, and read WAL through the logical slot.
>> ```
>> evantest=# SELECT data FROM pg_logical_slot_get_changes('s1', NULL, NULL);
>> data
>> ------------
>> BEGIN 721
>> COMMIT 721
>> (2 rows)
>> ```
>>
>> The TRUNCATE does not appear, which I think is wrong. To fix that, we only
>> need to call InitializeProcessXLogLogicalInfo() after StartupXLOG() in the
>> standalone path. Since the fix is based on this patch, I added it as 0002 in
>> this patch set.
>
> Good catch. I've updated the patch.
>
>>
>> One more thought: I think this feature partially addresses the user
>> requirement I described earlier. When wal_level is replica and some logical
>> slots are created, the extra WAL data should only be enabled for tables
>> included in those slots. That avoids generating unnecessary WAL data for
>> tables that are not targets of replication, and therefore saves storage.
>> WDYT? Maybe a candidate for v20?
>>
>
> This would require additional functionality to logical replication
> slots so that they include the specific tables, and then when writing
> WAL records each backend process needs to figure out whether the table
> is included in any replication slots. While the idea sounds
> interesting, it also sounds complex and potentially introduces
> overheads.

I have realized that there is no easy or direct way to determine whether a
table is included in some replication slot, so we may need to maintain some
extra information. But I think this would be a feature that users and DBAs
would like very much.
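
Just to illustrate the kind of extra information I mean, here is a toy
sketch. It is purely hypothetical: SlotTableSet and
slot_table_set_contains() do not exist in PostgreSQL, and Oid is only a
stand-in typedef. It assumes each backend could see a sorted list of the
table OIDs that any logical slot is interested in, and checks membership
with a binary search before deciding whether a record needs the extra
logical information.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Oid;			/* stand-in for PostgreSQL's Oid */

/*
 * Hypothetical: union of the table OIDs named by any logical slot,
 * kept sorted in ascending order with no duplicates.
 */
typedef struct SlotTableSet
{
	const Oid  *relids;
	size_t		nrelids;
} SlotTableSet;

/* Three-way comparison of two OIDs for bsearch(). */
static int
oid_cmp(const void *a, const void *b)
{
	Oid			x = *(const Oid *) a;
	Oid			y = *(const Oid *) b;

	return (x > y) - (x < y);
}

/* Returns true if relid is listed in the set, false otherwise. */
static bool
slot_table_set_contains(const SlotTableSet *set, Oid relid)
{
	if (set->nrelids == 0)
		return false;

	return bsearch(&relid, set->relids, set->nrelids,
				   sizeof(Oid), oid_cmp) != NULL;
}
```

The lookup itself is cheap. The hard part, which this sketch ignores
completely, is keeping such a set consistent across all backends when slots
are created or dropped, and that is where I expect most of the complexity
and overhead you mentioned would come from.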

Although PG already has a mechanism to clean up old WAL files, from what I have
heard from DBAs, many users store WAL files for years in external storage, so
they always try to keep WAL as small as possible. I have heard repeated
concerns that storage has become too expensive.

I am currently focusing on testing the new PG19 features. After that, I can
spend some time studying a possible solution. At that time, your help and
support would be greatly appreciated.

>
>> BTW, in 0001, I helped fix the typos.
>
> Thank you!
>
> Regards,
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
> <v3-0001-Fix-race-condition-in-XLogLogicalInfo-and-ProcSig.patch>

V3 LGTM.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/