> On May 1, 2026, at 01:53, Masahiko Sawada <[email protected]> wrote:
> 
> On Wed, Apr 29, 2026 at 8:11 PM Chao Li <[email protected]> wrote:
>> 
>> 
>> 
>>> On Apr 29, 2026, at 09:28, Chao Li <[email protected]> wrote:
>>> 
>>> 
>>> 
>>>> On Apr 29, 2026, at 05:15, Masahiko Sawada <[email protected]> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> I found a race condition issue between XLogLogicalInfo and ProcSignal
>>>> initialization while reviewing another issue[1]. I'm starting a
>>>> separate thread for the subject as it's not related to the issue
>>>> reported on that thread.
>>>> 
>>>> The issue is that child processes could miss the
>>>> PROCSIGNAL_BARRIER_UPDATE_XLOG_LOGICAL_INFO signal during the
>>>> initialization and end up in an inconsistent state because
>>>> InitializeProcessXLogLogicalInfo() is called (in BaseInit()) before
>>>> ProcSignalInit(). If the startup emits the signal to a process who is
>>>> between two steps, the process would not reflect the latest
>>>> XLogLogicalInfo state. I think we should move
>>>> InitializeProcessXLogLogicalInfo() after ProcSignalInit() like we do
>>>> so for InitLocalDataChecksumState().
>>> 
>>> I think this is correct.
>>> 
>>> After moving InitializeProcessXLogLogicalInfo() out of BaseInit(), 
>>> background worker processes (BackgroundWorkerMain) will no longer hold a 
>>> valid value of XLogLogicalInfo, but I guess that is fine as those processes 
>>> don’t call ProcSignalInit() anyway.
> 
> No, even after moving the InitializeProcessXLogLogicalInfo(),
> bgworkers who connected a database will call InitPostgres(),
> initializing the proc signals and XLogLogicalInfo.
> 
>> 
>> I met Zhijie Hou at HOW 2026 a few days ago. When we talked about a feature 
>> requirement I recently heard from a DBA, Zhijie pointed me to 67c20979ce 
>> (Toggle logical decoding dynamically based on logical slot presence).
>> 
>> The requirement is that storage is expensive today, and users are sensitive 
>> to the total size of WAL. In some deployments, users may only want to 
>> replicate a small set of tables intermittently, but to enable logical 
>> replication, they still have to set wal_level to logical, which 
>> significantly increases the total WAL volume. I believe this feature could 
>> help address that concern, so I reviewed the code and played a bit with it.
>> 
>> I found an issue related to this patch, so I am sharing my findings here, 
>> although the problem also exists before this patch.
>> 
>> In InitPostgres(), in the standalone backend path, StartupXLOG() is called, 
>> but XLogLogicalInfo is not updated. As a result, if we switch to standalone 
>> mode for some emergency maintenance, make data changes, and then switch back 
>> to normal mode, changes made during standalone mode would not include 
>> logical replication metadata, which may potentially break future logical 
>> replication.
>> 
>> To verify that, I did a test like:
>> 
>> * Start a new instance with wal_level = replica
>> * Create a table, insert some data, then create a logical replication slot
>> ```
>> evantest=# CREATE TABLE t1(id int);
>> CREATE TABLE
>> evantest=# INSERT INTO t1 VALUES (1), (2);
>> INSERT 0 2
>> evantest=# SELECT * FROM pg_create_logical_replication_slot('s1', 
>> 'test_decoding');
>> slot_name |    lsn
>> -----------+------------
>> s1        | 0/01D6E6D0
>> (1 row)
>> ```
>> 
>> * Stop the server, and start with standalone mode, and truncate the table:
>> ```
>> % postgres --single -F -D . evantest
>> 
>> PostgreSQL stand-alone backend 19devel
>> backend> show effective_wal_level;
>>         1: effective_wal_level (typeid = 25, len = -1, typmod = -1, byval = 
>> f)
>>        ----
>>         1: effective_wal_level = "replica"     (typeid = 25, len = -1, 
>> typmod = -1, byval = f)
>>        ----
>> backend> truncate t1;
>> backend> 2026-04-29 21:13:24.625 CST [68316] LOG:  checkpoint starting: 
>> shutdown fast
>> ```
>> 
>> * Start the server normally, and real WAL through the logical slot.
>> ```
>> evantest=# SELECT data FROM pg_logical_slot_get_changes('s1', NULL, NULL);
>>    data
>> ------------
>> BEGIN 721
>> COMMIT 721
>> (2 rows)
>> ```
>> 
>> The TRUNCATE does not appear, which I think is wrong. To fix that, we only 
>> need to call InitializeProcessXLogLogicalInfo()after StartupXLOG() in the 
>> standalone path. Since the fix is based on this patch, I added it as 0002 in 
>> this patch set.
> 
> Good catch. I've updated the patch.
> 
>> 
>> One more thought: I think this feature partially addresses the user 
>> requirement I described earlier. When wal_level is replicaand some logical 
>> slots are created, the extra WAL data should only be enabled for tables 
>> included in those slots. That avoids generating unnecessary WAL data for 
>> tables that are not targets of replication, and therefore saves storage. 
>> WDYT? Maybe a candidate for v20?
>> 
> 
> This would require additional functionality to logical replication
> slots so that they include the specific tables, and then when writing
> WAL records each backend process needs to figure out whether the table
> is included in any replication slots. While the idea sounds
> interesting, it also sounds complex and potentially introduces
> overheads.

I have realized that there is no easy or direct way to determine whether a 
table is included in some replication slot, so we may need to maintain some 
extra information. But I think this would be a feature that users and DBAs 
would like very much.

Although PG already has a mechanism to clean up old WAL files, from what I have 
heard from DBAs, many users store WAL files for years in external storage, so 
they always try to keep WAL as small as possible. I have heard repeated 
concerns that storage has become too expensive.

I am currently focusing on testing PG19 new features. After that, I can spend 
some time studying a possible solution. At that time, your help and support 
would be greatly appreciated.

> 
>> BTW, in 0001, I helped fix the typos.
> 
> Thank you!
> 
> Regards,
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
> <v3-0001-Fix-race-condition-in-XLogLogicalInfo-and-ProcSig.patch>

V3 LGTM.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/






Reply via email to