On Mon, Mar 2, 2026 at 11:44 PM Fujii Masao <[email protected]> wrote:
> With the patch applied, I set up a logical replication and inserted a row
> every second. Even with continuous inserts, NULL was shown in the lag
> columns of pg_stat_replication. That makes me wonder whether the patch's
> approach is sufficient to address the issue.

Thank you for the review and testing! I had only considered the issue in
the context of physical replication, but as you pointed out, my approach is
insufficient for logical replication.

> Relying solely on replies from the standby or subscriber seems a bit
> fragile to me. If the goal is to keep showing the last measured lag for
> some time, perhaps we should introduce a rate limit on when NULL is
> displayed in the lag columns?

My primary goal was to ensure that the source code comments match the
actual behavior, as the comment stating "the second such message must
result from wal_receiver_status_interval expiring on the standby" is
inaccurate. However, as you noted, the patch alone is not sufficient to
fully address the issue.

> For example, if there has been no activity (i.e., sentPtr == applyPtr and
> applyPtr has not changed since the previous cycle) for, say, 10 seconds,
> then we could allow NULL to be shown. Thought?

I considered a time-based rate limit, but it is difficult to choose an
appropriate threshold. Furthermore, the walsender has no way of knowing the
standby's or subscriber's wal_receiver_status_interval setting.

The attached v2 patch takes a different approach: it additionally requires
that all reported positions (write/flush/apply) remain unchanged from the
previous reply. This directly detects a truly idle system without relying
on timeouts. If any position has advanced, new WAL activity must have
occurred, so we should not clear the lag values even if the lag tracker is
empty.

--
Best regards,
Shinya Kato
NTT OSS Center
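
For readers following the thread, the idle check described above can be
sketched roughly as below. This is only an illustration, not the contents
of the v2 patch: ReplyHistory, should_clear_lag() and the XLogRecPtr
typedef are hypothetical stand-ins, and the real walsender code keeps this
state differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's WAL position type (assumption for this sketch). */
typedef uint64_t XLogRecPtr;

/* Hypothetical per-connection state remembered between reply messages. */
typedef struct ReplyHistory
{
    bool        have_prev;               /* has any reply been seen yet? */
    bool        fully_applied_last_time; /* was apply == sent in the last reply? */
    XLogRecPtr  prev_write;
    XLogRecPtr  prev_flush;
    XLogRecPtr  prev_apply;
} ReplyHistory;

/*
 * Decide whether the lag values may be cleared (shown as NULL).
 *
 * The lag is cleared only when two consecutive replies both report that
 * everything sent has been applied AND none of the write/flush/apply
 * positions moved between those replies. If any position advanced, WAL
 * activity happened in between, so the last measured lag is kept.
 */
static bool
should_clear_lag(ReplyHistory *h, XLogRecPtr sent,
                 XLogRecPtr write, XLogRecPtr flush, XLogRecPtr apply)
{
    bool fully_applied = (apply == sent);
    bool unchanged = h->have_prev &&
                     write == h->prev_write &&
                     flush == h->prev_flush &&
                     apply == h->prev_apply;
    bool clear = fully_applied && h->fully_applied_last_time && unchanged;

    /* Remember this reply for the next comparison. */
    h->have_prev = true;
    h->fully_applied_last_time = fully_applied;
    h->prev_write = write;
    h->prev_flush = flush;
    h->prev_apply = apply;

    return clear;
}
```

With this sketch, a subscriber that has caught up on every reply while rows
keep arriving never triggers clearing, because its positions advance
between replies. Only two identical, fully caught-up replies in a row do.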
v2-0001-Fix-spurious-NULL-lag-in-pg_stat_replication.patch
