Hi hackers,

While doing some tests for [1], I observed that $SUBJECT.

To observe this behavior on master:

1. create a logical replication slot

postgres=# SELECT * FROM pg_create_logical_replication_slot('logical_slot', 
'test_decoding', false, true);
  slot_name   |    lsn
--------------+------------
 logical_slot | 0/40749508
(1 row)

2. create a table and add some data

postgres=# create table bdt (a int);
CREATE TABLE
postgres=# insert into bdt select a from generate_series(1,10000) a ;
INSERT 0 10000

3. start pg_recvlogical this way

pg_recvlogical -d postgres -S logical_slot -f - --no-loop --start

4. query pg_stat_io

postgres=# select backend_type,object,context,reads,read_bytes from pg_stat_io 
where backend_type = 'walsender';
 backend_type |    object     |  context  | reads | read_bytes
--------------+---------------+-----------+-------+------------
 walsender    | relation      | bulkread  |     0 |          0
 walsender    | relation      | bulkwrite |     0 |          0
 walsender    | relation      | init      |     0 |          0
 walsender    | relation      | normal    |     6 |      49152
 walsender    | relation      | vacuum    |     0 |          0
 walsender    | temp relation | normal    |     0 |          0
 walsender    | wal           | init      |       |
 walsender    | wal           | normal    |     0 |          0
(8 rows)

The non-zero stats that we see here are due to the pgstat_report_stat() call in
PostgresMain(), not to the walsender decoding activity per se (proof is that
the wal object values are still empty, while the walsender certainly had to
read some WAL).
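
For example, one can double check that the walsender is still connected at that
point and that nothing has been flushed yet for the wal object with something
like (the pg_stat_activity part only being there to show that the walsender is
still around):

postgres=# select pid, backend_type, state
           from pg_stat_activity
           where backend_type = 'walsender';

postgres=# select backend_type, object, context, reads, read_bytes
           from pg_stat_io
           where backend_type = 'walsender' and object = 'wal';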

5. once pg_recvlogical is stopped with Ctrl-C, we get:

postgres=# select backend_type,object,context,reads,read_bytes from pg_stat_io 
where backend_type = 'walsender';
 backend_type |    object     |  context  | reads | read_bytes
--------------+---------------+-----------+-------+------------
 walsender    | relation      | bulkread  |     0 |          0
 walsender    | relation      | bulkwrite |     0 |          0
 walsender    | relation      | init      |     0 |          0
 walsender    | relation      | normal    |     9 |      73728
 walsender    | relation      | vacuum    |     0 |          0
 walsender    | temp relation | normal    |     0 |          0
 walsender    | wal           | init      |       |
 walsender    | wal           | normal    |    98 |     793856
(8 rows)

Now we can see that the numbers have increased for the relation object and that
we also get non-zero numbers for the wal object (which fully makes sense).

With the attached patch applied, we would already get those numbers in step 4
(meaning the stats are flushed without having to wait for the walsender to
exit).
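
A more direct check while pg_recvlogical is still attached is something like
(similar in spirit to the TAP test added below, which checks the relation
object on the standby):

postgres=# select reads > 0 as wal_reads_flushed
           from pg_stat_io
           where backend_type = 'walsender'
           and object = 'wal'
           and context = 'normal';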

Remarks:

R1. The extra flushes are done in WalSndLoop(): I believe this is the right
place for them (more on the per-backend flush right after these remarks).
R2. A test is added in 035_standby_logical_decoding.pl: while this TAP test is
already racy ([2]), it looks like a good place as we don't want pg_recvlogical
to stop/exit.
R3. The test can also be back-patched down to 16_STABLE, as
035_standby_logical_decoding.pl was introduced in 16 (and so was pg_stat_io).
R4. The test fails if the extra flushes are not done, which fully makes sense.
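
Regarding the per-backend flush mentioned in R1: the patch flushes both the
pg_stat_io counters (pgstat_flush_io()) and the per-backend IO counters
(pgstat_flush_backend() with PGSTAT_BACKEND_FLUSH_IO). If I'm not mistaken, the
latter is what makes the per-backend IO stats up to date too, so something like
this (relying on pg_stat_get_backend_io(), so master only) should also report
the reads of a still running walsender:

postgres=# select a.pid, b.*
           from pg_stat_activity a,
           lateral pg_stat_get_backend_io(a.pid) b
           where a.backend_type = 'walsender';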

[1]: 
https://www.postgresql.org/message-id/flat/Z3zqc4o09dM/Ezyz%40ip-10-97-1-34.eu-west-3.compute.internal
[2]: 
https://www.postgresql.org/message-id/Z6oRgmD8m7zBo732%40ip-10-97-1-34.eu-west-3.compute.internal

Looking forward to your feedback,

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From b16a0ccada3c2acc80bf733e71935677b9ab6382 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot...@gmail.com>
Date: Tue, 25 Feb 2025 10:18:05 +0000
Subject: [PATCH v1] Flush the IO statistics of active walsenders

The walsender does not flush its IO statistics until it exits.
The issue has been there since pg_stat_io was introduced in a9c70b46dbe.
This commit:

1. ensures the walsender does not wait until it exits to flush its IO statistics
2. adds a test
---
 src/backend/replication/walsender.c               |  7 +++++++
 .../recovery/t/035_standby_logical_decoding.pl    | 15 +++++++++++++++
 2 files changed, 22 insertions(+)
  26.2% src/backend/replication/
  73.7% src/test/recovery/t/

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 446d10c1a7d..3664d7d0c75 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -90,6 +90,7 @@
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/pg_lsn.h"
+#include "utils/pgstat_internal.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
 #include "utils/timestamp.h"
@@ -2793,6 +2794,12 @@ WalSndLoop(WalSndSendDataCallback send_data)
 		if (pq_flush_if_writable() != 0)
 			WalSndShutdown();
 
+		/*
+		 * Report IO statistics
+		 */
+		pgstat_flush_io(false);
+		(void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
+
 		/* If nothing remains to be sent right now ... */
 		if (WalSndCaughtUp && !pq_is_send_pending())
 		{
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index 8903177d883..ab34d290165 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -543,6 +543,9 @@ $node_subscriber->stop;
 reactive_slots_change_hfs_and_wait_for_xmins('behaves_ok_', 'vacuum_full_',
 	0, 1);
 
+# To check that an active walsender updates its IO statistics below.
+$node_standby->safe_psql('testdb', "SELECT pg_stat_reset_shared('io')");
+
 # Ensure that replication slot stats are not empty before triggering the
 # conflict.
 $node_primary->safe_psql('testdb',
@@ -552,6 +555,18 @@ $node_standby->poll_query_until('testdb',
 	qq[SELECT total_txns > 0 FROM pg_stat_replication_slots WHERE slot_name = 'vacuum_full_activeslot']
 ) or die "replication slot stats of vacuum_full_activeslot not updated";
 
+# Ensure an active walsender updates its IO statistics.
+is( $node_standby->safe_psql(
+		'testdb',
+		qq(SELECT reads > 0
+		FROM pg_catalog.pg_stat_io
+		WHERE backend_type = 'walsender'
+		AND object = 'relation'
+		AND context = 'normal')
+	),
+	qq(t),
+	"Check that an active walsender updates its IO statistics");
+
 # This should trigger the conflict
 wait_until_vacuum_can_remove(
 	'full', 'CREATE TABLE conflict_test(x integer, y text);
-- 
2.34.1
