On Mon, Mar 03, 2025 at 09:17:30AM +0000, Bertrand Drouvot wrote: > On Mon, Mar 03, 2025 at 10:48:23AM +0900, Michael Paquier wrote: >> Something that's still not quite right is that the WAL receiver and >> the WAL summarizer do not call pgstat_report_wal() at all, so we don't >> report much data and we expect these processes to run continuously. >> The location where to report stats for the WAL summarizer is simple, >> even if the system is aggressive with WAL this is never called more >> than a couple of times per seconds, like the WAL writer: > > Same as above, that sounds right after a quick look.
Attached is a patch for this set of issues for the WAL receiver, the WAL summarizer and the WAL writer. Another thing that we can do better is restrict pgstat_tracks_io_object() so as we don't report rows for non-WAL IOObject in the case of these three. Two tests are added for the WAL receiver and WAL summarizer, checking that the stats are gathered for both. For the WAL receiver, we have at least the activity coming from one WAL segment created in the init context, at least. The WAL summarizer is more pro-active with its reads in its TAP test. All that should be fixed before looking at the remaining patch for the WAL stats at backend level, so what do you think about the attached? >> I'm wondering if we should not lift more the list of processes listed >> in pgstat_tracks_backend_bktype() and include B_AUTOVAC_LAUNCHER, >> B_STARTUP, B_CHECKPOINTER, B_BG_WRITER at this stage, removing the >> entire paragraph. Not sure if we really have to do that for this >> release, but we could look at that separately. > > hm, do you mean update the comment on top of pgstat_tracks_backend_bktype() or > update the documentation? My argument would be to make pgstat_tracks_backend_bktype() the same as pgstat_io.c, and reflect that in the docs and the comments. > hmm, that would work as long as PGSTAT_BACKEND_FLUSH_ALL represents things > that need to be called from pgstat_report_wal(). But I think that's open > door for issue should be add a new PGSTAT_BACKEND_FLUSH_XXX where XXX is not > related to pgstat_report_wal() at all. So, I'm tempted to keep it as it is. OK, I can see your point here. Fine by me. -- Michael
From c1ca96900c8e719820f37408bfff83af027401b1 Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Tue, 4 Mar 2025 09:15:08 +0900 Subject: [PATCH] Fix some gaps with pg_stat_wal and WAL-related processes The WAL receiver and WAL summarizer processes gain each one a call to pgstat_report_wal(), to make sure that they report their WAL statistics. pg_stat_io is adjusted so as these processes do not report any rows when the IOObject is not WAL, making the view easier to use. --- src/backend/postmaster/walsummarizer.c | 4 ++++ src/backend/replication/walreceiver.c | 10 ++++++++++ src/backend/utils/activity/pgstat_io.c | 12 +++++++++++- src/bin/pg_walsummary/t/002_blocks.pl | 7 +++++++ src/test/recovery/t/001_stream_rep.pl | 7 +++++++ 5 files changed, 39 insertions(+), 1 deletion(-) diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c index f4d61c1f3bb8..17cb6984a01d 100644 --- a/src/backend/postmaster/walsummarizer.c +++ b/src/backend/postmaster/walsummarizer.c @@ -33,6 +33,7 @@ #include "common/blkreftable.h" #include "libpq/pqsignal.h" #include "miscadmin.h" +#include "pgstat.h" #include "postmaster/auxprocess.h" #include "postmaster/interrupt.h" #include "postmaster/walsummarizer.h" @@ -1543,6 +1544,9 @@ summarizer_read_local_xlog_page(XLogReaderState *state, HandleWalSummarizerInterrupts(); summarizer_wait_for_wal(); + /* report pending statistics to the cumulative stats system */ + pgstat_report_wal(false); + /* Recheck end-of-WAL. */ latest_lsn = GetLatestLSN(&latest_tli); if (private_data->tli == latest_tli) diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c index 82f7302ff9fd..6cb22ec2bfd7 100644 --- a/src/backend/replication/walreceiver.c +++ b/src/backend/replication/walreceiver.c @@ -583,6 +583,16 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len) */ bool requestReply = false; + /* + * Report pending statistics to the cumulative stats + * system. This location is useful for the report as it is + * not within a tight loop in the WAL receiver, which + * would bloat requests to pgstats, while also making sure + * that the reports happen at least each time a status + * update is sent. + */ + pgstat_report_wal(false); + /* * Check if time since last receive from primary has * reached the configured limit. diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c index ba11545a17f3..eb5750255961 100644 --- a/src/backend/utils/activity/pgstat_io.c +++ b/src/backend/utils/activity/pgstat_io.c @@ -435,12 +435,22 @@ pgstat_tracks_io_object(BackendType bktype, IOObject io_object, */ no_temp_rel = bktype == B_AUTOVAC_LAUNCHER || bktype == B_BG_WRITER || bktype == B_CHECKPOINTER || bktype == B_AUTOVAC_WORKER || - bktype == B_STANDALONE_BACKEND || bktype == B_STARTUP; + bktype == B_STANDALONE_BACKEND || bktype == B_STARTUP || + bktype == B_WAL_SUMMARIZER || bktype == B_WAL_WRITER || + bktype == B_WAL_RECEIVER; if (no_temp_rel && io_context == IOCONTEXT_NORMAL && io_object == IOOBJECT_TEMP_RELATION) return false; + /* + * Some BackendTypes only perform IO under IOOBJECT_WAL, hence exclude all + * rows for all the other objects for these. + */ + if ((bktype == B_WAL_SUMMARIZER || bktype == B_WAL_RECEIVER || + bktype == B_WAL_WRITER) && io_object != IOOBJECT_WAL) + return false; + /* * Some BackendTypes do not currently perform any IO in certain * IOContexts, and, while it may not be inherently incorrect for them to diff --git a/src/bin/pg_walsummary/t/002_blocks.pl b/src/bin/pg_walsummary/t/002_blocks.pl index 27f29a3b0c68..270332780a45 100644 --- a/src/bin/pg_walsummary/t/002_blocks.pl +++ b/src/bin/pg_walsummary/t/002_blocks.pl @@ -46,6 +46,13 @@ SELECT EXISTS ( EOM ok($result, "WAL summarization caught up after insert"); +# The WAL summarizer should have generated some IO statistics. +my $stats_reads = $node1->safe_psql( + 'postgres', + qq{SELECT sum(reads) > 0 FROM pg_stat_io + WHERE backend_type = 'walsummarizer' AND object = 'wal'}); +is($stats_reads, 't', "WAL summarizer generates statistics for WAL reads"); + # Find the highest LSN that is summarized on disk. my $summarized_lsn = $node1->safe_psql('postgres', <<EOM); SELECT MAX(end_lsn) AS summarized_lsn FROM pg_available_wal_summaries() diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl index ee57d234c861..3945f00ab884 100644 --- a/src/test/recovery/t/001_stream_rep.pl +++ b/src/test/recovery/t/001_stream_rep.pl @@ -506,6 +506,13 @@ $node_standby_2->append_conf('postgresql.conf', "primary_slot_name = ''"); $node_standby_2->enable_streaming($node_primary); $node_standby_2->reload; +# The WAL receiver should have generated some IO statistics. +my $stats_reads = $node_standby_1->safe_psql( + 'postgres', + qq{SELECT sum(writes) > 0 FROM pg_stat_io + WHERE backend_type = 'walreceiver' AND object = 'wal'}); +is($stats_reads, 't', "WAL receiver generates statistics for WAL writes"); + # be sure do not streaming from cascade $node_standby_1->stop; -- 2.47.2
signature.asc
Description: PGP signature