Hi, On Wed, Mar 05, 2025 at 05:45:57PM +0800, Xuneng Zhou wrote: > Subject: Clarification Needed on WAL Pending Checks in Patchset > > Hi, > > Thank you for the patchset. I’ve spent some time learning and reviewing it > and have 2 comments.
Thanks for looking at it! > I noticed that in patches prior to patch 6, the function > pgstat_backend_wal_have_pending() was implemented as follows: > > /* > * To determine whether any WAL activity has occurred since last time, not > * only the number of generated WAL records but also the numbers of WAL > * writes and syncs need to be checked. Because even transactions that > * generate no WAL records can write or sync WAL data when flushing the > * data pages. > */ > static bool > pgstat_backend_wal_have_pending(void) > { > PgStat_PendingWalStats pending_wal; > > pending_wal = PendingBackendStats.pending_wal; > > return pgWalUsage.wal_records != prevBackendWalUsage.wal_records || > pending_wal.wal_write != 0 || pending_wal.wal_sync != 0; > } > > Starting with patch 7, it seems the implementation was simplified to: > > /* > * To determine whether WAL usage happened. > */ > static bool > pgstat_backend_wal_have_pending(void) > { > return pgWalUsage.wal_records != prevBackendWalUsage.wal_records; > } That's right. This is due to 2421e9a51d2 that removed PgStat_PendingWalStats. > Meanwhile, the cluster-level check in the function > pgstat_wal_have_pending_cb() still performs the additional checks: > > bool > pgstat_wal_have_pending_cb(void) > { > return pgWalUsage.wal_records != prevWalUsage.wal_records || > PendingWalStats.wal_write != 0 || > PendingWalStats.wal_sync != 0; > } Not since 2421e9a51d2. It looks like that you are looking at code prior to 2421e9a51d2. > Another one is on: > Bertrand Drouvot <bertranddrouvot...@gmail.com> 于2025年3月3日周一 18:47写道: > > > Hi, > > > > On Mon, Mar 03, 2025 at 09:17:30AM +0000, Bertrand Drouvot wrote: > > > hmm, that would work as long as PGSTAT_BACKEND_FLUSH_ALL represents > > things > > > that need to be called from pgstat_report_wal(). But I think that's open > > > door for issue should be add a new PGSTAT_BACKEND_FLUSH_XXX where XXX is > > not > > > related to pgstat_report_wal() at all. So, I'm tempted to keep it as it > > is. > > > > I just realized that pgstat_flush_backend() is not correct in 0001. Indeed > > we check: > > > > " > > if (pg_memory_is_all_zeros(&PendingBackendStats, > > sizeof(struct PgStat_BackendPending))) > > return false; > > " > > > > but the WAL stats are not part of PgStat_BackendPending... So we only check > > for IO pending stats here. I'm not sure WAL stats could be non empty if IO > > stats are but the attached now also takes care of pending WAL stats here. > > I've noticed that here's a function for checking if there are any backend > stats waiting for flush: > /* > * Check if there are any backend stats waiting for flush. > */ > bool > pgstat_backend_have_pending_cb(void) > { > return (!pg_memory_is_all_zeros(&PendingBackendStats, > sizeof(struct PgStat_BackendPending))); > } That's right. The reason I did not add the extra check there is because I have in mind to replace the pg_memory_is_all_zeros() checks by a new global variable and also replace the checks in pgstat_flush_backend() by a call to pgstat_backend_have_pending_cb() (see 0002 in [1]). It means that all of that would be perfectly clean if 0002 in [1] goes in. But yeah, if 0002 in [1] does not go in, then your concern is valid, so adding the extra check in the attached. Thanks for the review! [1]: https://www.postgresql.org/message-id/Z8WYf1jyy4MwOveQ%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
>From c9500ba1ed93f97aee2f9f4a97bcbfaaff998cad Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot <bertranddrouvot...@gmail.com> Date: Mon, 6 Jan 2025 10:00:00 +0000 Subject: [PATCH v14] per backend WAL statistics Now that commit 9aea73fc61 added backend-level statistics to pgstats (and per backend IO statistics) we can more easily add per backend statistics. This commit adds per backend WAL statistics using the same layer as pg_stat_wal, except that it is now possible to know how much WAL activity is happening in each backend rather than an overall aggregate of all the activity. A function called pg_stat_get_backend_wal() is added to access this data depending on the PID of a backend. The same limitation as in 9aea73fc61 persists, meaning that Auxiliary processes are not included in this set of statistics. XXX: bump catalog version XXX: bump of stats file format not required, as backend stats do not persist on disk. --- doc/src/sgml/monitoring.sgml | 19 ++++++ src/backend/utils/activity/pgstat_backend.c | 70 ++++++++++++++++++++- src/backend/utils/activity/pgstat_wal.c | 1 + src/backend/utils/adt/pgstatfuncs.c | 26 +++++++- src/include/catalog/pg_proc.dat | 7 +++ src/include/pgstat.h | 37 +++++------ src/include/utils/pgstat_internal.h | 3 +- src/test/regress/expected/stats.out | 14 +++++ src/test/regress/sql/stats.sql | 6 ++ 9 files changed, 160 insertions(+), 23 deletions(-) 15.1% doc/src/sgml/ 42.6% src/backend/utils/activity/ 14.7% src/backend/utils/adt/ 8.3% src/include/catalog/ 4.2% src/include/utils/ 8.0% src/test/regress/expected/ 6.1% src/test/regress/sql/ diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 16646f560e8..b1710680705 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -4866,6 +4866,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage </para></entry> </row> + <row> + <entry id="pg-stat-get-backend-wal" role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>pg_stat_get_backend_wal</primary> + </indexterm> + <function>pg_stat_get_backend_wal</function> ( <type>integer</type> ) + <returnvalue>record</returnvalue> + </para> + <para> + Returns WAL statistics about the backend with the specified + process ID. The output fields are exactly the same as the ones in the + <structname>pg_stat_wal</structname> view. + </para> + <para> + The function does not return WAL statistics for the checkpointer, + the background writer, the startup process and the autovacuum launcher. + </para></entry> + </row> + <row> <entry role="func_table_entry"><para role="func_signature"> <indexterm> diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c index a9343b7b59e..01e072a9bfd 100644 --- a/src/backend/utils/activity/pgstat_backend.c +++ b/src/backend/utils/activity/pgstat_backend.c @@ -38,6 +38,14 @@ */ static PgStat_BackendPending PendingBackendStats; +/* + * WAL usage counters saved from pgWalUsage at the previous call to + * pgstat_report_wal(). This is used to calculate how much WAL usage + * happens between pgstat_report_wal() calls, by subtracting + * the previous counters from the current ones. + */ +static WalUsage prevBackendWalUsage; + /* * Utility routines to report I/O stats for backends, kept here to avoid * exposing PendingBackendStats to the outside world. @@ -184,6 +192,57 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref) MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO)); } +/* + * To determine whether WAL usage happened. + */ +static bool +pgstat_backend_wal_have_pending(void) +{ + return pgWalUsage.wal_records != prevBackendWalUsage.wal_records; +} + +/* + * Flush out locally pending backend WAL statistics. Locking is managed + * by the caller. + */ +static void +pgstat_flush_backend_entry_wal(PgStat_EntryRef *entry_ref) +{ + PgStatShared_Backend *shbackendent; + PgStat_WalCounters *bktype_shstats; + WalUsage wal_usage_diff = {0}; + + /* + * This function can be called even if nothing at all has happened. Avoid + * taking lock for nothing in that case. + */ + if (!pgstat_backend_wal_have_pending()) + return; + + shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats; + bktype_shstats = &shbackendent->stats.wal_counters; + + /* + * We don't update the WAL usage portion of the local WalStats elsewhere. + * Calculate how much WAL usage counters were increased by subtracting the + * previous counters from the current ones. + */ + WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevBackendWalUsage); + +#define WALSTAT_ACC(fld, var_to_add) \ + (bktype_shstats->fld += var_to_add.fld) + WALSTAT_ACC(wal_buffers_full, wal_usage_diff); + WALSTAT_ACC(wal_records, wal_usage_diff); + WALSTAT_ACC(wal_fpi, wal_usage_diff); + WALSTAT_ACC(wal_bytes, wal_usage_diff); +#undef WALSTAT_ACC + + /* + * Save the current counters for the subsequent calculation of WAL usage. + */ + prevBackendWalUsage = pgWalUsage; +} + /* * Flush out locally pending backend statistics * @@ -199,7 +258,8 @@ pgstat_flush_backend(bool nowait, bits32 flags) return false; if (pg_memory_is_all_zeros(&PendingBackendStats, - sizeof(struct PgStat_BackendPending))) + sizeof(struct PgStat_BackendPending)) + && !pgstat_backend_wal_have_pending()) return false; entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_BACKEND, InvalidOid, @@ -211,6 +271,9 @@ pgstat_flush_backend(bool nowait, bits32 flags) if (flags & PGSTAT_BACKEND_FLUSH_IO) pgstat_flush_backend_entry_io(entry_ref); + if (flags & PGSTAT_BACKEND_FLUSH_WAL) + pgstat_flush_backend_entry_wal(entry_ref); + pgstat_unlock_entry(entry_ref); return false; @@ -223,7 +286,8 @@ bool pgstat_backend_have_pending_cb(void) { return (!pg_memory_is_all_zeros(&PendingBackendStats, - sizeof(struct PgStat_BackendPending))); + sizeof(struct PgStat_BackendPending)) + || pgstat_backend_wal_have_pending()); } /* @@ -258,6 +322,8 @@ pgstat_create_backend(ProcNumber procnum) pgstat_unlock_entry(entry_ref); MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending)); + + prevBackendWalUsage = pgWalUsage; } /* diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c index 5d3da4b674e..16a1ecb4d90 100644 --- a/src/backend/utils/activity/pgstat_wal.c +++ b/src/backend/utils/activity/pgstat_wal.c @@ -52,6 +52,7 @@ pgstat_report_wal(bool force) /* flush wal stats */ (void) pgstat_wal_flush_cb(nowait); + pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL); /* flush IO stats */ pgstat_flush_io(nowait); diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index 9172e1cb9d2..662ce46cbc2 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -1609,8 +1609,8 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS) /* * pg_stat_wal_build_tuple * - * Helper routine for pg_stat_get_wal() returning one tuple based on the - * contents of wal_counters. + * Helper routine for pg_stat_get_wal() and pg_stat_get_backend_wal() + * returning one tuple based on the contents of wal_counters. */ static Datum pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters, @@ -1659,6 +1659,28 @@ pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters, PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls))); } +/* + * Returns WAL statistics for a backend with given PID. + */ +Datum +pg_stat_get_backend_wal(PG_FUNCTION_ARGS) +{ + int pid; + PgStat_Backend *backend_stats; + PgStat_WalCounters bktype_stats; + + pid = PG_GETARG_INT32(0); + backend_stats = pgstat_fetch_stat_backend_by_pid(pid, NULL); + + if (!backend_stats) + PG_RETURN_NULL(); + + bktype_stats = backend_stats->wal_counters; + + /* save tuples with data from this PgStat_WalCounters */ + return (pg_stat_wal_build_tuple(bktype_stats, backend_stats->stat_reset_timestamp)); +} + /* * Returns statistics of WAL activity */ diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index cd9422d0bac..3e35f8b8e99 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5954,6 +5954,13 @@ proargmodes => '{o,o,o,o,o}', proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}', prosrc => 'pg_stat_get_wal' }, +{ oid => '8037', descr => 'statistics: backend WAL activity', + proname => 'pg_stat_get_backend_wal', provolatile => 'v', + proparallel => 'r', prorettype => 'record', proargtypes => 'int4', + proallargtypes => '{int4,int8,int8,numeric,int8,timestamptz}', + proargmodes => '{i,o,o,o,o,o}', + proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}', + prosrc => 'pg_stat_get_backend_wal' }, { oid => '6248', descr => 'statistics: information about WAL prefetching', proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't', provolatile => 'v', prorettype => 'record', proargtypes => '', diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 4aad10b0b6d..06359b9157d 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -340,24 +340,6 @@ typedef struct PgStat_IO PgStat_BktypeIO stats[BACKEND_NUM_TYPES]; } PgStat_IO; -typedef struct PgStat_Backend -{ - TimestampTz stat_reset_timestamp; - PgStat_BktypeIO io_stats; -} PgStat_Backend; - -/* --------- - * PgStat_BackendPending Non-flushed backend stats. - * --------- - */ -typedef struct PgStat_BackendPending -{ - /* - * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO. - */ - PgStat_PendingIO pending_io; -} PgStat_BackendPending; - typedef struct PgStat_StatDBEntry { PgStat_Counter xact_commit; @@ -500,6 +482,25 @@ typedef struct PgStat_WalStats TimestampTz stat_reset_timestamp; } PgStat_WalStats; +typedef struct PgStat_Backend +{ + TimestampTz stat_reset_timestamp; + PgStat_BktypeIO io_stats; + PgStat_WalCounters wal_counters; +} PgStat_Backend; + +/* --------- + * PgStat_BackendPending Non-flushed backend stats. + * --------- + */ +typedef struct PgStat_BackendPending +{ + /* + * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO. + */ + PgStat_PendingIO pending_io; +} PgStat_BackendPending; + /* * Functions in pgstat.c */ diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h index 36d228e3558..d5557e6e998 100644 --- a/src/include/utils/pgstat_internal.h +++ b/src/include/utils/pgstat_internal.h @@ -622,7 +622,8 @@ extern void pgstat_archiver_snapshot_cb(void); /* flags for pgstat_flush_backend() */ #define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */ -#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO) +#define PGSTAT_BACKEND_FLUSH_WAL (1 << 1) /* Flush WAL statistics */ +#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL) extern bool pgstat_flush_backend(bool nowait, bits32 flags); extern bool pgstat_backend_flush_cb(bool nowait); diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out index 093e6368dbb..b3c303c98cb 100644 --- a/src/test/regress/expected/stats.out +++ b/src/test/regress/expected/stats.out @@ -832,6 +832,8 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset -- Test pg_stat_wal (and make a temp table so our temp schema exists) SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset +-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists) +SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset CREATE TEMP TABLE test_stats_temp AS SELECT 17; DROP TABLE test_stats_temp; -- Checkpoint twice: The checkpointer reports stats after reporting completion @@ -851,6 +853,18 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal; t (1 row) +SELECT pg_stat_force_next_flush(); + pg_stat_force_next_flush +-------------------------- + +(1 row) + +SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid()); + ?column? +---------- + t +(1 row) + -- Test pg_stat_get_backend_idset() and some allied functions. -- In particular, verify that their notion of backend ID matches -- our temp schema index. diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql index 0a44e14d9f4..ad3f7b7e66a 100644 --- a/src/test/regress/sql/stats.sql +++ b/src/test/regress/sql/stats.sql @@ -423,6 +423,9 @@ SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset -- Test pg_stat_wal (and make a temp table so our temp schema exists) SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset +-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists) +SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset + CREATE TEMP TABLE test_stats_temp AS SELECT 17; DROP TABLE test_stats_temp; @@ -435,6 +438,9 @@ CHECKPOINT; SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer; SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal; +SELECT pg_stat_force_next_flush(); +SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid()); + -- Test pg_stat_get_backend_idset() and some allied functions. -- In particular, verify that their notion of backend ID matches -- our temp schema index. -- 2.34.1