Hi,

On Fri, Dec 13, 2024 at 11:02:53AM +0900, Michael Paquier wrote:
> On Thu, Dec 12, 2024 at 02:02:38PM +0000, Bertrand Drouvot wrote:
>
> Anyway, isn't it possible that this lookup loop finishes by finding
> nothing depending on concurrent updates of other beentries?  It sounds
> to me that this warrants an early exit in the function.

Right, done that way in the attached.

> Perhaps, yes.  pgstat_tracks_io_bktype() has always discarded
> walwriters since pgstat_io.c exists.

Yep. The comment on top of pgstat_tracks_io_bktype() says that it's not done
"for now". I think that we could update the code as proposed until it's done.

> >> +  descr => 'statistics: reset collected IO statistics for a single backend',
> >> +  proname => 'pg_stat_reset_single_backend_io_counters', provolatile => 'v',
> >>
> >> And here, pg_stat_reset_backend_stats?
> >
> > Same as above, we could imagine that in the future the backend would get
> > multiple stats and that one would want to reset only the I/O ones for example.
>
> Disagreed about this part.  It is slightly simpler to do a full reset
> of the stats in a single entry.  If another subset of stats is added
> to the backend-level entries, we could always introduce a new function
> that has more control over what subset of a single backend entry is
> reset.  And I'm pretty sure that we are going to need the function
> that does the full reset anyway.

Yeah, I would have added it once a new stats subset was added. It's fine by me
to have it now though, so done that way.

> As far as I can see, the patch relies entirely on write_to_file to
> prevent any entries to be flushed out.

Yes.

> It means that we leave in the
> dshash entries that may sit idle for as long as the server is up once
> a pgproc slot is used at least once.  This scales depending on
> max_connections.  It also means that we skip the sanity check about
> dropped entries at shutdown, which may be a good thing to do because
> we don't need to loop through them when writing the stats file.

Agree.

> Hmm.
> Could it be better to be more aggressive with the handling of these
> stats, marking them as dropped when their backend exits and cleanup
> the dshash, without relying on the write flag to make sure that all
> the entries are discarded at shutdown?

Yeah, we can do it to be consistent with the other stats kinds, done.

> The point is that we do
> shutdown in a controlled manner, with all backends exiting before the
> checkpointer writes the stats file after the shutdown checkpoint is
> completed.  The patch handles things so as entries are reset when a
> procnum is reused, leaving past stats around until that happens.  We
> should perhaps aim for more consistency with the way beentry is
> refreshed and be more proactive with the backend entry drop or reset
> at backend shutdown (pgstat_beshutdown_hook?), so as what is in the
> dshash reflects exactly what's in shared memory for each PGPROC and
> beentry.

That can't be done in pgstat_beshutdown_hook because pgstat_shutdown_hook is
called before it and resets pgStatLocal.shared_hash during
pgstat_detach_shmem(). So I did it in pgstat_shutdown_hook instead.

> Not sure that the "_per_" added in the various references of the patch
> are good to keep, like pgstat_tracks_per_backend_bktype.  These could
> be removed, I guess, doing also a PGSTAT_KIND_PER_BACKEND =>
> PGSTAT_KIND_BACKEND?

Yeah, makes sense, that's consistent with the other kinds: done.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

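For illustration only (this is not part of the attached patch): a minimal,
standalone C sketch of the pending-to-shared accumulation that
pgstat_backend_flush_cb() performs. Every Toy*/toy_* name, the reduced array
dimensions and the choice to keep pending times in nanoseconds are assumptions
made for the sketch; locking and the real PgStat_* structures are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced dimensions (stand-ins for IOOBJECT_NUM_TYPES etc.) */
enum { TOY_OBJECTS = 2, TOY_CONTEXTS = 4, TOY_OPS = 8 };

/* Backend-local pending counters; times kept in nanoseconds (assumption) */
typedef struct ToyPendingIO
{
	int64_t		counts[TOY_OBJECTS][TOY_CONTEXTS][TOY_OPS];
	int64_t		pending_ns[TOY_OBJECTS][TOY_CONTEXTS][TOY_OPS];
} ToyPendingIO;

/* Stand-in for the shared per-backend entry; times kept in microseconds */
typedef struct ToySharedIO
{
	int64_t		counts[TOY_OBJECTS][TOY_CONTEXTS][TOY_OPS];
	int64_t		times_us[TOY_OBJECTS][TOY_CONTEXTS][TOY_OPS];
} ToySharedIO;

/* Record cnt operations and their elapsed time in the pending entry */
void
toy_count_io_op(ToyPendingIO *pending, int obj, int ctx, int op,
				int64_t cnt, int64_t elapsed_ns)
{
	pending->counts[obj][ctx][op] += cnt;
	pending->pending_ns[obj][ctx][op] += elapsed_ns;
}

/*
 * Add every pending counter to the shared entry, converting times to
 * microseconds (integer division, so sub-microsecond remainders are dropped).
 * Clearing the pending entry here is an assumption of this sketch, made only
 * to keep the example self-contained.
 */
void
toy_flush(ToySharedIO *shared, ToyPendingIO *pending)
{
	for (int obj = 0; obj < TOY_OBJECTS; obj++)
	{
		for (int ctx = 0; ctx < TOY_CONTEXTS; ctx++)
		{
			for (int op = 0; op < TOY_OPS; op++)
			{
				shared->counts[obj][ctx][op] +=
					pending->counts[obj][ctx][op];
				shared->times_us[obj][ctx][op] +=
					pending->pending_ns[obj][ctx][op] / 1000;
			}
		}
	}

	memset(pending, 0, sizeof(*pending));
}
```

A caller would zero both structures, call toy_count_io_op() once per I/O
operation and toy_flush() at each reporting point; the shared totals then only
ever grow until they are explicitly reset.
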
From 1b03df28ceea635a2687baee4f8d1b3a3c1ae728 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot...@gmail.com>
Date: Mon, 28 Oct 2024 12:50:32 +0000
Subject: [PATCH v8] per backend I/O statistics

While pg_stat_io provides cluster-wide I/O statistics, this commit adds the
ability to track and display per backend I/O statistics.

It adds a new statistics kind and 2 new functions:

- pg_stat_reset_backend_stats() to be able to reset the stats for a given
backend pid.
- pg_stat_get_backend_io() to retrieve I/O statistics for a given backend pid.

The new KIND is named PGSTAT_KIND_BACKEND as it could be used in the future
to store other statistics (than the I/O ones) per backend. The new KIND is
a variable-numbered one and has an automatic cap on the maximum number of
entries (as its hash key contains the proc number).

There is no need to write the per backend I/O stats to disk (there is no
point in seeing stats for backends that no longer exist after a restart), so
using "write_to_file = false".

Note that per backend I/O statistics are not collected for the checkpointer,
the background writer, the startup process and the autovacuum launcher as those
are already visible in pg_stat_io and there is only one of those.

XXX: The catalog version needs to be bumped.
---
 doc/src/sgml/config.sgml                     |   8 +-
 doc/src/sgml/monitoring.sgml                 |  37 ++++
 src/backend/catalog/system_functions.sql     |   2 +
 src/backend/utils/activity/Makefile          |   1 +
 src/backend/utils/activity/backend_status.c  |   4 +
 src/backend/utils/activity/meson.build       |   1 +
 src/backend/utils/activity/pgstat.c          |  22 ++-
 src/backend/utils/activity/pgstat_backend.c  | 182 +++++++++++++++++++
 src/backend/utils/activity/pgstat_io.c       |  25 ++-
 src/backend/utils/activity/pgstat_relation.c |   2 +
 src/backend/utils/adt/pgstatfuncs.c          | 169 +++++++++++++++++
 src/include/catalog/pg_proc.dat              |  14 ++
 src/include/pgstat.h                         |  31 +++-
 src/include/utils/pgstat_internal.h          |  14 ++
 src/test/regress/expected/stats.out          |  72 +++++++-
 src/test/regress/sql/stats.sql               |  38 +++-
 src/tools/pgindent/typedefs.list             |   3 +
 17 files changed, 604 insertions(+), 21 deletions(-)
   9.9% doc/src/sgml/
  31.2% src/backend/utils/activity/
  22.1% src/backend/utils/adt/
   4.4% src/include/catalog/
   7.1% src/include/
  12.9% src/test/regress/expected/
  11.5% src/test/regress/sql/

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e0c8325a39..8afca9b110 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8403,9 +8403,11 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         displayed in <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link>,
         <link linkend="monitoring-pg-stat-io-view">
-        <structname>pg_stat_io</structname></link>, in the output of
-        <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal> option
-        is used, in the output of <xref linkend="sql-vacuum"/> when
+        <structname>pg_stat_io</structname></link>, in the output of the
+        <link linkend="pg-stat-get-backend-io">
+        <function>pg_stat_get_backend_io()</function></link> function, in the
+        output of <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal>
+        option is used, in the output of <xref linkend="sql-vacuum"/> when
         the <literal>VERBOSE</literal> option is used, by
autovacuum for auto-vacuums and auto-analyzes, when <xref linkend="guc-log-autovacuum-min-duration"/> is set and by diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 840d7f8161..4a9464da61 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -4790,6 +4790,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage </para></entry> </row> + <row> + <entry id="pg-stat-get-backend-io" role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>pg_stat_get_backend_io</primary> + </indexterm> + <function>pg_stat_get_backend_io</function> ( <type>integer</type> ) + <returnvalue>setof record</returnvalue> + </para> + <para> + Returns I/O statistics about the backend with the specified + process ID. The output fields are exactly the same as the ones in the + <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link> + view. The function does not return I/O statistics for the checkpointer, + the background writer, the startup process and the autovacuum launcher + as they are already visible in the <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link> + view and there is only one of those. + </para></entry> + </row> + <row> <entry role="func_table_entry"><para role="func_signature"> <indexterm> @@ -4971,6 +4990,24 @@ description | Waiting for a newly initialized WAL file to reach durable storage </para></entry> </row> + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>pg_stat_reset_backend_stats</primary> + </indexterm> + <function>pg_stat_reset_backend_stats</function> ( <type>integer</type> ) + <returnvalue>void</returnvalue> + </para> + <para> + Resets statistics for a single backend with the specified process ID + to zero. + </para> + <para> + This function is restricted to superusers by default, but other users + can be granted EXECUTE to run the function. 
+ </para></entry> + </row> + <row> <entry role="func_table_entry"><para role="func_signature"> <indexterm> diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql index c51dfca802..14eb99cd47 100644 --- a/src/backend/catalog/system_functions.sql +++ b/src/backend/catalog/system_functions.sql @@ -711,6 +711,8 @@ REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_table_counters(oid) FROM public; REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_function_counters(oid) FROM public; +REVOKE EXECUTE ON FUNCTION pg_stat_reset_backend_stats(integer) FROM public; + REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public; REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, int8) FROM public; diff --git a/src/backend/utils/activity/Makefile b/src/backend/utils/activity/Makefile index b9fd66ea17..24b64a2742 100644 --- a/src/backend/utils/activity/Makefile +++ b/src/backend/utils/activity/Makefile @@ -20,6 +20,7 @@ OBJS = \ backend_status.o \ pgstat.o \ pgstat_archiver.o \ + pgstat_backend.o \ pgstat_bgwriter.o \ pgstat_checkpointer.o \ pgstat_database.o \ diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c index 22c6dc378c..7f74011a00 100644 --- a/src/backend/utils/activity/backend_status.c +++ b/src/backend/utils/activity/backend_status.c @@ -249,6 +249,10 @@ pgstat_beinit(void) Assert(MyProcNumber >= 0 && MyProcNumber < NumBackendStatSlots); MyBEEntry = &BackendStatusArray[MyProcNumber]; + /* Create the backend statistics entry */ + if (pgstat_tracks_backend_bktype(MyBackendType)) + pgstat_create_backend_stat(MyProcNumber); + /* Set up a process-exit hook to clean up */ on_shmem_exit(pgstat_beshutdown_hook, 0); } diff --git a/src/backend/utils/activity/meson.build b/src/backend/utils/activity/meson.build index f73c22905c..380d3dd70c 100644 --- a/src/backend/utils/activity/meson.build +++ b/src/backend/utils/activity/meson.build @@ -5,6 +5,7 @@ backend_sources += 
files( 'backend_status.c', 'pgstat.c', 'pgstat_archiver.c', + 'pgstat_backend.c', 'pgstat_bgwriter.c', 'pgstat_checkpointer.c', 'pgstat_database.c', diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c index 18b7d9b47d..563f2443f8 100644 --- a/src/backend/utils/activity/pgstat.c +++ b/src/backend/utils/activity/pgstat.c @@ -77,6 +77,7 @@ * * Each statistics kind is handled in a dedicated file: * - pgstat_archiver.c + * - pgstat_backend.c * - pgstat_bgwriter.c * - pgstat_checkpointer.c * - pgstat_database.c @@ -358,6 +359,22 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE] .reset_timestamp_cb = pgstat_subscription_reset_timestamp_cb, }, + [PGSTAT_KIND_BACKEND] = { + .name = "backend", + + .fixed_amount = false, + .write_to_file = false, + + .accessed_across_databases = true, + + .shared_size = sizeof(PgStatShared_Backend), + .shared_data_off = offsetof(PgStatShared_Backend, stats), + .shared_data_len = sizeof(((PgStatShared_Backend *) 0)->stats), + .pending_size = sizeof(PgStat_BackendPendingIO), + + .flush_pending_cb = pgstat_backend_flush_cb, + .reset_timestamp_cb = pgstat_backend_reset_timestamp_cb, + }, /* stats for fixed-numbered (mostly 1) objects */ @@ -602,6 +619,9 @@ pgstat_shutdown_hook(int code, Datum arg) Assert(dlist_is_empty(&pgStatPending)); dlist_init(&pgStatPending); + /* drop the backend stats entry */ + pgstat_drop_entry(PGSTAT_KIND_BACKEND, InvalidOid, MyProcNumber); + pgstat_detach_shmem(); #ifdef USE_ASSERT_CHECKING @@ -768,7 +788,7 @@ pgstat_report_stat(bool force) partial_flush = false; - /* flush database / relation / function / ... stats */ + /* flush database / relation / function / backend / ... 
stats */ partial_flush |= pgstat_flush_pending_entries(nowait); /* flush of fixed-numbered stats */ diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c new file mode 100644 index 0000000000..e2d83024c2 --- /dev/null +++ b/src/backend/utils/activity/pgstat_backend.c @@ -0,0 +1,182 @@ +/* ------------------------------------------------------------------------- + * + * pgstat_backend.c + * Implementation of backend statistics. + * + * This file contains the implementation of backend statistics. It is kept + * separate from pgstat.c to enforce the line between the statistics access / + * storage implementation and the details about individual types of statistics. + * + * Copyright (c) 2024, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/backend/utils/activity/pgstat_backend.c + * ------------------------------------------------------------------------- + */ + +#include "postgres.h" + +#include "utils/pgstat_internal.h" + +/* + * Returns backend's IO stats. + */ +PgStat_Backend * +pgstat_fetch_proc_stat_io(ProcNumber procNumber) +{ + PgStat_Backend *backend_entry; + + backend_entry = (PgStat_Backend *) pgstat_fetch_entry(PGSTAT_KIND_BACKEND, + InvalidOid, procNumber); + + return backend_entry; +} + +/* + * Flush out locally pending backend statistics + * + * If no stats have been recorded, this function returns false. 
+ */
+bool
+pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+	PgStatShared_Backend *shbackendioent;
+	PgStat_BackendPendingIO *pendingent;
+	PgStat_BktypeIO *bktype_shstats;
+
+	if (!pgstat_lock_entry(entry_ref, nowait))
+		return false;
+
+	shbackendioent = (PgStatShared_Backend *) entry_ref->shared_stats;
+	bktype_shstats = &shbackendioent->stats.stats;
+	pendingent = (PgStat_BackendPendingIO *) entry_ref->pending;
+
+	for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
+	{
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				instr_time	time;
+
+				bktype_shstats->counts[io_object][io_context][io_op] +=
+					pendingent->counts[io_object][io_context][io_op];
+
+				time = pendingent->pending_times[io_object][io_context][io_op];
+
+				bktype_shstats->times[io_object][io_context][io_op] +=
+					INSTR_TIME_GET_MICROSEC(time);
+			}
+		}
+	}
+
+	pgstat_unlock_entry(entry_ref);
+
+	return true;
+}
+
+/*
+ * Simple wrapper of pgstat_backend_flush_cb()
+ */
+void
+pgstat_flush_backend(bool nowait)
+{
+	if (pgstat_tracks_backend_bktype(MyBackendType))
+	{
+		PgStat_EntryRef *entry_ref;
+
+		entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
+										 MyProcNumber, false, NULL);
+		(void) pgstat_backend_flush_cb(entry_ref, nowait);
+	}
+}
+
+/*
+ * Create the backend statistics entry for procnum.
+ */
+void
+pgstat_create_backend_stat(ProcNumber procnum)
+{
+	PgStat_EntryRef *entry_ref;
+	PgStatShared_Backend *shstatent;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_BACKEND, InvalidOid,
+										  procnum, NULL);
+
+	shstatent = (PgStatShared_Backend *) entry_ref->shared_stats;
+
+	/*
+	 * NB: need to accept that there might be stats from an older backend,
+	 * e.g. if we previously used this proc number.
+	 */
+	memset(&shstatent->stats, 0, sizeof(shstatent->stats));
+}
+
+/*
+ * Find or create a local PgStat_BackendPendingIO entry for procnum.
+ */ +PgStat_BackendPendingIO * +pgstat_prep_backend_pending(ProcNumber procnum) +{ + PgStat_EntryRef *entry_ref; + + entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_BACKEND, InvalidOid, + procnum, NULL); + + return entry_ref->pending; +} + +/* + * Backend statistics are not collected for all BackendTypes. + * + * The following BackendTypes do not participate in the backend stats + * subsystem: + * - The same and for the same reasons as in pgstat_tracks_io_bktype(). + * - B_BG_WRITER, B_CHECKPOINTER, B_STARTUP and B_AUTOVAC_LAUNCHER because their + * I/O stats are already visible in pg_stat_io and there is only one of those. + * + * Function returns true if BackendType participates in the backend stats + * subsystem for IO and false if it does not. + * + * When adding a new BackendType, also consider adding relevant restrictions to + * pgstat_tracks_io_object() and pgstat_tracks_io_op(). + */ +bool +pgstat_tracks_backend_bktype(BackendType bktype) +{ + /* + * List every type so that new backend types trigger a warning about + * needing to adjust this switch. 
+ */ + switch (bktype) + { + case B_INVALID: + case B_AUTOVAC_LAUNCHER: + case B_DEAD_END_BACKEND: + case B_ARCHIVER: + case B_LOGGER: + case B_WAL_RECEIVER: + case B_WAL_WRITER: + case B_WAL_SUMMARIZER: + case B_BG_WRITER: + case B_CHECKPOINTER: + case B_STARTUP: + return false; + + case B_AUTOVAC_WORKER: + case B_BACKEND: + case B_BG_WORKER: + case B_STANDALONE_BACKEND: + case B_SLOTSYNC_WORKER: + case B_WAL_SENDER: + return true; + } + + return false; +} + +void +pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts) +{ + ((PgStatShared_Backend *) header)->stats.stat_reset_timestamp = ts; +} diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c index f9883af2b3..e0c206a453 100644 --- a/src/backend/utils/activity/pgstat_io.c +++ b/src/backend/utils/activity/pgstat_io.c @@ -20,13 +20,7 @@ #include "storage/bufmgr.h" #include "utils/pgstat_internal.h" - -typedef struct PgStat_PendingIO -{ - PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; - instr_time pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; -} PgStat_PendingIO; - +typedef PgStat_BackendPendingIO PgStat_PendingIO; static PgStat_PendingIO PendingIOStats; static bool have_iostats = false; @@ -87,6 +81,14 @@ pgstat_count_io_op_n(IOObject io_object, IOContext io_context, IOOp io_op, uint3 Assert((unsigned int) io_op < IOOP_NUM_TYPES); Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op)); + if (pgstat_tracks_backend_bktype(MyBackendType)) + { + PgStat_PendingIO *entry_ref; + + entry_ref = pgstat_prep_backend_pending(MyProcNumber); + entry_ref->counts[io_object][io_context][io_op] += cnt; + } + PendingIOStats.counts[io_object][io_context][io_op] += cnt; have_iostats = true; @@ -148,6 +150,15 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op, INSTR_TIME_ADD(PendingIOStats.pending_times[io_object][io_context][io_op], io_time); + + if 
(pgstat_tracks_backend_bktype(MyBackendType)) + { + PgStat_PendingIO *entry_ref; + + entry_ref = pgstat_prep_backend_pending(MyProcNumber); + INSTR_TIME_ADD(entry_ref->pending_times[io_object][io_context][io_op], + io_time); + } } pgstat_count_io_op_n(io_object, io_context, io_op, cnt); diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c index faba8b64d2..85e65557bb 100644 --- a/src/backend/utils/activity/pgstat_relation.c +++ b/src/backend/utils/activity/pgstat_relation.c @@ -264,6 +264,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared, * VACUUM command has processed all tables and committed. */ pgstat_flush_io(false); + pgstat_flush_backend(false); } /* @@ -350,6 +351,7 @@ pgstat_report_analyze(Relation rel, /* see pgstat_report_vacuum() */ pgstat_flush_io(false); + pgstat_flush_backend(false); } /* diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index cdf37403e9..b939551d36 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -1474,6 +1474,154 @@ pg_stat_get_io(PG_FUNCTION_ARGS) return (Datum) 0; } +Datum +pg_stat_get_backend_io(PG_FUNCTION_ARGS) +{ + ReturnSetInfo *rsinfo; + PgStat_Backend *backend_stats; + Datum bktype_desc; + PgStat_BktypeIO *bktype_stats; + BackendType bktype; + Datum reset_time; + int num_backends = pgstat_fetch_stat_numbackends(); + int curr_backend; + int pid; + PGPROC *proc; + ProcNumber procNumber; + + InitMaterializedSRF(fcinfo, 0); + rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; + + pid = PG_GETARG_INT32(0); + proc = BackendPidGetProc(pid); + + /* + * Could be an auxiliary process but would not report any stats due to + * pgstat_tracks_backend_bktype() anyway. So don't need an extra call to + * AuxiliaryPidGetProc(). 
+ */ + if (!proc) + return (Datum) 0; + + procNumber = GetNumberFromPGProc(proc); + backend_stats = pgstat_fetch_proc_stat_io(procNumber); + + if (!backend_stats) + return (Datum) 0; + + bktype = B_INVALID; + + /* Look for the backend type */ + for (curr_backend = 1; curr_backend <= num_backends; curr_backend++) + { + LocalPgBackendStatus *local_beentry; + PgBackendStatus *beentry; + + /* Get the next one in the list */ + local_beentry = pgstat_get_local_beentry_by_index(curr_backend); + beentry = &local_beentry->backendStatus; + + /* Looking for specific PID, ignore all the others */ + if (beentry->st_procpid != pid) + continue; + + bktype = beentry->st_backendType; + break; + } + + /* Backend is gone */ + if (bktype == B_INVALID) + return (Datum) 0; + + bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype)); + bktype_stats = &backend_stats->stats; + reset_time = TimestampTzGetDatum(backend_stats->stat_reset_timestamp); + + /* + * In Assert builds, we can afford an extra loop through all of the + * counters checking that only expected stats are non-zero, since it keeps + * the non-Assert code cleaner. + */ + Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype)); + + for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++) + { + const char *obj_name = pgstat_get_io_object_name(io_obj); + + for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++) + { + const char *context_name = pgstat_get_io_context_name(io_context); + + Datum values[IO_NUM_COLUMNS] = {0}; + bool nulls[IO_NUM_COLUMNS] = {0}; + + /* + * Some combinations of BackendType, IOObject, and IOContext are + * not valid for any type of IOOp. In such cases, omit the entire + * row from the view. 
+ */ + if (!pgstat_tracks_io_object(bktype, io_obj, io_context)) + continue; + + values[IO_COL_BACKEND_TYPE] = bktype_desc; + values[IO_COL_CONTEXT] = CStringGetTextDatum(context_name); + values[IO_COL_OBJECT] = CStringGetTextDatum(obj_name); + if (backend_stats->stat_reset_timestamp != 0) + values[IO_COL_RESET_TIME] = reset_time; + else + nulls[IO_COL_RESET_TIME] = true; + + /* + * Hard-code this to the value of BLCKSZ for now. Future values + * could include XLOG_BLCKSZ, once WAL IO is tracked, and constant + * multipliers, once non-block-oriented IO (e.g. temporary file + * IO) is tracked. + */ + values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ); + + for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++) + { + int op_idx = pgstat_get_io_op_index(io_op); + int time_idx = pgstat_get_io_time_index(io_op); + + /* + * Some combinations of BackendType and IOOp, of IOContext and + * IOOp, and of IOObject and IOOp are not tracked. Set these + * cells in the view NULL. + */ + if (pgstat_tracks_io_op(bktype, io_obj, io_context, io_op)) + { + PgStat_Counter count = + bktype_stats->counts[io_obj][io_context][io_op]; + + values[op_idx] = Int64GetDatum(count); + } + else + nulls[op_idx] = true; + + /* not every operation is timed */ + if (time_idx == IO_COL_INVALID) + continue; + + if (!nulls[op_idx]) + { + PgStat_Counter time = + bktype_stats->times[io_obj][io_context][io_op]; + + values[time_idx] = Float8GetDatum(pg_stat_us_to_ms(time)); + } + else + nulls[time_idx] = true; + } + + tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, + values, nulls); + } + } + + return (Datum) 0; +} + /* * Returns statistics of WAL activity */ @@ -1779,6 +1927,27 @@ pg_stat_reset_single_function_counters(PG_FUNCTION_ARGS) PG_RETURN_VOID(); } +Datum +pg_stat_reset_backend_stats(PG_FUNCTION_ARGS) +{ + PGPROC *proc; + int backend_pid = PG_GETARG_INT32(0); + + proc = BackendPidGetProc(backend_pid); + + /* + * Could be an auxiliary process but would not report any stats due to + * 
pgstat_tracks_backend_bktype() anyway. So don't need an extra call to + * AuxiliaryPidGetProc(). + */ + if (!proc) + PG_RETURN_VOID(); + + pgstat_reset(PGSTAT_KIND_BACKEND, InvalidOid, GetNumberFromPGProc(proc)); + + PG_RETURN_VOID(); +} + /* Reset SLRU counters (a specific one or all of them). */ Datum pg_stat_reset_slru(PG_FUNCTION_ARGS) diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 0f22c21723..437157ffa3 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5913,6 +5913,15 @@ proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}', prosrc => 'pg_stat_get_io' }, +{ oid => '8806', descr => 'statistics: backend IO statistics', + proname => 'pg_stat_get_backend_io', prorows => '5', proretset => 't', + provolatile => 'v', proparallel => 'r', prorettype => 'record', + proargtypes => 'int4', + proallargtypes => '{int4,text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}', + proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}', + proargnames => '{backend_pid,backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}', + prosrc => 'pg_stat_get_backend_io' }, + { oid => '1136', descr => 'statistics: information about WAL activity', proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's', proparallel => 'r', prorettype => 'record', proargtypes => '', @@ -6052,6 +6061,11 @@ proname => 'pg_stat_reset_single_function_counters', provolatile => 'v', prorettype => 'void', proargtypes => 'oid', prosrc => 'pg_stat_reset_single_function_counters' }, +{ oid => '9987', + descr => 'statistics: reset statistics for a single backend', + proname => 'pg_stat_reset_backend_stats', provolatile => 'v', + 
prorettype => 'void', proargtypes => 'int4', + prosrc => 'pg_stat_reset_backend_stats' }, { oid => '2307', descr => 'statistics: reset collected statistics for a single SLRU', proname => 'pg_stat_reset_slru', proisstrict => 'f', provolatile => 'v', diff --git a/src/include/pgstat.h b/src/include/pgstat.h index ebfeef2f46..479773cfd2 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -49,14 +49,15 @@ #define PGSTAT_KIND_FUNCTION 3 /* per-function statistics */ #define PGSTAT_KIND_REPLSLOT 4 /* per-slot statistics */ #define PGSTAT_KIND_SUBSCRIPTION 5 /* per-subscription statistics */ +#define PGSTAT_KIND_BACKEND 6 /* per-backend statistics */ /* stats for fixed-numbered objects */ -#define PGSTAT_KIND_ARCHIVER 6 -#define PGSTAT_KIND_BGWRITER 7 -#define PGSTAT_KIND_CHECKPOINTER 8 -#define PGSTAT_KIND_IO 9 -#define PGSTAT_KIND_SLRU 10 -#define PGSTAT_KIND_WAL 11 +#define PGSTAT_KIND_ARCHIVER 7 +#define PGSTAT_KIND_BGWRITER 8 +#define PGSTAT_KIND_CHECKPOINTER 9 +#define PGSTAT_KIND_IO 10 +#define PGSTAT_KIND_SLRU 11 +#define PGSTAT_KIND_WAL 12 #define PGSTAT_KIND_BUILTIN_MIN PGSTAT_KIND_DATABASE #define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_WAL @@ -362,12 +363,23 @@ typedef struct PgStat_BktypeIO PgStat_Counter times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; } PgStat_BktypeIO; +typedef struct PgStat_BackendPendingIO +{ + PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; + instr_time pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; +} PgStat_BackendPendingIO; + typedef struct PgStat_IO { TimestampTz stat_reset_timestamp; PgStat_BktypeIO stats[BACKEND_NUM_TYPES]; } PgStat_IO; +typedef struct PgStat_Backend +{ + TimestampTz stat_reset_timestamp; + PgStat_BktypeIO stats; +} PgStat_Backend; typedef struct PgStat_StatDBEntry { @@ -549,6 +561,13 @@ extern bool pgstat_have_entry(PgStat_Kind kind, Oid dboid, uint64 objid); extern void pgstat_report_archiver(const char *xlog, bool failed); extern 
PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 
+/*
+ * Functions in pgstat_backend.c
+ */
+
+extern PgStat_Backend *pgstat_fetch_proc_stat_io(ProcNumber procNumber);
+extern bool pgstat_tracks_backend_bktype(BackendType bktype);
+extern void pgstat_create_backend_stat(ProcNumber procnum);
 
 /*
  * Functions in pgstat_bgwriter.c
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 7338bc1e28..811ed9b005 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -450,6 +450,11 @@ typedef struct PgStatShared_ReplSlot
 	PgStat_StatReplSlotEntry stats;
 } PgStatShared_ReplSlot;
 
+typedef struct PgStatShared_Backend
+{
+	PgStatShared_Common header;
+	PgStat_Backend stats;
+} PgStatShared_Backend;
 
 /*
  * Central shared memory entry for the cumulative stats system.
@@ -604,6 +609,15 @@ extern void pgstat_archiver_init_shmem_cb(void *stats);
 extern void pgstat_archiver_reset_all_cb(TimestampTz ts);
 extern void pgstat_archiver_snapshot_cb(void);
 
+/*
+ * Functions in pgstat_backend.c
+ */
+
+extern void pgstat_flush_backend(bool nowait);
+
+extern PgStat_BackendPendingIO *pgstat_prep_backend_pending(ProcNumber procnum);
+extern bool pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
+extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
 
 /*
  * Functions in pgstat_bgwriter.c
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 56771f83ed..3447e7b75d 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1249,7 +1249,7 @@ SELECT pg_stat_get_subscription_stats(NULL);
  
 (1 row)
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_stat_io and in backend stats:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -1259,11 +1259,19 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- be sure of the state of shared buffers at the point the test is run.
 -- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
 -- extends.
+SELECT pid AS checkpointer_pid FROM pg_stat_activity
+  WHERE backend_type = 'checkpointer' \gset
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -1280,8 +1288,17 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
  t
 (1 row)
 
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -1301,6 +1318,31 @@ SELECT current_setting('fsync') = 'off'
  t
 (1 row)
 
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Don't return any rows if querying other backend's stats that are excluded
+-- from the backend stats collection (like the checkpointer).
+SELECT count(1) = 0 FROM pg_stat_get_backend_io(:checkpointer_pid);
+ ?column? 
+----------
+ t
+(1 row)
+
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -1521,6 +1563,8 @@ SELECT pg_stat_have_stats('io', 0, 0);
 
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
  pg_stat_reset_shared 
 ----------------------
@@ -1535,6 +1579,30 @@ SELECT :io_stats_post_reset < :io_stats_pre_reset;
  t
 (1 row)
 
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+-- pg_stat_reset_shared() did not reset backend IO stats
+SELECT :my_io_stats_pre_reset <= :my_io_stats_post_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- but pg_stat_reset_backend_stats() does
+SELECT pg_stat_reset_backend_stats(pg_backend_pid());
+ pg_stat_reset_backend_stats 
+-----------------------------
+ 
+(1 row)
+
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_backend_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
   id integer PRIMARY KEY,
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 7147cc2f89..9c925005be 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -595,7 +595,7 @@ SELECT pg_stat_get_replication_slot(NULL);
 SELECT pg_stat_get_subscription_stats(NULL);
 
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_stat_io and in backend stats:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -607,20 +607,32 @@ SELECT pg_stat_get_subscription_stats(NULL);
 
 -- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
 -- extends.
+SELECT pid AS checkpointer_pid FROM pg_stat_activity
+  WHERE backend_type = 'checkpointer' \gset
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
 SELECT sum(extends) AS io_sum_shared_after_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -630,6 +642,17 @@ SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
 SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
 SELECT current_setting('fsync') = 'off'
   OR :io_sum_shared_after_fsyncs > :io_sum_shared_before_fsyncs;
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+
+-- Don't return any rows if querying other backend's stats that are excluded
+-- from the backend stats collection (like the checkpointer).
+SELECT count(1) = 0 FROM pg_stat_get_backend_io(:checkpointer_pid);
 
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
@@ -762,10 +785,21 @@ SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_ext
 SELECT pg_stat_have_stats('io', 0, 0);
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_post_reset
   FROM pg_stat_io \gset
 SELECT :io_stats_post_reset < :io_stats_pre_reset;
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+-- pg_stat_reset_shared() did not reset backend IO stats
+SELECT :my_io_stats_pre_reset <= :my_io_stats_post_reset;
+-- but pg_stat_reset_backend_stats() does
+SELECT pg_stat_reset_backend_stats(pg_backend_pid());
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_backend_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
 
 
 -- test BRIN index doesn't block HOT update
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ce33e55bf1..398dd92527 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2121,6 +2121,7 @@ PgFdwSamplingMethod
 PgFdwScanState
 PgIfAddrCallback
 PgStatShared_Archiver
+PgStatShared_Backend
 PgStatShared_BgWriter
 PgStatShared_Checkpointer
 PgStatShared_Common
@@ -2136,6 +2137,8 @@ PgStatShared_SLRU
 PgStatShared_Subscription
 PgStatShared_Wal
 PgStat_ArchiverStats
+PgStat_Backend
+PgStat_BackendPendingIO
 PgStat_BackendSubEntry
 PgStat_BgWriterStats
 PgStat_BktypeIO
-- 
2.34.1
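
As a reading aid for the stats.sql changes, here is a small standalone sketch of the semantics the new tests check: I/O is always counted in the global stats, it is counted per backend only for tracked backend types, pg_stat_reset_shared('io') leaves the per-backend counters alone, and pg_stat_reset_backend_stats() clears a single backend's entry. This is a toy model and not code from the patch: every name in it (ToyStats, toy_count_extend(), and so on) is made up for illustration, and returning false for an untracked backend is just the toy's choice, not a statement about what the real function does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Toy model only: none of these names exist in PostgreSQL.  It mimics the
 * behaviour the regression test checks, not the patch's implementation.
 */
#define TOY_MAX_BACKENDS 4

typedef struct ToyIoCounters
{
	uint64_t	extends;
	uint64_t	writes;
} ToyIoCounters;

typedef struct ToyStats
{
	ToyIoCounters global_io;	/* stands in for pg_stat_io */
	ToyIoCounters backend_io[TOY_MAX_BACKENDS]; /* per-backend entries */
	bool		tracked[TOY_MAX_BACKENDS];	/* false for e.g. checkpointer */
} ToyStats;

/* Zero all counters and mark every backend as untracked. */
void
toy_init(ToyStats *s)
{
	memset(s, 0, sizeof(*s));
}

/* Count one extend: always globally, per backend only if tracked. */
void
toy_count_extend(ToyStats *s, int backend)
{
	assert(backend >= 0 && backend < TOY_MAX_BACKENDS);
	s->global_io.extends++;
	if (s->tracked[backend])
		s->backend_io[backend].extends++;
}

/* Count one write: always globally, per backend only if tracked. */
void
toy_count_write(ToyStats *s, int backend)
{
	assert(backend >= 0 && backend < TOY_MAX_BACKENDS);
	s->global_io.writes++;
	if (s->tracked[backend])
		s->backend_io[backend].writes++;
}

/* Sum of all counters, like the sum(...) + sum(...) queries in the test. */
uint64_t
toy_total(const ToyIoCounters *c)
{
	return c->extends + c->writes;
}

/* Models pg_stat_reset_shared('io'): clears the global counters only. */
void
toy_reset_shared_io(ToyStats *s)
{
	memset(&s->global_io, 0, sizeof(s->global_io));
}

/*
 * Models pg_stat_reset_backend_stats(): clears one backend's entry.
 * Returning false for an untracked backend is an assumption of this toy.
 */
bool
toy_reset_backend(ToyStats *s, int backend)
{
	assert(backend >= 0 && backend < TOY_MAX_BACKENDS);
	if (!s->tracked[backend])
		return false;
	memset(&s->backend_io[backend], 0, sizeof(s->backend_io[backend]));
	return true;
}
```

With slot 0 marked as tracked and slot 1 left untracked (playing the checkpointer), one extend from slot 0 and one write from slot 1 give a global total of 2 and a per-backend total of 1 for slot 0. After toy_reset_shared_io() the global total is 0 while slot 0 still reports 1, which is the "pre_reset <= post_reset" check in the test; toy_reset_backend() then brings slot 0 to 0.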