On Mon, Mar 2, 2026 at 4:06 PM Andres Freund <[email protected]> wrote:
Hi Andres,

> On 2026-03-02 09:01:05 +0100, Jakub Wartak wrote:
> > On Thu, Feb 26, 2026 at 5:13 PM Andres Freund <[email protected]> wrote:
> > > > > > but I think having it in PgStat_BktypeIO is not great. This makes
> > > > > > PgStat_IO 30k*BACKEND_NUM_TYPES bigger, or ~ 0.5MB. Having a stats snapshot
> > > > > > be half a megabyte bigger for no reason seems too wasteful.
> > > > >
> > > > > Yea, that's not awesome.
> > > >
> > > > Guys, question, could you please explain to me what the drawbacks are of
> > > > having this semi-big (internal-only) stat snapshot of 0.5MB? I'm struggling
> > > > to understand two things:
> > > > a) 0.5MB is not a lot these days (ok, my 286 had 1MB in the day ;))
> > >
> > > I don't really agree with that, I guess. And even if I did, it's one thing to
> > > use 0.5MB when you actually use it, it's quite another when most of that
> > > memory is never used.
> > >
> > > With the patch, *every* backend ends up with a substantially larger
> > > pgStatLocal. Before:
> > >
> > > nm -t d --size-sort -r -S src/backend/postgres|head -n20|less
> > > (the second column is the decimal size, third the type of the symbol)
> > >
> > > 0000000004131808 0000000000297456 r yy_transition
> > > ...
> > > 0000000003916352 0000000000054744 r UnicodeDecompMain
> > > 0000000021004896 0000000000052824 B pgStatLocal
> > > 0000000003850592 0000000000040416 r unicode_categories
> > > ...
> > >
> > > after:
> > > 0000000023220512 0000000000329304 B pgStatLocal
> > > 0000000018531648 0000000000297456 r yy_transition
> > > ...
> > >
> > > And because pgStatLocal is zero-initialized data, it'll be on-demand-allocated
> > > in every single backend (whereas e.g. yy_transition is read-only shared). So
> > > you're not talking about a one-time increase, you're multiplying it by the
> > > number of active connections.
> > >
> > > Now, it's true that most backends won't ever touch pgStatLocal. However, most
> > > backends will touch Pending[Backend]IOStats, which also increased noticeably:
> > >
> > > before:
> > > 0000000021060960 0000000000002880 b PendingIOStats
> > > 0000000021057792 0000000000002880 b PendingBackendStats
> > >
> > > after:
> > > 0000000023568416 0000000000018240 b PendingIOStats
> > > 0000000023549888 0000000000018240 b PendingBackendStats
> > >
> > > Again, I think some increase here doesn't have to be fatal, but increasing it
> > > with mainly impossible-to-use memory seems just too much waste to me.
> > >
> > > This also increases the shared-memory usage of pgstats: before, it used ~300kB
> > > on a small system. That nearly doubles with this patch. But that's perhaps
> > > less concerning, given it's per-system, rather than per-backend, memory usage.
> > >
> > > > b) how does it affect anything, because testing shows it doesn't?
> > >
> > > Which of your testing would conceivably show the effect? The concern here
> > > isn't really performance, it's that it increases our memory usage, which you'd
> > > only see having an effect if you are tight on memory or have a workload that
> > > is cache sensitive.
> >
> > Oh OK, now I understand the problem with pgStatLocal properly, thanks for the
> > detailed explanation! (But I'm still a little lost in the woods of the pgstat
> > infrastructure.) Anyway, I agree that PgStat_IO has become way too big,
> > especially when the pg_stat_io(_histogram) views are never actually accessed.
> >
> > How about the attached v6-0002? It now dynamically allocates PgStat_IO memory
> > to avoid the memory cost (only allocated if pgstat_io_snapshot_cb() is used).
> > Is that the right path? And if so, perhaps it should allocate it from the mcxt
> > pgStatLocal.snapshot.context instead?
>
> I think even the per-backend pending IO stats are too big. And for both
> pending stats, stored stats and snapshots, I still don't think I am OK with
> storing so many histograms that are not possible to use. I think that needs
> to be fixed first.

v7-0001: no changes for quite some time.

Memory reduction stuff (I didn't want to squash it, so for now the patches
are separate):

v7-0002: As PendingBackendStats (per individual backend IO stats) was not
collecting latency buckets at all (but it was sharing the same
struct/typedef), I cloned the struct without those latency buckets. This
reduces the struct from 18240 back to 2880 bytes per backend (BSS), as on
master.

v7-0003: Sadly, I couldn't easily make the backend-local recording inside
PendingIOStats allocate dynamically from within pgstat_count_io_op_time() on
first use of a specific IO traffic type (so it stays one allocation covering
every [IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES] combination),
as any MemoryContextAlloc() from there can happen inside a critical section
and this blows up. It's just +15kB per backend, so I hope that is OK when we
only allocate it if there is a real desire to use it (track_io/wal_io_timings
on) -- so nm(1) reports just 2888 (just +8b for the pointer). The drawback is
that setting the GUCs locally won't enable histogram collection immediately,
but only for newly spawned backends. This means that I had to switch to a TAP
test instead, so it can be tested. I don't have a strong opinion on whether
saving +15kB is worth it for users not running with track_[io/wal_io]_timings.

v7-0004: (This was already sent with the previous message.) With the original
v5, every backend had a big pgStatLocal (0000000000329304 B pgStatLocal) that
was there but not used at all if the pg_stat_io(_histogram) views were never
accessed. Now it is (0000000000000984 B pgStatLocal) and allocates
PgStat_Snapshot.PgStat_IO only when querying those views.

So with all 3 above combined we are back to:
0000000011573376 0000000000002888 B PendingIOStats
0000000011570304 0000000000002880 b PendingBackendStats
0000000011569184 0000000000000984 B pgStatLocal

That's an actual saving over master itself:
0000000011577344 0000000000052824 B pgStatLocal
0000000011633408 0000000000002880 b PendingIOStats
0000000011630304 0000000000002880 b PendingBackendStats

> This also increases the shared-memory usage of pgstats: Before it used ~300kB
> on a small system. That nearly doubles with this patch. But that's perhaps
> less concerning, given it's per-system, rather than per-backend memory usage.

v7-0005: Skipping 4 backend types out of 17 means ignoring ~23% of backend
types, and with a simple array I can get this down from ~592384 to ~519424
bytes of _total_ memory allocated for the 'Shared Memory Stats' shm (this one
was sent earlier).

v7-0006: We could reduce total pgstats shm down to ~482944b if we eliminated
tracking of two further, IMHO useless, types: autovacuum_launcher and
standalone_backend. Master is @ 315904 (so that's just 163kB more according
to pg_shm_allocations).

Patches probably need some squashing, pgindent, etc.

-J.
From 41510e5b8da6e8c84b01249fe227f57927941f9c Mon Sep 17 00:00:00 2001
From: Jakub Wartak <[email protected]>
Date: Fri, 23 Jan 2026 08:10:09 +0100
Subject: [PATCH v7 1/6] Add pg_stat_io_histogram view to provide more
 detailed insight into IO profile

pg_stat_io_histogram displays a histogram of IO latencies for a specific
backend_type, object, context and io_type. The histogram has buckets that
allow faster identification of I/O latency outliers due to faulty hardware
and/or a misbehaving I/O stack. Such I/O outliers, e.g. slow fsyncs, can
sometimes cause intermittent issues, e.g. for COMMIT, or affect the
performance of synchronous standbys.

Author: Jakub Wartak <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Ants Aasma <[email protected]>
Discussion: https://postgr.es/m/CAKZiRmwvE4uJLKTgPXeBA4m%2Bd4tTghayoefcaM9%3Dz3_S7i72GA%40mail.gmail.com
---
 configure                              |  38 ++++
 configure.ac                           |   1 +
 doc/src/sgml/config.sgml               |  12 +-
 doc/src/sgml/monitoring.sgml           | 293 ++++++++++++++++++++++++-
 doc/src/sgml/wal.sgml                  |   5 +-
 meson.build                            |   1 +
 src/backend/catalog/system_views.sql   |  11 +
 src/backend/utils/activity/pgstat_io.c |  63 ++++++
 src/backend/utils/adt/pgstatfuncs.c    | 145 ++++++++++++
 src/include/catalog/pg_proc.dat        |   9 +
 src/include/pgstat.h                   |  14 ++
 src/include/port/pg_bitutils.h         |  31 ++-
 src/test/regress/expected/rules.out    |   8 +
 src/test/regress/expected/stats.out    |  23 ++
 src/test/regress/sql/stats.sql         |  15 ++
 src/tools/pgindent/typedefs.list       |   1 +
 16 files changed, 662 insertions(+), 8 deletions(-)

diff --git a/configure b/configure
index 4aaaf92ba0a..a78ca8b99d9 100755
--- a/configure
+++ b/configure
@@ -15931,6 +15931,44 @@ cat >>confdefs.h <<_ACEOF
 #define HAVE__BUILTIN_CLZ 1
 _ACEOF

+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __builtin_clzl" >&5
+$as_echo_n "checking for __builtin_clzl... " >&6; }
+if ${pgac_cv__builtin_clzl+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+call__builtin_clzl(unsigned long x)
+{
+  return __builtin_clzl(x);
+}
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  pgac_cv__builtin_clzl=yes
+else
+  pgac_cv__builtin_clzl=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv__builtin_clzl" >&5
+$as_echo "$pgac_cv__builtin_clzl" >&6; }
+if test x"${pgac_cv__builtin_clzl}" = xyes ; then
+
+cat >>confdefs.h <<_ACEOF
+#define HAVE__BUILTIN_CLZL 1
+_ACEOF
+
 fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for __builtin_ctz" >&5
 $as_echo_n "checking for __builtin_ctz... " >&6; }
diff --git a/configure.ac b/configure.ac
index 9bc457bac87..fdde65205e2 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1868,6 +1868,7 @@ PGAC_CHECK_BUILTIN_FUNC([__builtin_bswap32], [int x])
 PGAC_CHECK_BUILTIN_FUNC([__builtin_bswap64], [long int x])
 # We assume that we needn't test all widths of these explicitly:
 PGAC_CHECK_BUILTIN_FUNC([__builtin_clz], [unsigned int x])
+PGAC_CHECK_BUILTIN_FUNC([__builtin_clzl], [unsigned long x])
 PGAC_CHECK_BUILTIN_FUNC([__builtin_ctz], [unsigned int x])
 # __builtin_frame_address may draw a diagnostic for non-constant argument,
 # so it needs a different test function.
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8cdd826fbd3..c06c0874fce 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8840,9 +8840,11 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
        displayed in <link linkend="monitoring-pg-stat-database-view">
        <structname>pg_stat_database</structname></link>,
        <link linkend="monitoring-pg-stat-io-view">
-       <structname>pg_stat_io</structname></link> (if <varname>object</varname>
-       is not <literal>wal</literal>), in the output of the
-       <link linkend="pg-stat-get-backend-io">
+       <structname>pg_stat_io</structname></link> and
+       <link linkend="monitoring-pg-stat-io-histogram-view">
+       <structname>pg_stat_io_histogram</structname></link>
+       (if <varname>object</varname> is not <literal>wal</literal>),
+       in the output of the <link linkend="pg-stat-get-backend-io">
        <function>pg_stat_get_backend_io()</function></link> function (if
        <varname>object</varname> is not <literal>wal</literal>), in the
        output of <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal>
@@ -8872,7 +8874,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
        measure the overhead of timing on your system.
        I/O timing information is
        displayed in <link linkend="monitoring-pg-stat-io-view">
-       <structname>pg_stat_io</structname></link> for the
+       <structname>pg_stat_io</structname></link> and
+       <link linkend="monitoring-pg-stat-io-histogram-view">
+       <structname>pg_stat_io_histogram</structname></link> for the
        <varname>object</varname> <literal>wal</literal> and in the output of
        the <link linkend="pg-stat-get-backend-io">
        <function>pg_stat_get_backend_io()</function></link> function for the
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b3d53550688..4e2d8251c08 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -501,6 +501,17 @@ postgres  27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </entry>
     </row>

+     <row>
+      <entry><structname>pg_stat_io_histogram</structname><indexterm><primary>pg_stat_io_histogram</primary></indexterm></entry>
+      <entry>
+       One row for each combination of backend type, context, target object,
+       I/O operation type and latency bucket (in microseconds), containing
+       cluster-wide I/O statistics.
+       See <link linkend="monitoring-pg-stat-io-histogram-view">
+       <structname>pg_stat_io_histogram</structname></link> for details.
+      </entry>
+     </row>
+
     <row>
      <entry><structname>pg_stat_replication_slots</structname><indexterm><primary>pg_stat_replication_slots</primary></indexterm></entry>
      <entry>One row per replication slot, showing statistics about the
@@ -698,7 +709,7 @@ postgres  27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser

   <para>
    The <structname>pg_stat_io</structname> and
-   <structname>pg_statio_</structname> set of views are useful for determining
+   <structname>pg_stat_io_histogram</structname> set of views are useful for determining
    the effectiveness of the buffer cache. They can be used to calculate a
    cache hit ratio. Note that while <productname>PostgreSQL</productname>'s I/O
    statistics capture most instances in which the kernel was invoked in order
@@ -707,6 +718,8 @@ postgres  27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
    Users are advised to use the <productname>PostgreSQL</productname>
    statistics views in combination with operating system utilities for a more
    complete picture of their database's I/O performance.
+   Furthermore, the <structname>pg_stat_io_histogram</structname> view can be
+   helpful in identifying latency outliers for specific I/O operations.
   </para>

  </sect2>
@@ -3275,6 +3288,284 @@ description | Waiting for a newly initialized WAL file to reach durable storage

  </sect2>

+ <sect2 id="monitoring-pg-stat-io-histogram-view">
+  <title><structname>pg_stat_io_histogram</structname></title>
+
+  <indexterm>
+   <primary>pg_stat_io_histogram</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_stat_io_histogram</structname> view will contain one row
+   for each combination of backend type, target I/O object, I/O context, I/O
+   operation type, and latency bucket, containing cluster-wide I/O statistics.
+   Combinations which do not make sense are omitted.
+  </para>
+
+  <para>
+   The view shows the I/O latency as perceived by the backend, not by the
+   kernel or the device. This is an important distinction when
+   troubleshooting, as the I/O latency observed by the backend may be
+   affected by:
+   <itemizedlist>
+    <listitem>
+     <para>OS scheduler decisions and available CPU resources.</para>
+    </listitem>
+    <listitem>
+     <para>With AIO, time to service other I/Os from the queue, which will
+     often inflate the observed I/O latency.</para>
+    </listitem>
+    <listitem>
+     <para>In case of writing, additional filesystem journaling
+     operations.</para>
+    </listitem>
+   </itemizedlist>
+  </para>
+
+  <para>
+   Currently, I/O on relations (e.g. tables, indexes) and WAL activity are
+   tracked. However, relation I/O which bypasses shared buffers
+   (e.g. when moving a table from one tablespace to another) is currently
+   not tracked.
+  </para>
+
+  <table id="pg-stat-io-histogram-view" xreflabel="pg_stat_io_histogram">
+   <title><structname>pg_stat_io_histogram</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        Column Type
+       </para>
+       <para>
+        Description
+       </para>
+      </entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>backend_type</structfield> <type>text</type>
+       </para>
+       <para>
+        Type of backend (e.g. background worker, autovacuum worker). See <link
+        linkend="monitoring-pg-stat-activity-view">
+        <structname>pg_stat_activity</structname></link> for more information
+        on <varname>backend_type</varname>s. Some
+        <varname>backend_type</varname>s do not accumulate I/O operation
+        statistics and will not be included in the view.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>object</structfield> <type>text</type>
+       </para>
+       <para>
+        Target object of an I/O operation. Possible values are:
+        <itemizedlist>
+         <listitem>
+          <para>
+           <literal>relation</literal>: Permanent relations.
+          </para>
+         </listitem>
+         <listitem>
+          <para>
+           <literal>temp relation</literal>: Temporary relations.
+          </para>
+         </listitem>
+         <listitem>
+          <para>
+           <literal>wal</literal>: Write Ahead Logs.
+          </para>
+         </listitem>
+        </itemizedlist>
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>context</structfield> <type>text</type>
+       </para>
+       <para>
+        The context of an I/O operation. Possible values are:
+       </para>
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>normal</literal>: The default or standard
+          <varname>context</varname> for a type of I/O operation. For
+          example, by default, relation data is read into and written out from
+          shared buffers. Thus, reads and writes of relation data to and from
+          shared buffers are tracked in <varname>context</varname>
+          <literal>normal</literal>.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>init</literal>: I/O operations performed while creating the
+          WAL segments are tracked in <varname>context</varname>
+          <literal>init</literal>.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>vacuum</literal>: I/O operations performed outside of shared
+          buffers while vacuuming and analyzing permanent relations. Temporary
+          table vacuums use the same local buffer pool as other temporary table
+          I/O operations and are tracked in <varname>context</varname>
+          <literal>normal</literal>.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>bulkread</literal>: Certain large read I/O operations
+          done outside of shared buffers, for example, a sequential scan of a
+          large table.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>bulkwrite</literal>: Certain large write I/O operations
+          done outside of shared buffers, such as <command>COPY</command>.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>io_type</structfield> <type>text</type>
+       </para>
+       <para>
+        The type of I/O operation. Possible values are:
+       </para>
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>evict</literal>: eviction from the shared buffers cache.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>fsync</literal>: synchronization of the modified kernel
+          filesystem page cache with the storage device.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>hit</literal>: shared buffers cache lookup hit.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>reuse</literal>: reuse of an existing buffer in a
+          limited-space ring buffer (applies to <literal>bulkread</literal>,
+          <literal>bulkwrite</literal>, or <literal>vacuum</literal> contexts).
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>writeback</literal>: advise the kernel that the described
+          dirty data should be flushed to disk, preferably asynchronously.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>extend</literal>: add new zeroed blocks to the end of a file.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>read</literal>: self-explanatory.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>write</literal>: self-explanatory.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>bucket_latency_us</structfield> <type>int4range</type>
+       </para>
+       <para>
+        The latency bucket (in microseconds).
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>bucket_count</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of times the latency of the I/O operation fell into this
+        specific bucket (i.e., took up to
+        <varname>bucket_latency_us</varname> microseconds).
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
+       </para>
+       <para>
+        Time at which these statistics were last reset.
+       </para>
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   Some backend types never perform I/O operations on some I/O objects and/or
+   in some I/O contexts. Such rows might display zero bucket counts for those
+   specific operations.
+  </para>
+
+  <para>
+   <structname>pg_stat_io_histogram</structname> can be used to identify
+   I/O storage issues. For example:
+   <itemizedlist>
+    <listitem>
+     <para>
+      Presence of abnormally high latency for <varname>fsyncs</varname> might
+      indicate I/O saturation, oversubscription or hardware connectivity issues.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      Unusually high latency for <varname>fsyncs</varname> on a standby's
+      startup backend type might be responsible for long commit durations in
+      synchronous replication setups.
+     </para>
+    </listitem>
+   </itemizedlist>
+  </para>
+
+  <note>
+   <para>
+    Columns tracking I/O wait time will only be non-zero when
+    <xref linkend="guc-track-io-timing"/> is enabled. The user should be
+    careful when referencing these columns in combination with their
+    corresponding I/O operations in case <varname>track_io_timing</varname>
+    was not enabled for the entire time since the last stats reset.
+   </para>
+  </note>
+ </sect2>
+
 <sect2 id="monitoring-pg-stat-bgwriter-view">
   <title><structname>pg_stat_bgwriter</structname></title>

diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f3b86b26be9..8b8c407e69f 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -832,8 +832,9 @@
   of times <function>XLogWrite</function> writes and
   <function>issue_xlog_fsync</function> syncs WAL data to disk are also
   counted as <varname>writes</varname> and <varname>fsyncs</varname>
-  in <structname>pg_stat_io</structname> for the <varname>object</varname>
-  <literal>wal</literal>, respectively.
+  in <structname>pg_stat_io</structname> and
+  <structname>pg_stat_io_histogram</structname> for the
+  <varname>object</varname> <literal>wal</literal>, respectively.
  </para>

  <para>
diff --git a/meson.build b/meson.build
index 2df54409ca6..00575624688 100644
--- a/meson.build
+++ b/meson.build
@@ -2045,6 +2045,7 @@ builtins = [
   'bswap32',
   'bswap64',
   'clz',
+  'clzl',
   'ctz',
   'constant_p',
   'frame_address',
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2eda7d80d02..55c3ec4eaec 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1247,6 +1247,17 @@ SELECT
        b.stats_reset
 FROM pg_stat_get_io() b;

+CREATE VIEW pg_stat_io_histogram AS
+SELECT
+       b.backend_type,
+       b.object,
+       b.context,
+       b.io_type,
+       b.bucket_latency_us,
+       b.bucket_count,
+       b.stats_reset
+FROM pg_stat_get_io_histogram() b;
+
 CREATE VIEW pg_stat_wal AS
 SELECT
        w.wal_records,
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index 28de24538dc..148a2a9c7d5 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -17,6 +17,7 @@
 #include "postgres.h"

 #include "executor/instrument.h"
+#include "port/pg_bitutils.h"
 #include "storage/bufmgr.h"
 #include "utils/pgstat_internal.h"

@@ -107,6 +108,32 @@ pgstat_prepare_io_time(bool track_io_guc)
 	return io_start;
 }

+#define MIN_PG_STAT_IO_HIST_LATENCY 8191
+
+static inline int
+get_bucket_index(uint64_t ns)
+{
+	const uint32_t max_index = PGSTAT_IO_HIST_BUCKETS - 1;
+
+	/*
+	 * Hopefully pre-calculated by the compiler: clzl(8191) (thirteen 1-bits
+	 * in a uint64) is 51.
+	 */
+	const uint32_t min_latency_leading_zeros =
+		pg_leading_zero_bits64(MIN_PG_STAT_IO_HIST_LATENCY);
+
+	/*
+	 * Make sure the tmp value is at least 8191 (our minimum bucket size),
+	 * as __builtin_clzl's behavior is undefined when operating on 0.
+	 */
+	uint64_t	tmp = ns | MIN_PG_STAT_IO_HIST_LATENCY;
+
+	/* count leading zeros */
+	int			leading_zeros = pg_leading_zero_bits64(tmp);
+
+	/* normalize the index */
+	uint32_t	index = min_latency_leading_zeros - leading_zeros;
+
+	/* clamp it to the maximum */
+	return (index > max_index) ? max_index : index;
+}
+
 /*
  * Like pgstat_count_io_op() except it also accumulates time.
  *
@@ -125,6 +152,7 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 	if (!INSTR_TIME_IS_ZERO(start_time))
 	{
 		instr_time	io_time;
+		int			bucket_index;

 		INSTR_TIME_SET_CURRENT(io_time);
 		INSTR_TIME_SUBTRACT(io_time, start_time);
@@ -152,6 +180,10 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 		INSTR_TIME_ADD(PendingIOStats.pending_times[io_object][io_context][io_op],
 					   io_time);

+		/* calculate the bucket_index based on latency in nanoseconds (uint64) */
+		bucket_index = get_bucket_index(INSTR_TIME_GET_NANOSEC(io_time));
+		PendingIOStats.pending_hist_time_buckets[io_object][io_context][io_op][bucket_index]++;
+
 		/* Add the per-backend count */
 		pgstat_count_backend_io_op_time(io_object, io_context, io_op,
 										io_time);
@@ -221,6 +253,10 @@ pgstat_io_flush_cb(bool nowait)

 				bktype_shstats->times[io_object][io_context][io_op] +=
 					INSTR_TIME_GET_MICROSEC(time);
+
+				for (int b = 0; b < PGSTAT_IO_HIST_BUCKETS; b++)
+					bktype_shstats->hist_time_buckets[io_object][io_context][io_op][b] +=
+						PendingIOStats.pending_hist_time_buckets[io_object][io_context][io_op][b];
 			}
 		}
 	}
@@ -274,6 +310,33 @@ pgstat_get_io_object_name(IOObject io_object)
 	pg_unreachable();
 }

+const char *
+pgstat_get_io_op_name(IOOp io_op)
+{
+	switch (io_op)
+	{
+		case IOOP_EVICT:
+			return "evict";
+		case IOOP_FSYNC:
+			return "fsync";
+		case IOOP_HIT:
+			return "hit";
+		case IOOP_REUSE:
+			return "reuse";
+		case IOOP_WRITEBACK:
+			return "writeback";
+		case IOOP_EXTEND:
+			return "extend";
+		case IOOP_READ:
+			return "read";
+		case IOOP_WRITE:
+			return "write";
+	}
+
+	elog(ERROR, "unrecognized IOOp value: %d", io_op);
+	pg_unreachable();
+}
+
 void
 pgstat_io_init_shmem_cb(void *stats)
 {
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index b1df96e7b0b..ac08ab14195 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlogprefetcher.h"
 #include "catalog/catalog.h"
+#include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_type.h"
 #include "common/ip.h"
@@ -30,6 +31,7 @@
 #include "storage/procarray.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/rangetypes.h"
 #include "utils/timestamp.h"

 #define UINT32_ACCESS_ONCE(var) ((uint32)(*((volatile uint32 *)&(var))))
@@ -1639,6 +1641,149 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
 	return (Datum) 0;
 }

+/*
+ * When adding a new column to the pg_stat_io_histogram view and the
+ * pg_stat_get_io_histogram() function, add a new enum value here above
+ * HIST_IO_NUM_COLUMNS.
+ */
+typedef enum hist_io_stat_col
+{
+	HIST_IO_COL_INVALID = -1,
+	HIST_IO_COL_BACKEND_TYPE,
+	HIST_IO_COL_OBJECT,
+	HIST_IO_COL_CONTEXT,
+	HIST_IO_COL_IOTYPE,
+	HIST_IO_COL_BUCKET_US,
+	HIST_IO_COL_COUNT,
+	HIST_IO_COL_RESET_TIME,
+	HIST_IO_NUM_COLUMNS
+} histogram_io_stat_col;
+
+/*
+ * pg_stat_io_histogram_build_tuples
+ *
+ * Helper routine for pg_stat_get_io_histogram(), filling a result tuplestore
+ * with one tuple for each object and each context supported by the caller,
+ * based on the contents of bktype_stats.
+ */
+static void
+pg_stat_io_histogram_build_tuples(ReturnSetInfo *rsinfo,
+								  PgStat_BktypeIO *bktype_stats,
+								  BackendType bktype,
+								  TimestampTz stat_reset_timestamp)
+{
+	Datum		bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+	/* Get OID for the int4range type */
+	Oid			range_typid = TypenameGetTypid("int4range");
+	TypeCacheEntry *typcache = lookup_type_cache(range_typid, TYPECACHE_RANGE_INFO);
+
+	for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++)
+	{
+		const char *obj_name = pgstat_get_io_object_name(io_obj);
+
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			const char *context_name = pgstat_get_io_context_name(io_context);
+
+			/*
+			 * Some combinations of BackendType, IOObject, and IOContext are
+			 * not valid for any type of IOOp. In such cases, omit the entire
+			 * row from the view.
+			 */
+			if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
+				continue;
+
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				const char *op_name = pgstat_get_io_op_name(io_op);
+
+				for (int bucket = 0; bucket < PGSTAT_IO_HIST_BUCKETS; bucket++)
+				{
+					Datum		values[HIST_IO_NUM_COLUMNS] = {0};
+					bool		nulls[HIST_IO_NUM_COLUMNS] = {0};
+					RangeBound	lower,
+								upper;
+					RangeType  *range;
+
+					values[HIST_IO_COL_BACKEND_TYPE] = bktype_desc;
+					values[HIST_IO_COL_OBJECT] = CStringGetTextDatum(obj_name);
+					values[HIST_IO_COL_CONTEXT] = CStringGetTextDatum(context_name);
+					values[HIST_IO_COL_IOTYPE] = CStringGetTextDatum(op_name);
+
+					/* bucket's latency range in microseconds */
+					if (bucket == 0)
+						lower.val = Int32GetDatum(0);
+					else
+						lower.val = Int32GetDatum(1 << (2 + bucket));
+					lower.infinite = false;
+					lower.inclusive = true;
+					lower.lower = true;
+
+					if (bucket == PGSTAT_IO_HIST_BUCKETS - 1)
+						upper.infinite = true;
+					else
+					{
+						upper.val = Int32GetDatum(1 << (2 + bucket + 1));
+						upper.infinite = false;
+					}
+					upper.inclusive = true;
+					upper.lower = false;
+
+					range = make_range(typcache, &lower, &upper, false, NULL);
+					values[HIST_IO_COL_BUCKET_US] = RangeTypePGetDatum(range);
+
+					/* bucket count */
+					values[HIST_IO_COL_COUNT] = Int64GetDatum(
+						bktype_stats->hist_time_buckets[io_obj][io_context][io_op][bucket]);
+
+					if (stat_reset_timestamp != 0)
+						values[HIST_IO_COL_RESET_TIME] = TimestampTzGetDatum(stat_reset_timestamp);
+					else
+						nulls[HIST_IO_COL_RESET_TIME] = true;
+
+					tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+										 values, nulls);
+				}
+			}
+		}
+	}
+}
+
+Datum
+pg_stat_get_io_histogram(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	PgStat_IO  *backends_io_stats;
+
+	InitMaterializedSRF(fcinfo, 0);
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	backends_io_stats = pgstat_fetch_stat_io();
+
+	for (int bktype = 0; bktype < BACKEND_NUM_TYPES; bktype++)
+	{
+		PgStat_BktypeIO *bktype_stats = &backends_io_stats->stats[bktype];
+
+		/*
+		 * In Assert builds, we can afford an extra loop through all of the
+		 * counters (in pg_stat_io_build_tuples()), checking that only
+		 * expected stats are non-zero, since it keeps the non-Assert code
+		 * cleaner.
+		 */
+		Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
+
+		/*
+		 * For those BackendTypes without IO Operation stats, skip
+		 * representing them in the view altogether.
+ */ + if (!pgstat_tracks_io_bktype(bktype)) + continue; + + /* save tuples with data from this PgStat_BktypeIO */ + pg_stat_io_histogram_build_tuples(rsinfo, bktype_stats, bktype, + backends_io_stats->stat_reset_timestamp); + } + + return (Datum) 0; +} + /* * pg_stat_wal_build_tuple * diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 361e2cfffeb..3ba04f9e11f 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -6038,6 +6038,15 @@ proargnames => '{backend_type,object,context,reads,read_bytes,read_time,writes,write_bytes,write_time,writebacks,writeback_time,extends,extend_bytes,extend_time,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}', prosrc => 'pg_stat_get_io' }, +{ oid => '6149', descr => 'statistics: per backend type IO latency histogram', + proname => 'pg_stat_get_io_histogram', prorows => '30', proretset => 't', + provolatile => 'v', proparallel => 'r', prorettype => 'record', + proargtypes => '', + proallargtypes => '{text,text,text,text,int4range,int8,timestamptz}', + proargmodes => '{o,o,o,o,o,o,o}', + proargnames => '{backend_type,object,context,io_type,bucket_latency_us,bucket_count,stats_reset}', + prosrc => 'pg_stat_get_io_histogram' }, + { oid => '6386', descr => 'statistics: backend IO statistics', proname => 'pg_stat_get_backend_io', prorows => '5', proretset => 't', provolatile => 'v', proparallel => 'r', prorettype => 'record', diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 0e9d2b4c623..8e06f3e05d2 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -326,11 +326,23 @@ typedef enum IOOp (((unsigned int) (io_op)) < IOOP_NUM_TYPES && \ ((unsigned int) (io_op)) >= IOOP_EXTEND) +/* + * This should represent balance between being fast and providing value + * to the users: + * 1. We want to cover various fast and slow device types (0.01ms - 15ms) + * 2. 
We also want to cover sporadic long tail latencies (hardware issues, + * delayed fsyncs, stuck I/O) + * 3. We want to keep this as small as possible: + * 16 * sizeof(uint64) = 128 bytes, i.e. exactly two cachelines per counter set. + */ +#define PGSTAT_IO_HIST_BUCKETS 16 + typedef struct PgStat_BktypeIO { uint64 bytes[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; PgStat_Counter times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; + uint64 hist_time_buckets[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES][PGSTAT_IO_HIST_BUCKETS]; } PgStat_BktypeIO; typedef struct PgStat_PendingIO @@ -338,6 +350,7 @@ typedef struct PgStat_PendingIO uint64 bytes[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; instr_time pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; + uint64 pending_hist_time_buckets[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES][PGSTAT_IO_HIST_BUCKETS]; } PgStat_PendingIO; typedef struct PgStat_IO @@ -610,6 +623,7 @@ extern void pgstat_count_io_op_time(IOObject io_object, IOContext io_context, extern PgStat_IO *pgstat_fetch_stat_io(void); extern const char *pgstat_get_io_context_name(IOContext io_context); extern const char *pgstat_get_io_object_name(IOObject io_object); +extern const char *pgstat_get_io_op_name(IOOp io_op); extern bool pgstat_tracks_io_bktype(BackendType bktype); extern bool pgstat_tracks_io_object(BackendType bktype, diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h index 0bca559caaa..4e94257682b 100644 --- a/src/include/port/pg_bitutils.h +++ b/src/include/port/pg_bitutils.h @@ -32,6 +32,35 @@ extern PGDLLIMPORT const uint8 pg_leftmost_one_pos[256]; extern PGDLLIMPORT const uint8 pg_rightmost_one_pos[256]; extern PGDLLIMPORT const uint8 pg_number_of_ones[256]; + +/* + * 
pg_leading_zero_bits64 + * Returns the number of leading 0-bits in "word", starting at the most + * significant bit position. "word" must not be 0 when the builtin is used + * (the result is undefined there); the portable fallback returns 64 for 0. + */ +static inline int +pg_leading_zero_bits64(uint64 word) +{ +#ifdef HAVE__BUILTIN_CLZL + Assert(word != 0); + +#if SIZEOF_LONG == 8 + return __builtin_clzl(word); +#else + return __builtin_clzll(word); +#endif +#else + int n = 64; + uint64 y; + + if (word == 0) + return 64; + + y = word >> 32; if (y != 0) { n -= 32; word = y; } + y = word >> 16; if (y != 0) { n -= 16; word = y; } + y = word >> 8; if (y != 0) { n -= 8; word = y; } + y = word >> 4; if (y != 0) { n -= 4; word = y; } + y = word >> 2; if (y != 0) { n -= 2; word = y; } + y = word >> 1; if (y != 0) { return n - 2; } + return n - 1; +#endif +} + /* * pg_leftmost_one_pos32 * Returns the position of the most significant set bit in "word", @@ -71,7 +100,7 @@ static inline int pg_leftmost_one_pos64(uint64 word) { -#ifdef HAVE__BUILTIN_CLZ +#ifdef HAVE__BUILTIN_CLZL Assert(word != 0); #if SIZEOF_LONG == 8 diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index deb6e2ad6a9..e3be836a461 100644 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -1951,6 +1951,14 @@ pg_stat_io| SELECT backend_type, fsync_time, stats_reset FROM pg_stat_get_io() b(backend_type, object, context, reads, read_bytes, read_time, writes, write_bytes, write_time, writebacks, writeback_time, extends, extend_bytes, extend_time, hits, evictions, reuses, fsyncs, fsync_time, stats_reset); +pg_stat_io_histogram| SELECT backend_type, + object, + context, + io_type, + bucket_latency_us, + bucket_count, + stats_reset + FROM pg_stat_get_io_histogram() b(backend_type, object, context, io_type, bucket_latency_us, bucket_count, stats_reset); pg_stat_progress_analyze| SELECT s.pid, s.datid, d.datname, diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out index cd00f35bf7a..4c95f09d651 100644 --- 
a/src/test/regress/expected/stats.out +++ b/src/test/regress/expected/stats.out @@ -1765,6 +1765,29 @@ SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset; t (1 row) +-- Check that pg_stat_io_histograms sees some growing counts in buckets +-- We could also try with checkpointer, but it often runs with fsync=off +-- during test. +SET track_io_timing TO 'on'; +SELECT sum(bucket_count) AS hist_bucket_count_sum FROM pg_stat_get_io_histogram() +WHERE backend_type='client backend' AND object='relation' AND context='normal' \gset +CREATE TABLE test_io_hist(id bigint); +INSERT INTO test_io_hist SELECT generate_series(1, 100) s; +SELECT pg_stat_force_next_flush(); + pg_stat_force_next_flush +-------------------------- + +(1 row) + +SELECT sum(bucket_count) AS hist_bucket_count_sum2 FROM pg_stat_get_io_histogram() +WHERE backend_type='client backend' AND object='relation' AND context='normal' \gset +SELECT :hist_bucket_count_sum2 > :hist_bucket_count_sum; + ?column? +---------- + t +(1 row) + +RESET track_io_timing; -- Check invalid input for pg_stat_get_backend_io() SELECT pg_stat_get_backend_io(NULL); pg_stat_get_backend_io diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql index 8768e0f27fd..063b1011d7e 100644 --- a/src/test/regress/sql/stats.sql +++ b/src/test/regress/sql/stats.sql @@ -841,6 +841,21 @@ SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + FROM pg_stat_get_backend_io(pg_backend_pid()) \gset SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset; + +-- Check that pg_stat_io_histograms sees some growing counts in buckets +-- We could also try with checkpointer, but it often runs with fsync=off +-- during test. 
+SET track_io_timing TO 'on'; +SELECT sum(bucket_count) AS hist_bucket_count_sum FROM pg_stat_get_io_histogram() +WHERE backend_type='client backend' AND object='relation' AND context='normal' \gset +CREATE TABLE test_io_hist(id bigint); +INSERT INTO test_io_hist SELECT generate_series(1, 100) s; +SELECT pg_stat_force_next_flush(); +SELECT sum(bucket_count) AS hist_bucket_count_sum2 FROM pg_stat_get_io_histogram() +WHERE backend_type='client backend' AND object='relation' AND context='normal' \gset +SELECT :hist_bucket_count_sum2 > :hist_bucket_count_sum; +RESET track_io_timing; + -- Check invalid input for pg_stat_get_backend_io() SELECT pg_stat_get_backend_io(NULL); SELECT pg_stat_get_backend_io(0); diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 77e3c04144e..15c16db8793 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -3758,6 +3758,7 @@ gtrgm_consistent_cache gzFile heap_page_items_state help_handler +histogram_io_stat_col hlCheck hstoreCheckKeyLen_t hstoreCheckValLen_t -- 2.43.0
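As an aside for reviewers: get_bucket_index() itself is not visible in this excerpt, so here is a minimal stand-alone sketch of how a clz-based power-of-two bucketing scheme can satisfy the three goals stated in the PGSTAT_IO_HIST_BUCKETS comment (fast, covers roughly 0.01ms-15ms, clamps the long tail into the last bucket). The 2^10 ns (~1us) base bucket is an assumption for illustration, not necessarily what the patch uses:

```c
#include <assert.h>
#include <stdint.h>

#define PGSTAT_IO_HIST_BUCKETS 16

/* Stand-in for pg_leading_zero_bits64(): the GCC/Clang builtin is undefined
 * for 0, which the caller below guards against. */
static int
leading_zero_bits64(uint64_t word)
{
	return __builtin_clzll(word);
}

/*
 * Hypothetical bucket mapping: bucket b covers latencies in
 * [2^(b+10), 2^(b+11)) nanoseconds, i.e. doubling buckets starting at ~1us;
 * anything faster falls into bucket 0 and anything at or above ~33ms is
 * clamped into the last bucket.
 */
static int
bucket_index(uint64_t latency_ns)
{
	int			msb;

	if (latency_ns == 0)
		return 0;
	msb = 63 - leading_zero_bits64(latency_ns);	/* floor(log2(latency_ns)) */
	if (msb < 10)
		return 0;
	if (msb - 10 >= PGSTAT_IO_HIST_BUCKETS)
		return PGSTAT_IO_HIST_BUCKETS - 1;
	return msb - 10;
}
```

With this shape the histogram update adds only a clz, a subtract, and a clamp on top of the existing timing code, which is the property the bucket-count comment is after.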
From fab13516302da9ddcb3fb7b7ec5699182c2a9ce6 Mon Sep 17 00:00:00 2001 From: Jakub Wartak <[email protected]> Date: Fri, 6 Mar 2026 08:25:54 +0100 Subject: [PATCH v7 2/6] PendingBackendStats save memory --- src/backend/utils/activity/pgstat_backend.c | 4 ++-- src/include/pgstat.h | 16 ++++++++++++---- 2 files changed, 14 insertions(+), 6 deletions(-) diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c index f2f8d3ff75f..4cd3fb923c9 100644 --- a/src/backend/utils/activity/pgstat_backend.c +++ b/src/backend/utils/activity/pgstat_backend.c @@ -167,7 +167,7 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref) { PgStatShared_Backend *shbackendent; PgStat_BktypeIO *bktype_shstats; - PgStat_PendingIO pending_io; + PgStat_BackendPendingIO pending_io; /* * This function can be called even if nothing at all has happened for IO @@ -204,7 +204,7 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref) /* * Clear out the statistics buffer, so it can be re-used. */ - MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO)); + MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_BackendPendingIO)); backend_has_iostats = false; } diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 8e06f3e05d2..9554de3a803 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -521,15 +521,23 @@ typedef struct PgStat_Backend } PgStat_Backend; /* --------- - * PgStat_BackendPending Non-flushed backend stats. + * PgStat_BackendPending(IO) Non-flushed backend stats. * --------- */ +typedef struct PgStat_BackendPendingIO { + uint64 bytes[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; + PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; + instr_time pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; +} PgStat_BackendPendingIO; + typedef struct PgStat_BackendPending { /* - * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO. 
- */ - PgStat_PendingIO pending_io; + * Backend statistics store almost the same amount of IO data as + * PGSTAT_KIND_IO. The only difference between PgStat_BackendPendingIO + * and PgStat_PendingIO is that the latter also tracks IO latency histograms. + */ + PgStat_BackendPendingIO pending_io; } PgStat_BackendPending; /* -- 2.43.0
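To make the size trade-off concrete, here is a toy reconstruction of the split with assumed (not actual) dimension values; the real IOOBJECT_NUM_TYPES/IOCONTEXT_NUM_TYPES/IOOP_NUM_TYPES come from pgstat.h and differ, but the ratio is the point: the 16-bucket histogram array dwarfs all the other per-backend-type counters combined, which is why the backend-pending variant omits it:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed dimensions for illustration only. */
#define N_OBJ 3
#define N_CTX 5
#define N_OP  8
#define N_BUCKETS 16

/* Counters common to PgStat_BackendPendingIO and PgStat_PendingIO. */
typedef struct PendingIOBase
{
	uint64_t	bytes[N_OBJ][N_CTX][N_OP];
	uint64_t	counts[N_OBJ][N_CTX][N_OP];
	uint64_t	times[N_OBJ][N_CTX][N_OP];
} PendingIOBase;

/* What embedding the histogram array directly would cost. */
typedef struct PendingIOWithHist
{
	PendingIOBase base;
	uint64_t	hist[N_OBJ][N_CTX][N_OP][N_BUCKETS];
} PendingIOWithHist;
```

With 3 * 5 * 8 = 120 counter sets, the base struct is 120 * 3 * 8 = 2880 bytes while the histogram alone adds 120 * 16 * 8 = 15360 bytes, a bit over 5x the rest.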
From bcb30aadc7a62f73d0958e2dc84aa285b2357ab8 Mon Sep 17 00:00:00 2001 From: Jakub Wartak <[email protected]> Date: Fri, 6 Mar 2026 12:09:10 +0100 Subject: [PATCH v7 4/6] Convert PgStat_IO to pointer to avoid huge static memory allocation if not used. --- src/backend/utils/activity/pgstat.c | 9 ++++++++- src/backend/utils/activity/pgstat_io.c | 14 +++++++++++--- src/include/utils/pgstat_internal.h | 2 +- 3 files changed, 20 insertions(+), 5 deletions(-) diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c index f015f217766..d61c50a4aef 100644 --- a/src/backend/utils/activity/pgstat.c +++ b/src/backend/utils/activity/pgstat.c @@ -1644,10 +1644,17 @@ pgstat_write_statsfile(void) pgstat_build_snapshot_fixed(kind); if (pgstat_is_kind_builtin(kind)) - ptr = ((char *) &pgStatLocal.snapshot) + info->snapshot_ctl_off; + { + if(kind == PGSTAT_KIND_IO) + ptr = (char *) pgStatLocal.snapshot.io; + else + ptr = ((char *) &pgStatLocal.snapshot) + info->snapshot_ctl_off; + } else ptr = pgStatLocal.snapshot.custom_data[kind - PGSTAT_KIND_CUSTOM_MIN]; + Assert(ptr != NULL); + fputc(PGSTAT_FILE_ENTRY_FIXED, fpout); pgstat_write_chunk_s(fpout, &kind); pgstat_write_chunk(fpout, ptr, info->shared_data_len); diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c index ae689d3926e..8605ea65605 100644 --- a/src/backend/utils/activity/pgstat_io.c +++ b/src/backend/utils/activity/pgstat_io.c @@ -19,6 +19,7 @@ #include "executor/instrument.h" #include "port/pg_bitutils.h" #include "storage/bufmgr.h" +#include "utils/memutils.h" #include "utils/pgstat_internal.h" PgStat_PendingIO PendingIOStats; @@ -199,7 +200,7 @@ pgstat_fetch_stat_io(void) { pgstat_snapshot_fixed(PGSTAT_KIND_IO); - return &pgStatLocal.snapshot.io; + return pgStatLocal.snapshot.io; } /* @@ -348,6 +349,9 @@ pgstat_io_init_shmem_cb(void *stats) for (int i = 0; i < BACKEND_NUM_TYPES; i++) LWLockInitialize(&stat_shmem->locks[i], LWTRANCHE_PGSTATS_DATA); + + 
/* this might end up being lazily allocated in pgstat_io_snapshot_cb() */ + pgStatLocal.snapshot.io = NULL; } void @@ -375,11 +379,15 @@ pgstat_io_reset_all_cb(TimestampTz ts) void pgstat_io_snapshot_cb(void) { + if (unlikely(pgStatLocal.snapshot.io == NULL)) + pgStatLocal.snapshot.io = MemoryContextAllocZero(TopMemoryContext, + sizeof(PgStat_IO)); + for (int i = 0; i < BACKEND_NUM_TYPES; i++) { LWLock *bktype_lock = &pgStatLocal.shmem->io.locks[i]; PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stats.stats[i]; - PgStat_BktypeIO *bktype_snap = &pgStatLocal.snapshot.io.stats[i]; + PgStat_BktypeIO *bktype_snap = &pgStatLocal.snapshot.io->stats[i]; LWLockAcquire(bktype_lock, LW_SHARED); @@ -388,7 +396,7 @@ pgstat_io_snapshot_cb(void) * the reset timestamp as well. */ if (i == 0) - pgStatLocal.snapshot.io.stat_reset_timestamp = + pgStatLocal.snapshot.io->stat_reset_timestamp = pgStatLocal.shmem->io.stats.stat_reset_timestamp; /* using struct assignment due to better type safety */ diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h index 9b8fbae00ed..407657e060c 100644 --- a/src/include/utils/pgstat_internal.h +++ b/src/include/utils/pgstat_internal.h @@ -600,7 +600,7 @@ typedef struct PgStat_Snapshot PgStat_CheckpointerStats checkpointer; - PgStat_IO io; + PgStat_IO *io; PgStat_SLRUStats slru[SLRU_NUM_ELEMENTS]; -- 2.43.0
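The pointer conversion above boils down to a lazy-allocation idiom that can be shown in isolation; in this sketch calloc() stands in for MemoryContextAllocZero(TopMemoryContext, ...), and the struct is a reduced stand-in for PgStat_IO:

```c
#include <assert.h>
#include <stdlib.h>

/* Reduced stand-in for PgStat_IO in the stats snapshot. */
typedef struct IOSnapshot
{
	unsigned long long stats[64];
} IOSnapshot;

/* Mirrors pgStatLocal.snapshot.io: NULL until somebody asks for IO stats. */
static IOSnapshot *snapshot_io = NULL;

static IOSnapshot *
io_snapshot(void)
{
	/*
	 * Allocate on first use, as pgstat_io_snapshot_cb() does, so the many
	 * backends that never read IO stats never pay for the buffer; the
	 * zero-filled allocation matches MemoryContextAllocZero().
	 */
	if (snapshot_io == NULL)
		snapshot_io = calloc(1, sizeof(IOSnapshot));
	return snapshot_io;
}
```

The corresponding cost is the extra NULL check whenever the snapshot is built and the special case in pgstat_write_statsfile(), which now has to follow the pointer instead of using the fixed snapshot_ctl_off offset.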
From 067d7a08972ec8728212af63f9a4a852c6fe0345 Mon Sep 17 00:00:00 2001 From: Jakub Wartak <[email protected]> Date: Fri, 6 Mar 2026 11:26:19 +0100 Subject: [PATCH v7 3/6] PendingIOStats save memory --- src/backend/utils/activity/pgstat.c | 10 ++++++++ src/backend/utils/activity/pgstat_io.c | 20 +++++++++------- src/include/pgstat.h | 8 ++++++- src/test/recovery/t/029_stats_restart.pl | 29 ++++++++++++++++++++++++ src/test/regress/expected/stats.out | 23 ------------------- src/test/regress/sql/stats.sql | 15 ------------ 6 files changed, 58 insertions(+), 47 deletions(-) diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c index 11bb71cad5a..f015f217766 100644 --- a/src/backend/utils/activity/pgstat.c +++ b/src/backend/utils/activity/pgstat.c @@ -104,8 +104,10 @@ #include <unistd.h> #include "access/xact.h" +#include "access/xlog.h" #include "lib/dshash.h" #include "pgstat.h" +#include "storage/bufmgr.h" #include "storage/fd.h" #include "storage/ipc.h" #include "storage/lwlock.h" @@ -671,6 +673,14 @@ pgstat_initialize(void) /* Set up a process-exit hook to clean up */ before_shmem_exit(pgstat_shutdown_hook, 0); + /* Allocate I/O latency buckets only if we are going to populate it */ + if (track_io_timing || track_wal_io_timing) + PendingIOStats.pending_hist_time_buckets = MemoryContextAllocZero(TopMemoryContext, + IOOBJECT_NUM_TYPES * IOCONTEXT_NUM_TYPES * IOOP_NUM_TYPES * + PGSTAT_IO_HIST_BUCKETS * sizeof(uint64)); + else + PendingIOStats.pending_hist_time_buckets = NULL; + #ifdef USE_ASSERT_CHECKING pgstat_is_initialized = true; #endif diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c index 148a2a9c7d5..ae689d3926e 100644 --- a/src/backend/utils/activity/pgstat_io.c +++ b/src/backend/utils/activity/pgstat_io.c @@ -21,7 +21,7 @@ #include "storage/bufmgr.h" #include "utils/pgstat_internal.h" -static PgStat_PendingIO PendingIOStats; +PgStat_PendingIO PendingIOStats; static bool have_iostats 
= false; /* @@ -180,9 +180,12 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op, INSTR_TIME_ADD(PendingIOStats.pending_times[io_object][io_context][io_op], io_time); - /* calculate the bucket_index based on latency in nanoseconds (uint64) */ - bucket_index = get_bucket_index(INSTR_TIME_GET_NANOSEC(io_time)); - PendingIOStats.pending_hist_time_buckets[io_object][io_context][io_op][bucket_index]++; + if (PendingIOStats.pending_hist_time_buckets != NULL) + { + /* calculate the bucket_index based on latency in nanoseconds (uint64) */ + bucket_index = get_bucket_index(INSTR_TIME_GET_NANOSEC(io_time)); + PendingIOStats.pending_hist_time_buckets[io_object][io_context][io_op][bucket_index]++; + } /* Add the per-backend count */ pgstat_count_backend_io_op_time(io_object, io_context, io_op, @@ -254,9 +256,10 @@ pgstat_io_flush_cb(bool nowait) bktype_shstats->times[io_object][io_context][io_op] += INSTR_TIME_GET_MICROSEC(time); - for(int b = 0; b < PGSTAT_IO_HIST_BUCKETS; b++) - bktype_shstats->hist_time_buckets[io_object][io_context][io_op][b] += - PendingIOStats.pending_hist_time_buckets[io_object][io_context][io_op][b]; + if (PendingIOStats.pending_hist_time_buckets != NULL) + for (int b = 0; b < PGSTAT_IO_HIST_BUCKETS; b++) + bktype_shstats->hist_time_buckets[io_object][io_context][io_op][b] += + PendingIOStats.pending_hist_time_buckets[io_object][io_context][io_op][b]; } } } @@ -265,7 +268,11 @@ pgstat_io_flush_cb(bool nowait) LWLockRelease(bktype_lock); - memset(&PendingIOStats, 0, sizeof(PendingIOStats)); + /* Avoid overwriting the latency bucket array pointer, but clear the + * pointed-to counters so they are not flushed into shared stats again. */ + memset(&PendingIOStats, 0, offsetof(PgStat_PendingIO, pending_hist_time_buckets)); + if (PendingIOStats.pending_hist_time_buckets != NULL) + memset(PendingIOStats.pending_hist_time_buckets, 0, + IOOBJECT_NUM_TYPES * IOCONTEXT_NUM_TYPES * IOOP_NUM_TYPES * + PGSTAT_IO_HIST_BUCKETS * sizeof(uint64)); have_iostats = false; diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 9554de3a803..59114f1bc3f 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -350,9 +350,15 @@ typedef struct PgStat_PendingIO uint64 bytes[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; 
PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; instr_time pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; - uint64 pending_hist_time_buckets[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES][PGSTAT_IO_HIST_BUCKETS]; + /* + * Dynamically allocated [IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES] + * [IOOP_NUM_TYPES][PGSTAT_IO_HIST_BUCKETS] array; non-NULL only when track_io_timing or track_wal_io_timing is enabled. + */ + uint64 (*pending_hist_time_buckets)[IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES][PGSTAT_IO_HIST_BUCKETS]; } PgStat_PendingIO; +extern PgStat_PendingIO PendingIOStats; + typedef struct PgStat_IO { TimestampTz stat_reset_timestamp; diff --git a/src/test/recovery/t/029_stats_restart.pl b/src/test/recovery/t/029_stats_restart.pl index cdc427dbc78..33939c8701a 100644 --- a/src/test/recovery/t/029_stats_restart.pl +++ b/src/test/recovery/t/029_stats_restart.pl @@ -293,7 +293,36 @@ cmp_ok( $wal_restart_immediate->{reset}, "$sect: reset timestamp is new"); + +## Test pg_stat_io_histogram, which becomes active (via dynamic memory +## allocation) only for backends started with track_[io|wal_io]_timing set +$sect = "pg_stat_io_histogram"; +$node->append_conf('postgresql.conf', "track_io_timing = 'on'"); +$node->append_conf('postgresql.conf', "track_wal_io_timing = 'on'"); +$node->restart; + + +## Check that pg_stat_io_histogram shows some growing counts in buckets. +## We could also try with checkpointer, but it often runs with fsync=off +## during tests. +my $countbefore = $node->safe_psql('postgres', + "SELECT sum(bucket_count) AS hist_bucket_count_sum FROM pg_stat_get_io_histogram() " . 
+ "WHERE backend_type='client backend' AND object='relation' AND context='normal'"); + +$node->safe_psql('postgres', "CREATE TABLE test_io_hist(id bigint);"); +$node->safe_psql('postgres', "INSERT INTO test_io_hist SELECT generate_series(1, 100) s;"); +$node->safe_psql('postgres', "SELECT pg_stat_force_next_flush();"); + +my $countafter = $node->safe_psql('postgres', + "SELECT sum(bucket_count) AS hist_bucket_count_sum FROM pg_stat_get_io_histogram() " . + "WHERE backend_type='client backend' AND object='relation' AND context='normal'"); + +cmp_ok( + $countafter, '>', $countbefore, + "pg_stat_io_histogram: latency buckets growing"); + $node->stop; + done_testing(); sub trigger_funcrel_stat diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out index 4c95f09d651..cd00f35bf7a 100644 --- a/src/test/regress/expected/stats.out +++ b/src/test/regress/expected/stats.out @@ -1765,29 +1765,6 @@ SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset; t (1 row) --- Check that pg_stat_io_histograms sees some growing counts in buckets --- We could also try with checkpointer, but it often runs with fsync=off --- during test. -SET track_io_timing TO 'on'; -SELECT sum(bucket_count) AS hist_bucket_count_sum FROM pg_stat_get_io_histogram() -WHERE backend_type='client backend' AND object='relation' AND context='normal' \gset -CREATE TABLE test_io_hist(id bigint); -INSERT INTO test_io_hist SELECT generate_series(1, 100) s; -SELECT pg_stat_force_next_flush(); - pg_stat_force_next_flush --------------------------- - -(1 row) - -SELECT sum(bucket_count) AS hist_bucket_count_sum2 FROM pg_stat_get_io_histogram() -WHERE backend_type='client backend' AND object='relation' AND context='normal' \gset -SELECT :hist_bucket_count_sum2 > :hist_bucket_count_sum; - ?column? 
----------- - t -(1 row) - -RESET track_io_timing; -- Check invalid input for pg_stat_get_backend_io() SELECT pg_stat_get_backend_io(NULL); pg_stat_get_backend_io diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql index 063b1011d7e..8768e0f27fd 100644 --- a/src/test/regress/sql/stats.sql +++ b/src/test/regress/sql/stats.sql @@ -841,21 +841,6 @@ SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + FROM pg_stat_get_backend_io(pg_backend_pid()) \gset SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset; - --- Check that pg_stat_io_histograms sees some growing counts in buckets --- We could also try with checkpointer, but it often runs with fsync=off --- during test. -SET track_io_timing TO 'on'; -SELECT sum(bucket_count) AS hist_bucket_count_sum FROM pg_stat_get_io_histogram() -WHERE backend_type='client backend' AND object='relation' AND context='normal' \gset -CREATE TABLE test_io_hist(id bigint); -INSERT INTO test_io_hist SELECT generate_series(1, 100) s; -SELECT pg_stat_force_next_flush(); -SELECT sum(bucket_count) AS hist_bucket_count_sum2 FROM pg_stat_get_io_histogram() -WHERE backend_type='client backend' AND object='relation' AND context='normal' \gset -SELECT :hist_bucket_count_sum2 > :hist_bucket_count_sum; -RESET track_io_timing; - -- Check invalid input for pg_stat_get_backend_io() SELECT pg_stat_get_backend_io(NULL); SELECT pg_stat_get_backend_io(0); -- 2.43.0
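The partial memset in pgstat_io_flush_cb() is worth a stand-alone illustration, because it has a subtle companion requirement: stopping the memset at offsetof() of the pointer member preserves the allocation across flushes, but the pointed-to counters then have to be cleared separately, or they would be added into shared stats again on the next flush. A reduced sketch (dimensions shrunk, names only roughly mirroring the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_SETS 2
#define N_BUCKETS 16

/* Reduced PgStat_PendingIO: flat counters, then the histogram pointer. */
typedef struct PendingIO
{
	uint64_t	counts[4];
	uint64_t   (*pending_hist_time_buckets)[N_BUCKETS];
} PendingIO;

static void
reset_pending(PendingIO *p)
{
	/* Zero the flat counters only, keeping the lazily-allocated pointer. */
	memset(p, 0, offsetof(PendingIO, pending_hist_time_buckets));

	/* ... and clear the histogram contents themselves, if allocated. */
	if (p->pending_hist_time_buckets != NULL)
		memset(p->pending_hist_time_buckets, 0,
			   N_SETS * N_BUCKETS * sizeof(uint64_t));
}
```

This relies on the pointer being the last member, so everything before offsetof() is exactly the flat counter block.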
From 73df7dd739cb2fd98e3745dbd4f65290531a262b Mon Sep 17 00:00:00 2001 From: Jakub Wartak <[email protected]> Date: Fri, 6 Mar 2026 13:29:40 +0100 Subject: [PATCH v7 5/6] Condense PgStat_IO.stats[BACKEND_NUM_TYPES] array by using PGSTAT_USED_BACKEND_NUM_TYPES to be more memory efficient. --- src/backend/utils/activity/pgstat_io.c | 57 +++++++++++++++++++++++--- src/backend/utils/adt/pgstatfuncs.c | 22 ++++++---- src/include/miscadmin.h | 2 +- src/include/pgstat.h | 5 ++- 4 files changed, 71 insertions(+), 15 deletions(-) diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c index 8605ea65605..1e9bff4da41 100644 --- a/src/backend/utils/activity/pgstat_io.c +++ b/src/backend/utils/activity/pgstat_io.c @@ -225,13 +225,14 @@ pgstat_io_flush_cb(bool nowait) { LWLock *bktype_lock; PgStat_BktypeIO *bktype_shstats; + BackendType condensedBkType = pgstat_remap_condensed_bktype(MyBackendType); if (!have_iostats) return false; bktype_lock = &pgStatLocal.shmem->io.locks[MyBackendType]; bktype_shstats = - &pgStatLocal.shmem->io.stats.stats[MyBackendType]; + &pgStatLocal.shmem->io.stats.stats[condensedBkType]; if (!nowait) LWLockAcquire(bktype_lock, LW_EXCLUSIVE); @@ -360,7 +361,11 @@ pgstat_io_reset_all_cb(TimestampTz ts) for (int i = 0; i < BACKEND_NUM_TYPES; i++) { LWLock *bktype_lock = &pgStatLocal.shmem->io.locks[i]; - PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stats.stats[i]; + BackendType bktype = pgstat_remap_condensed_bktype(i); + PgStat_BktypeIO *bktype_shstats; + if(bktype == -1) + continue; + bktype_shstats = &pgStatLocal.shmem->io.stats.stats[bktype]; LWLockAcquire(bktype_lock, LW_EXCLUSIVE); @@ -386,8 +391,13 @@ pgstat_io_snapshot_cb(void) for (int i = 0; i < BACKEND_NUM_TYPES; i++) { LWLock *bktype_lock = &pgStatLocal.shmem->io.locks[i]; - PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stats.stats[i]; - PgStat_BktypeIO *bktype_snap = &pgStatLocal.snapshot.io->stats[i]; + BackendType bktype = 
pgstat_remap_condensed_bktype(i); + PgStat_BktypeIO *bktype_shstats; + PgStat_BktypeIO *bktype_snap; + if (bktype == -1) + continue; + bktype_shstats = &pgStatLocal.shmem->io.stats.stats[bktype]; + bktype_snap = &pgStatLocal.snapshot.io->stats[bktype]; LWLockAcquire(bktype_lock, LW_SHARED); @@ -419,7 +429,8 @@ * Function returns true if BackendType participates in the cumulative stats * subsystem for IO and false if it does not. * -* When adding a new BackendType, also consider adding relevant restrictions to +* When adding a new BackendType, ensure that pgstat_remap_condensed_bktype() +* is updated and also consider adding relevant restrictions to * pgstat_tracks_io_object() and pgstat_tracks_io_op(). */ bool @@ -457,6 +468,42 @@ pgstat_tracks_io_bktype(BackendType bktype) return false; } + +/* + * Remap sparse backend type IDs to contiguous ones. Keep in sync with enum + * BackendType and the PGSTAT_USED_BACKEND_NUM_TYPES count. + * + * Returns -1 if the input ID is invalid or unused. + */ +int +pgstat_remap_condensed_bktype(BackendType bktype) +{ + /* -1 here means the backend type keeps no IO stats */ + static const int mapping_table[BACKEND_NUM_TYPES] = { + -1, /* B_INVALID */ + 0, + -1, /* B_DEAD_END_BACKEND */ + 1, + 2, + 3, + 4, + 5, + 6, + -1, /* B_ARCHIVER */ + 7, + 8, + 8, + 10, + 11, + 12, + 13, + -1 /* B_LOGGER */ + }; + + if (bktype < 0 || bktype >= BACKEND_NUM_TYPES) + return -1; + return mapping_table[bktype]; +} + /* * Some BackendTypes do not perform IO on certain IOObjects or in certain * IOContexts. Some IOObjects are never operated on in some IOContexts. 
Check diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index ac08ab14195..74f1351289a 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -1578,9 +1578,14 @@ pg_stat_get_io(PG_FUNCTION_ARGS) backends_io_stats = pgstat_fetch_stat_io(); - for (int bktype = 0; bktype < BACKEND_NUM_TYPES; bktype++) + for (int i = 0; i < BACKEND_NUM_TYPES; i++) { - PgStat_BktypeIO *bktype_stats = &backends_io_stats->stats[bktype]; + BackendType bktype = pgstat_remap_condensed_bktype(i); + PgStat_BktypeIO *bktype_stats; + + if (bktype == -1) + continue; + bktype_stats = &backends_io_stats->stats[bktype]; /* * In Assert builds, we can afford an extra loop through all of the @@ -1588,17 +1591,17 @@ pg_stat_get_io(PG_FUNCTION_ARGS) * expected stats are non-zero, since it keeps the non-Assert code * cleaner. */ - Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype)); + Assert(pgstat_bktype_io_stats_valid(bktype_stats, i)); /* * For those BackendTypes without IO Operation stats, skip * representing them in the view altogether. */ - if (!pgstat_tracks_io_bktype(bktype)) + if (!pgstat_tracks_io_bktype(i)) continue; /* save tuples with data from this PgStat_BktypeIO */ - pg_stat_io_build_tuples(rsinfo, bktype_stats, bktype, + pg_stat_io_build_tuples(rsinfo, bktype_stats, i, backends_io_stats->stat_reset_timestamp); } @@ -1757,9 +1760,14 @@ pg_stat_get_io_histogram(PG_FUNCTION_ARGS) backends_io_stats = pgstat_fetch_stat_io(); - for (int bktype = 0; bktype < BACKEND_NUM_TYPES; bktype++) + for (int i = 0; i < BACKEND_NUM_TYPES; i++) { - PgStat_BktypeIO *bktype_stats = &backends_io_stats->stats[bktype]; + BackendType bktype = pgstat_remap_condensed_bktype(i); + PgStat_BktypeIO *bktype_stats; + + if (bktype == -1) + continue; + bktype_stats = &backends_io_stats->stats[bktype]; /* * In Assert builds, we can afford an extra loop through all of the @@ -1767,17 +1773,17 @@ pg_stat_get_io_histogram(PG_FUNCTION_ARGS) * expected stats are non-zero, since it keeps the non-Assert code * cleaner. 
*/ - Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype)); + Assert(pgstat_bktype_io_stats_valid(bktype_stats, i)); /* * For those BackendTypes without IO Operation stats, skip * representing them in the view altogether. */ - if (!pgstat_tracks_io_bktype(bktype)) + if (!pgstat_tracks_io_bktype(i)) continue; /* save tuples with data from this PgStat_BktypeIO */ - pg_stat_io_histogram_build_tuples(rsinfo, bktype_stats, bktype, + pg_stat_io_histogram_build_tuples(rsinfo, bktype_stats, i, backends_io_stats->stat_reset_timestamp); } diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h index f16f35659b9..d0c62d3248e 100644 --- a/src/include/miscadmin.h +++ b/src/include/miscadmin.h @@ -332,7 +332,7 @@ extern void SwitchBackToLocalLatch(void); * MyBackendType indicates what kind of a backend this is. * * If you add entries, please also update the child_process_kinds array in - * launch_backend.c. + * launch_backend.c and PGSTAT_USED_BACKEND_NUM_TYPES in pgstat.h. */ typedef enum BackendType { diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 59114f1bc3f..22114d378bd 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -359,10 +359,12 @@ typedef struct PgStat_PendingIO extern PgStat_PendingIO PendingIOStats; +/* This needs to stay in sync with pgstat_tracks_io_bktype() */ +#define PGSTAT_USED_BACKEND_NUM_TYPES (BACKEND_NUM_TYPES - 4) typedef struct PgStat_IO { TimestampTz stat_reset_timestamp; - PgStat_BktypeIO stats[BACKEND_NUM_TYPES]; + PgStat_BktypeIO stats[PGSTAT_USED_BACKEND_NUM_TYPES]; } PgStat_IO; typedef struct PgStat_StatDBEntry @@ -639,6 +641,7 @@ extern const char *pgstat_get_io_context_name(IOContext io_context); extern const char *pgstat_get_io_object_name(IOObject io_object); extern const char *pgstat_get_io_op_name(IOOp io_op); +extern int pgstat_remap_condensed_bktype(BackendType bktype); extern bool pgstat_tracks_io_bktype(BackendType bktype); extern bool pgstat_tracks_io_object(BackendType bktype, IOObject io_object, 
IOContext io_context); -- 2.43.0
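The remapping scheme is easiest to sanity-check in miniature. The sketch below uses a small hypothetical enum (not the real BackendType) but the same shape as pgstat_remap_condensed_bktype(): a static table translating sparse enum values to dense array indices, with -1 for types that keep no IO stats, so the stats array can shrink from NUM_TYPES to USED_NUM_TYPES entries:

```c
#include <assert.h>

/* Hypothetical miniature of BackendType, for illustration only. */
typedef enum DemoType
{
	DT_INVALID,					/* keeps no stats */
	DT_BACKEND,
	DT_LOGGER,					/* keeps no stats */
	DT_CHECKPOINTER,
	DT_WALWRITER,
	DT_NUM_TYPES
} DemoType;

#define DEMO_USED_NUM_TYPES (DT_NUM_TYPES - 2)

/* Same shape as pgstat_remap_condensed_bktype(): sparse ID in, dense
 * index into a DEMO_USED_NUM_TYPES-sized stats array out, -1 otherwise. */
static int
remap_condensed(DemoType t)
{
	static const int mapping[DT_NUM_TYPES] = {
		-1,		/* DT_INVALID */
		0,		/* DT_BACKEND */
		-1,		/* DT_LOGGER */
		1,		/* DT_CHECKPOINTER */
		2,		/* DT_WALWRITER */
	};

	if (t < 0 || t >= DT_NUM_TYPES)
		return -1;
	return mapping[t];
}
```

A cheap invariant worth asserting in a regression or TAP test: every non-negative value in the table is unique and the maximum equals USED_NUM_TYPES - 1; that would mechanically catch a duplicated or skipped slot whenever the enum changes.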
From fc233581899610e5b96b0f561dd74bd60eac17e3 Mon Sep 17 00:00:00 2001 From: Jakub Wartak <[email protected]> Date: Fri, 6 Mar 2026 14:00:38 +0100 Subject: [PATCH v7 6/6] Further condense and reduce memory used by pgstat_io(_histogram) subsystem by eliminating tracking of useless backend types: autovacuum launcher and standalone backend. --- src/backend/utils/activity/pgstat_io.c | 17 +++++++++++------ src/include/pgstat.h | 2 +- src/test/recovery/t/029_stats_restart.pl | 5 ----- src/test/regress/expected/stats.out | 14 +------------- 4 files changed, 13 insertions(+), 25 deletions(-) diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c index 1e9bff4da41..6c11430ad94 100644 --- a/src/backend/utils/activity/pgstat_io.c +++ b/src/backend/utils/activity/pgstat_io.c @@ -73,6 +73,8 @@ pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op, Assert((unsigned int) io_object < IOOBJECT_NUM_TYPES); Assert((unsigned int) io_context < IOCONTEXT_NUM_TYPES); Assert(pgstat_is_ioop_tracked_in_bytes(io_op) || bytes == 0); + if (unlikely(MyBackendType == B_STANDALONE_BACKEND || MyBackendType == B_AUTOVAC_LAUNCHER)) + return; Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op)); PendingIOStats.counts[io_object][io_context][io_op] += cnt; @@ -425,6 +427,9 @@ pgstat_io_snapshot_cb(void) * - Syslogger because it is not connected to shared memory * - Archiver because most relevant archiving IO is delegated to a * specialized command or module +* - Autovacuum launcher because it hardly performs any IO +* - Standalone backend as it is only used in unusual maintenance +* scenarios * * Function returns true if BackendType participates in the cumulative stats * subsystem for IO and false if it does not. 
@@ -446,9 +451,10 @@ pgstat_tracks_io_bktype(BackendType bktype) case B_DEAD_END_BACKEND: case B_ARCHIVER: case B_LOGGER: + case B_AUTOVAC_LAUNCHER: + case B_STANDALONE_BACKEND: return false; - case B_AUTOVAC_LAUNCHER: case B_AUTOVAC_WORKER: case B_BACKEND: case B_BG_WORKER: @@ -456,7 +462,6 @@ pgstat_tracks_io_bktype(BackendType bktype) case B_CHECKPOINTER: case B_IO_WORKER: case B_SLOTSYNC_WORKER: - case B_STANDALONE_BACKEND: case B_STARTUP: case B_WAL_RECEIVER: case B_WAL_SENDER: @@ -482,20 +487,20 @@ pgstat_remap_condensed_bktype(BackendType bktype) { -1, /* B_INVALID */ 0, -1, /* B_DEAD_END_BACKEND */ + -1, /* B_AUTOVAC_LAUNCHER */ 1, 2, 3, 4, + -1, /* B_STANDALONE_BACKEND */ + -1, /* B_ARCHIVER */ 5, 6, - -1, /* B_ARCHIVER */ 7, 8, - 8, + 9, 10, 11, - 12, - 13, -1 /* B_LOGGER */ }; diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 22114d378bd..80476eda514 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -360,7 +360,7 @@ typedef struct PgStat_PendingIO extern PgStat_PendingIO PendingIOStats; /* This needs to stay in sync with pgstat_tracks_io_bktype() */ -#define PGSTAT_USED_BACKEND_NUM_TYPES (BACKEND_NUM_TYPES - 4) +#define PGSTAT_USED_BACKEND_NUM_TYPES (BACKEND_NUM_TYPES - 6) typedef struct PgStat_IO { TimestampTz stat_reset_timestamp; diff --git a/src/test/recovery/t/029_stats_restart.pl b/src/test/recovery/t/029_stats_restart.pl index 33939c8701a..681fb9ac16d 100644 --- a/src/test/recovery/t/029_stats_restart.pl +++ b/src/test/recovery/t/029_stats_restart.pl @@ -22,12 +22,7 @@ my $sect = "startup"; # Check some WAL statistics after a fresh startup. The startup process # should have done WAL reads, and initialization some WAL writes. 
-my $standalone_io_stats = io_stats('init', 'wal', 'standalone backend'); my $startup_io_stats = io_stats('normal', 'wal', 'startup'); -cmp_ok( - '0', '<', - $standalone_io_stats->{writes}, - "$sect: increased standalone backend IO writes"); cmp_ok( '0', '<', $startup_io_stats->{reads}, diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out index cd00f35bf7a..7cefd37a99a 100644 --- a/src/test/regress/expected/stats.out +++ b/src/test/regress/expected/stats.out @@ -16,11 +16,6 @@ SHOW track_counts; -- must be on SELECT backend_type, object, context FROM pg_stat_io ORDER BY backend_type COLLATE "C", object COLLATE "C", context COLLATE "C"; backend_type|object|context -autovacuum launcher|relation|bulkread -autovacuum launcher|relation|init -autovacuum launcher|relation|normal -autovacuum launcher|wal|init -autovacuum launcher|wal|normal autovacuum worker|relation|bulkread autovacuum worker|relation|init autovacuum worker|relation|normal @@ -67,13 +62,6 @@ slotsync worker|relation|vacuum slotsync worker|temp relation|normal slotsync worker|wal|init slotsync worker|wal|normal -standalone backend|relation|bulkread -standalone backend|relation|bulkwrite -standalone backend|relation|init -standalone backend|relation|normal -standalone backend|relation|vacuum -standalone backend|wal|init -standalone backend|wal|normal startup|relation|bulkread startup|relation|bulkwrite startup|relation|init @@ -95,7 +83,7 @@ walsummarizer|wal|init walsummarizer|wal|normal walwriter|wal|init walwriter|wal|normal -(79 rows) +(67 rows) \a -- ensure that both seqscan and indexscan plans are allowed SET enable_seqscan TO on; -- 2.43.0
