On Mon, Apr 24, 2023 at 7:02 PM Melanie Plageman <melanieplage...@gmail.com> wrote:
> On Mon, Apr 24, 2023 at 03:56:54PM -0700, Andres Freund wrote: > > > > I was thinking we'd track writeback separately from the write, rather > than > > attributing the writeback to "write". Otherwise it looks good, based on > a > > quick skim. > > Like you want a separate IOOp IOOP_WRITEBACK? Interesting. Okay. > Okay, attached v2 does this (adds IOOP_WRITEBACK). With my patch applied and the same pgbench setup as you (for -T30): After pgbench: backend_type | object | context | writes | write_time | writebacks | writeback_time | fsyncs | fsync_time | ---------------------+---------------+-----------+----------+------------+------------+----------------+---------+------------+ background writer | relation | normal | 5581 | 23.416 | 5568 | 32.33 | 0 | 0 | checkpointer | relation | normal | 89116 | 295.576 | 89106 | 416.5 | 84 | 5242.764 | and then after a stats reset followed by an explicit checkpoint: backend_type | object | context | writes | write_time | writebacks | writeback_time | fsyncs | fsync_time | ---------------------+---------------+-----------+---------+--------------------+------------+----------------+---------+------------+ checkpointer | relation | normal | 229807 | 457.43600000000004 | 229817 | 532.84 | 52 | 378.652 | I've yet to cook up a client backend test case (e.g. with COPY). I've taken that as a todo. I have a few outstanding questions: 1) Does it make sense for writebacks to count the number of blocks for which writeback was requested or the number of calls to smgrwriteback() or the number of actual syscalls made? We don't actually know from outside of mdwriteback() how many FileWriteback() calls we will make. 2) I'm a little nervous about not including IOObject in the writeback context. Technically, there is nothing stopping local buffer code from calling IssuePendingWritebacks(). Right now, local buffer code doesn't do ScheduleBufferTagForWriteback(). But it doesn't seem quite right to hardcode in IOOBJECT_RELATION when there is nothing wrong with requesting writeback of local buffers (AFAIK). What do you think? 3) Should any restrictions be added to pgstat_tracks_io_object() or pgstat_tracks_io_op()? I couldn't think of any backend types or IO contexts which would not do writeback as a rule. Also, though we don't do writeback for temp tables now, it isn't nonsensical to do so. In this version, I didn't add any restrictions. Docs need work. I added a placeholder for the new columns. I'll update it once we decide what writebacks should actually count. And, I don't think we can do any kind of ongoing test. - Melanie
From b20765526e5c4bb5a82734ad9977552ebf9dad2f Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Mon, 24 Apr 2023 18:21:54 -0400 Subject: [PATCH v2] Add writeback to pg_stat_io --- doc/src/sgml/monitoring.sgml | 24 ++++++++++++++++++++++++ src/backend/catalog/system_views.sql | 2 ++ src/backend/postmaster/bgwriter.c | 4 ++-- src/backend/storage/buffer/buf_init.c | 2 +- src/backend/storage/buffer/bufmgr.c | 12 ++++++++++-- src/backend/utils/adt/pgstatfuncs.c | 5 +++++ src/include/catalog/pg_proc.dat | 6 +++--- src/include/pgstat.h | 3 ++- src/include/storage/buf_internals.h | 4 +++- src/test/regress/expected/rules.out | 4 +++- 10 files changed, 55 insertions(+), 11 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 99f7f95c39..62572327cf 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -3867,6 +3867,30 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i </entry> </row> + <row> + <entry role="catalog_table_entry"> + <para role="column_definition"> + <structfield>writebacks</structfield> <type>bigint</type> + </para> + <para> + Number of writeback operations, each of the size specified in + <varname>op_bytes</varname>. + </para> + </entry> + </row> + + <row> + <entry role="catalog_table_entry"> + <para role="column_definition"> + <structfield>writeback_time</structfield> <type>double precision</type> + </para> + <para> + Time spent in writeback operations in milliseconds (if + <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero) + </para> + </entry> + </row> + <row> <entry role="catalog_table_entry"> <para role="column_definition"> diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 48aacf66ee..d0c932ad0e 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1131,6 +1131,8 @@ SELECT b.read_time, b.writes, b.write_time, + b.writebacks, + b.writeback_time, b.extends, b.extend_time, b.op_bytes, diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c index caad642ec9..4c3540d6e1 100644 --- a/src/backend/postmaster/bgwriter.c +++ b/src/backend/postmaster/bgwriter.c @@ -129,7 +129,7 @@ BackgroundWriterMain(void) ALLOCSET_DEFAULT_SIZES); MemoryContextSwitchTo(bgwriter_context); - WritebackContextInit(&wb_context, &bgwriter_flush_after); + WritebackContextInit(&wb_context, IOCONTEXT_NORMAL, &bgwriter_flush_after); /* * If an exception is encountered, processing resumes here. @@ -185,7 +185,7 @@ BackgroundWriterMain(void) MemoryContextResetAndDeleteChildren(bgwriter_context); /* re-initialize to avoid repeated errors causing problems */ - WritebackContextInit(&wb_context, &bgwriter_flush_after); + WritebackContextInit(&wb_context, IOCONTEXT_NORMAL, &bgwriter_flush_after); /* Now we can allow interrupts again */ RESUME_INTERRUPTS(); diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c index 0057443f0c..1d80532f3b 100644 --- a/src/backend/storage/buffer/buf_init.c +++ b/src/backend/storage/buffer/buf_init.c @@ -146,7 +146,7 @@ InitBufferPool(void) StrategyInitialize(!foundDescs); /* Initialize per-backend file flush context */ - WritebackContextInit(&BackendWritebackContext, + WritebackContextInit(&BackendWritebackContext, IOCONTEXT_NORMAL, &backend_flush_after); } diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 1fa689052e..188e135feb 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -1685,6 +1685,7 @@ again: FlushBuffer(buf_hdr, NULL, IOOBJECT_RELATION, io_context); LWLockRelease(content_lock); + BackendWritebackContext.io_context = io_context; ScheduleBufferTagForWriteback(&BackendWritebackContext, &buf_hdr->tag); } @@ -2555,7 +2556,7 @@ BufferSync(int flags) if (num_to_scan == 0) return; /* nothing to do */ - WritebackContextInit(&wb_context, &checkpoint_flush_after); + WritebackContextInit(&wb_context, IOCONTEXT_NORMAL, &checkpoint_flush_after); TRACE_POSTGRESQL_BUFFER_SYNC_START(NBuffers, num_to_scan); @@ -5433,10 +5434,11 @@ ts_ckpt_progress_comparator(Datum a, Datum b, void *arg) * writeback control will be performed. */ void -WritebackContextInit(WritebackContext *context, int *max_pending) +WritebackContextInit(WritebackContext *context, IOContext io_context, int *max_pending) { Assert(*max_pending <= WRITEBACK_MAX_PENDING_FLUSHES); + context->io_context = io_context; context->max_pending = max_pending; context->nr_pending = 0; } @@ -5491,6 +5493,7 @@ ScheduleBufferTagForWriteback(WritebackContext *context, BufferTag *tag) void IssuePendingWritebacks(WritebackContext *context) { + instr_time io_start; int i; if (context->nr_pending == 0) @@ -5502,6 +5505,8 @@ IssuePendingWritebacks(WritebackContext *context) */ sort_pending_writebacks(context->pending_writebacks, context->nr_pending); + io_start = pgstat_prepare_io_time(); + /* * Coalesce neighbouring writes, but nothing else. For that we iterate * through the, now sorted, array of pending flushes, and look forward to @@ -5555,6 +5560,9 @@ IssuePendingWritebacks(WritebackContext *context) smgrwriteback(reln, BufTagGetForkNum(&tag), tag.blockNum, nblocks); } + pgstat_count_io_op_time(IOOBJECT_RELATION, context->io_context, IOOP_WRITEBACK, + io_start, context->nr_pending); + context->nr_pending = 0; } diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index 0d57b6a7c0..70da0a2de1 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -1268,6 +1268,8 @@ typedef enum io_stat_col IO_COL_READ_TIME, IO_COL_WRITES, IO_COL_WRITE_TIME, + IO_COL_WRITEBACKS, + IO_COL_WRITEBACK_TIME, IO_COL_EXTENDS, IO_COL_EXTEND_TIME, IO_COL_CONVERSION, @@ -1303,6 +1305,8 @@ pgstat_get_io_op_index(IOOp io_op) return IO_COL_REUSES; case IOOP_WRITE: return IO_COL_WRITES; + case IOOP_WRITEBACK: + return IO_COL_WRITEBACKS; } elog(ERROR, "unrecognized IOOp value: %d", io_op); @@ -1322,6 +1326,7 @@ pgstat_get_io_time_index(IOOp io_op) { case IOOP_READ: case IOOP_WRITE: + case IOOP_WRITEBACK: case IOOP_EXTEND: case IOOP_FSYNC: return pgstat_get_io_op_index(io_op) + 1; diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index b2bc81b15f..e766b9b600 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5760,9 +5760,9 @@ proname => 'pg_stat_get_io', provolatile => 'v', prorows => '30', proretset => 't', proparallel => 'r', prorettype => 'record', proargtypes => '', - proallargtypes => '{text,text,text,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}', - proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}', - proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}', + proallargtypes => '{text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}', + proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}', + proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}', prosrc => 'pg_stat_get_io' }, { oid => '1136', descr => 'statistics: information about WAL activity', diff --git a/src/include/pgstat.h b/src/include/pgstat.h index fff4ad5b6d..57a2c0866a 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -300,9 +300,10 @@ typedef enum IOOp IOOP_READ, IOOP_REUSE, IOOP_WRITE, + IOOP_WRITEBACK, } IOOp; -#define IOOP_NUM_TYPES (IOOP_WRITE + 1) +#define IOOP_NUM_TYPES (IOOP_WRITEBACK + 1) typedef struct PgStat_BktypeIO { diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h index 136cf8fbaf..331ca26ce3 100644 --- a/src/include/storage/buf_internals.h +++ b/src/include/storage/buf_internals.h @@ -295,6 +295,8 @@ typedef struct PendingWriteback /* struct forward declared in bufmgr.h */ typedef struct WritebackContext { + IOContext io_context; + /* pointer to the max number of writeback requests to coalesce */ int *max_pending; @@ -387,7 +389,7 @@ extern PGDLLIMPORT CkptSortItem *CkptBufferIds; * Internal buffer management routines */ /* bufmgr.c */ -extern void WritebackContextInit(WritebackContext *context, int *max_pending); +extern void WritebackContextInit(WritebackContext *context, IOContext io_context, int *max_pending); extern void IssuePendingWritebacks(WritebackContext *context); extern void ScheduleBufferTagForWriteback(WritebackContext *context, BufferTag *tag); diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index 69957687fe..0bd6db2bf8 100644 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -1887,6 +1887,8 @@ pg_stat_io| SELECT backend_type, read_time, writes, write_time, + writebacks, + writeback_time, extends, extend_time, op_bytes, @@ -1896,7 +1898,7 @@ pg_stat_io| SELECT backend_type, fsyncs, fsync_time, stats_reset - FROM pg_stat_get_io() b(backend_type, object, context, reads, read_time, writes, write_time, extends, extend_time, op_bytes, hits, evictions, reuses, fsyncs, fsync_time, stats_reset); + FROM pg_stat_get_io() b(backend_type, object, context, reads, read_time, writes, write_time, writebacks, writeback_time, extends, extend_time, op_bytes, hits, evictions, reuses, fsyncs, fsync_time, stats_reset); pg_stat_progress_analyze| SELECT s.pid, s.datid, d.datname, -- 2.37.2