Hi, On Sun, 31 Dec 2023 at 03:58, Michael Paquier <mich...@paquier.xyz> wrote: > > On Tue, Dec 26, 2023 at 03:35:52PM +0300, Nazir Bilal Yavuz wrote: > > On Tue, 26 Dec 2023 at 13:10, Michael Paquier <mich...@paquier.xyz> wrote: > >> I am not sure while the whole point of the exercise is to have all the > >> I/O related data in a single view. Something that I've also found a > >> bit disturbing yesterday while looking at your patch is the fact that > >> the operation size is guessed from the context and object type when > >> querying the view because now everything is tied to BLCKSZ. This > >> patch extends it with two more operation sizes, and there are even > >> cases where it may be a variable. Could it be a better option to > >> extend pgstat_count_io_op_time() so as callers can themselves give the > >> size of the operation? > > > > Do you mean removing the op_bytes column and tracking the number of > > bytes in reads, writes, and extends? If so, that makes sense to me but > > I don't want to remove the number of operations; I believe that has a > > value too. We can extend the pgstat_count_io_op_time() so it can both > > track the number of bytes and the number of operations. > > Apologies if my previous wording sounded confusing. The idea I had in > mind was to keep op_bytes in pg_stat_io, and extend it so as a value > of NULL (or 0, or -1) is a synonym as "writes", "extends" and "reads" > as a number of bytes.
Oh, I understand it now. Yes, that makes sense. I thought removing op_bytes completely ( as you said "This patch extends it with two more operation sizes, and there are even cases where it may be a variable" ) from pg_stat_io view then adding something like {read | write | extend}_bytes and {read | write | extend}_calls could be better, so that we don't lose any information. > > Also, it is not directly related to this patch but vectored IO [1] is > > coming soon; so the number of operations could be wrong since vectored > > IO could merge a couple of operations. > > Hmm. I have not checked this patch series so I cannot say for sure, > but we'd likely just want to track the number of bytes if a single > operation has a non-equal size rather than registering in pg_stat_io N > rows with different op_bytes, no? Yes, that is correct. > I am looping in Thomas Munro in CC for comments. Thanks for doing that. > > If we want to entirely disable it, we can add > > > > if (MyBackendType == B_WAL_RECEIVER && io_object == IOOBJECT_WAL) > > return; > > > > to the top of the pgstat_count_io_op_time() since all IOOBJECT_WAL > > calls are done by this function, then we can disable it at > > pgstat_tracks_io_bktype(). > > Yeah, a limitation like that may be acceptable for now. Tracking the > WAL writer and WAL sender activities can be relevant in a lot of cases > even if we don't have the full picture for the WAL receiver yet. I added that and disabled B_WAL_RECEIVER backend with comments explaining why. v8 is attached. -- Regards, Nazir Bilal Yavuz Microsoft
From dc25cccc8a5b168f15c9771f174924261b151193 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz <byavu...@gmail.com> Date: Wed, 3 Jan 2024 15:36:19 +0300 Subject: [PATCH v8] Show WAL stats on pg_stat_io (except streaming replication) This patch aims to showing WAL stats per backend on pg_stat_io view. With this patch, it can be seen how many WAL operations it makes, their context, types and total timings per backend in pg_stat_io view. In this path new IOContext IOCONTEXT_INIT is introduced, it is for IO operations done while creating the things. Currently, it is used only together with IOObject IOOBJECT_WAL. IOOBJECT_WAL means IO operations related to WAL. IOOBJECT_WAL / IOCONTEXT_NORMAL means IO operations done on already created WAL segments. IOOBJECT_WAL / IOCONTEXT_INIT means IO operations done while creating the WAL segments. This patch currently covers: - Documentation - IOOBJECT_WAL / IOCONTEXT_NORMAL / read, write and fsync stats on pg_stat_io. - IOOBJECT_WAL / IOCONTEXT_INIT / write and fsync stats on pg_stat_io. doesn't cover: - Streaming replication WAL IO. --- src/include/catalog/pg_proc.dat | 6 +- src/include/pgstat.h | 6 +- src/backend/access/transam/xlog.c | 58 +++++------ src/backend/access/transam/xlogrecovery.c | 10 ++ src/backend/catalog/system_views.sql | 15 ++- src/backend/utils/activity/pgstat_io.c | 119 ++++++++++++++++++++-- src/backend/utils/adt/pgstatfuncs.c | 24 ++--- src/test/regress/expected/rules.out | 27 +++-- src/test/regress/expected/stats.out | 53 ++++++++++ src/test/regress/sql/stats.sql | 25 +++++ doc/src/sgml/monitoring.sgml | 29 ++++-- 11 files changed, 292 insertions(+), 80 deletions(-) diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 5b67784731a..26197dbb817 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5785,9 +5785,9 @@ { oid => '1136', descr => 'statistics: information about WAL activity', proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's', proparallel => 'r', prorettype => 'record', proargtypes => '', - proallargtypes => '{int8,int8,numeric,int8,int8,int8,float8,float8,timestamptz}', - proargmodes => '{o,o,o,o,o,o,o,o,o}', - proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,wal_write_time,wal_sync_time,stats_reset}', + proallargtypes => '{int8,int8,numeric,int8,int8,int8,timestamptz}', + proargmodes => '{o,o,o,o,o,o,o}', + proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,stats_reset}', prosrc => 'pg_stat_get_wal' }, { oid => '6248', descr => 'statistics: information about WAL prefetching', proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't', diff --git a/src/include/pgstat.h b/src/include/pgstat.h index ab91b3b367d..c23970da49f 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -279,14 +279,16 @@ typedef enum IOObject { IOOBJECT_RELATION, IOOBJECT_TEMP_RELATION, + IOOBJECT_WAL, } IOObject; -#define IOOBJECT_NUM_TYPES (IOOBJECT_TEMP_RELATION + 1) +#define IOOBJECT_NUM_TYPES (IOOBJECT_WAL + 1) typedef enum IOContext { IOCONTEXT_BULKREAD, IOCONTEXT_BULKWRITE, + IOCONTEXT_INIT, IOCONTEXT_NORMAL, IOCONTEXT_VACUUM, } IOContext; @@ -527,6 +529,8 @@ extern void pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op, instr_time start_time, uint32 cnt); extern PgStat_IO *pgstat_fetch_stat_io(void); +extern int pgstat_get_io_op_bytes(IOObject io_object, + IOContext io_context); extern const char *pgstat_get_io_context_name(IOContext io_context); extern const char *pgstat_get_io_object_name(IOObject io_object); diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 1264849883b..84b6ca9d1d8 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2281,38 +2281,22 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible) Size nbytes; Size nleft; ssize_t written; - instr_time start; + instr_time io_start; /* OK to write the page(s) */ from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ; nbytes = npages * (Size) XLOG_BLCKSZ; nleft = nbytes; + + io_start = pgstat_prepare_io_time(track_wal_io_timing); do { errno = 0; - /* Measure I/O timing to write WAL data */ - if (track_wal_io_timing) - INSTR_TIME_SET_CURRENT(start); - else - INSTR_TIME_SET_ZERO(start); - pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE); written = pg_pwrite(openLogFile, from, nleft, startoffset); pgstat_report_wait_end(); - /* - * Increment the I/O timing and the number of times WAL data - * were written out to disk. - */ - if (track_wal_io_timing) - { - instr_time end; - - INSTR_TIME_SET_CURRENT(end); - INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, end, start); - } - PendingWalStats.wal_write++; if (written <= 0) @@ -2337,6 +2321,9 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible) startoffset += written; } while (nleft > 0); + pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, + IOOP_WRITE, io_start, npages); + npages = 0; /* @@ -3043,6 +3030,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli, int fd; int save_errno; int open_flags = O_RDWR | O_CREAT | O_EXCL | PG_BINARY; + instr_time io_start; Assert(logtli != 0); @@ -3086,6 +3074,9 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli, (errcode_for_file_access(), errmsg("could not create file \"%s\": %m", tmppath))); + /* start timing writes for stats */ + io_start = pgstat_prepare_io_time(track_wal_io_timing); + pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_WRITE); save_errno = 0; if (wal_init_zero) @@ -3121,6 +3112,9 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli, } pgstat_report_wait_end(); + pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_INIT, IOOP_WRITE, + io_start, 1); + if (save_errno) { /* @@ -3137,6 +3131,9 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli, errmsg("could not write to file \"%s\": %m", tmppath))); } + /* start timing fsyncs for stats */ + io_start = pgstat_prepare_io_time(track_wal_io_timing); + pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC); if (pg_fsync(fd) != 0) { @@ -3149,6 +3146,9 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli, } pgstat_report_wait_end(); + pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_INIT, + IOOP_FSYNC, io_start, 1); + if (close(fd) != 0) ereport(ERROR, (errcode_for_file_access(), @@ -8414,7 +8414,7 @@ void issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) { char *msg = NULL; - instr_time start; + instr_time io_start; Assert(tli != 0); @@ -8427,11 +8427,7 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) wal_sync_method == WAL_SYNC_METHOD_OPEN_DSYNC) return; - /* Measure I/O timing to sync the WAL file */ - if (track_wal_io_timing) - INSTR_TIME_SET_CURRENT(start); - else - INSTR_TIME_SET_ZERO(start); + io_start = pgstat_prepare_io_time(track_wal_io_timing); pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC); switch (wal_sync_method) @@ -8475,16 +8471,8 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) pgstat_report_wait_end(); - /* - * Increment the I/O timing and the number of times WAL files were synced. - */ - if (track_wal_io_timing) - { - instr_time end; - - INSTR_TIME_SET_CURRENT(end); - INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, end, start); - } + pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_FSYNC, + io_start, 1); PendingWalStats.wal_sync++; } diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index 6f4f81f9927..bb42263f2da 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -60,6 +60,7 @@ #include "utils/builtins.h" #include "utils/datetime.h" #include "utils/guc_hooks.h" +#include "utils/pgstat_internal.h" #include "utils/pg_lsn.h" #include "utils/ps_status.h" #include "utils/pg_rusage.h" @@ -1777,6 +1778,9 @@ PerformWalRecovery(void) */ ApplyWalRecord(xlogreader, record, &replayTLI); + /* Report pending statistics to the cumulative stats system */ + pgstat_flush_io(false); + /* Exit loop if we reached inclusive recovery target */ if (recoveryStopsAfter(xlogreader)) { @@ -3252,6 +3256,7 @@ XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen, uint32 targetPageOff; XLogSegNo targetSegNo PG_USED_FOR_ASSERTS_ONLY; int r; + instr_time io_start; XLByteToSeg(targetPagePtr, targetSegNo, wal_segment_size); targetPageOff = XLogSegmentOffset(targetPagePtr, wal_segment_size); @@ -3344,6 +3349,8 @@ retry: /* Read the requested page */ readOff = targetPageOff; + io_start = pgstat_prepare_io_time(track_wal_io_timing); + pgstat_report_wait_start(WAIT_EVENT_WAL_READ); r = pg_pread(readFile, readBuf, XLOG_BLCKSZ, (off_t) readOff); if (r != XLOG_BLCKSZ) @@ -3372,6 +3379,9 @@ retry: } pgstat_report_wait_end(); + pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_READ, + io_start, 1); + Assert(targetSegNo == readSegNo); Assert(targetPageOff == readOff); Assert(reqLen <= readLen); diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 058fc47c919..c57b5428ae5 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1172,6 +1172,13 @@ SELECT FROM pg_stat_get_io() b; CREATE VIEW pg_stat_wal AS + WITH pgsio_sum_write_fsync_time AS ( + SELECT + sum(write_time) as sum_write_time, + sum(fsync_time) as sum_fsync_time + FROM pg_stat_io + WHERE context = 'normal' and object = 'wal' + ) SELECT w.wal_records, w.wal_fpi, @@ -1179,10 +1186,12 @@ CREATE VIEW pg_stat_wal AS w.wal_buffers_full, w.wal_write, w.wal_sync, - w.wal_write_time, - w.wal_sync_time, + p.sum_write_time as write_time, + p.sum_fsync_time as fsync_time, w.stats_reset - FROM pg_stat_get_wal() w; + FROM pg_stat_get_wal() w + CROSS JOIN pgsio_sum_write_fsync_time p; + CREATE VIEW pg_stat_progress_analyze AS SELECT diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c index 0dd9b98b3e4..8953a705615 100644 --- a/src/backend/utils/activity/pgstat_io.c +++ b/src/backend/utils/activity/pgstat_io.c @@ -16,6 +16,7 @@ #include "postgres.h" +#include "access/xlog.h" #include "executor/instrument.h" #include "storage/bufmgr.h" #include "utils/pgstat_internal.h" @@ -115,6 +116,20 @@ pgstat_prepare_io_time(bool track_io_guc) return io_start; } +/* + * Decide if IO timings need to be tracked. Timings associated to + * IOOBJECT_WAL objects are tracked if track_wal_io_timing is enabled, + * else rely on track_io_timing. + */ +static bool +pgstat_should_track_io_time(IOObject io_object) +{ + if (io_object == IOOBJECT_WAL) + return track_wal_io_timing; + + return track_io_timing; +} + /* * Like pgstat_count_io_op_n() except it also accumulates time. */ @@ -122,7 +137,28 @@ void pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op, instr_time start_time, uint32 cnt) { - if (track_io_timing) + /* + * B_WAL_RECEIVER backend does IOOBJECT_WAL IOObject & IOOP_READ IOOp IOs + * but these IOs are not countable for now because IOOP_READ IOs' op_bytes + * (number of bytes per unit of I/O) might not be the same all the time. + * The current implementation requires that the op_bytes must be the same + * for the same IOObject, IOContext and IOOp. To avoid confusion, the + * B_WAL_RECEIVER backend & IOOBJECT_WAL IOObject IOs are disabled for + * now. + */ + if (MyBackendType == B_WAL_RECEIVER && io_object == IOOBJECT_WAL) + return; + + /* + * Accumulate timing data. pgstat_count_buffer is for pgstat_database. As + * pg_stat_database only counts blk_read_time and blk_write_time, it is + * set for IOOP_READ and IOOP_WRITE. + * + * pgBufferUsage is for EXPLAIN. pgBufferUsage has write and read stats + * for shared, local and temporary blocks. Temporary blocks are ignored + * here. + */ + if (pgstat_should_track_io_time(io_object)) { instr_time io_time; @@ -131,7 +167,9 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op, if (io_op == IOOP_WRITE || io_op == IOOP_EXTEND) { - pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time)); + if (io_object != IOOBJECT_WAL) + pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time)); + if (io_object == IOOBJECT_RELATION) INSTR_TIME_ADD(pgBufferUsage.shared_blk_write_time, io_time); else if (io_object == IOOBJECT_TEMP_RELATION) @@ -139,7 +177,9 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op, } else if (io_op == IOOP_READ) { - pgstat_count_buffer_read_time(INSTR_TIME_GET_MICROSEC(io_time)); + if (io_object != IOOBJECT_WAL) + pgstat_count_buffer_read_time(INSTR_TIME_GET_MICROSEC(io_time)); + if (io_object == IOOBJECT_RELATION) INSTR_TIME_ADD(pgBufferUsage.shared_blk_read_time, io_time); else if (io_object == IOOBJECT_TEMP_RELATION) @@ -226,6 +266,8 @@ pgstat_get_io_context_name(IOContext io_context) return "bulkread"; case IOCONTEXT_BULKWRITE: return "bulkwrite"; + case IOCONTEXT_INIT: + return "init"; case IOCONTEXT_NORMAL: return "normal"; case IOCONTEXT_VACUUM: @@ -236,6 +278,30 @@ pgstat_get_io_context_name(IOContext io_context) pg_unreachable(); } +/* + * Get the number of bytes associated to an operation depending on + * an IOObject and an IOContext. + * + * Returns BLCKSZ by default because that is what most of the + * IOObject/IOContext combinations use. The number of bytes for an + * IOOBJECT_WAL depends on the context. + */ +int +pgstat_get_io_op_bytes(IOObject io_object, IOContext io_context) +{ + if (io_object == IOOBJECT_WAL) + { + if (io_context == IOCONTEXT_NORMAL) + return XLOG_BLCKSZ; + else if (io_context == IOCONTEXT_INIT) + return wal_segment_size; + + Assert(false); /* not reachable */ + } + + return BLCKSZ; +} + const char * pgstat_get_io_object_name(IOObject io_object) { @@ -245,6 +311,8 @@ pgstat_get_io_object_name(IOObject io_object) return "relation"; case IOOBJECT_TEMP_RELATION: return "temp relation"; + case IOOBJECT_WAL: + return "wal"; } elog(ERROR, "unrecognized IOObject value: %d", io_object); @@ -306,8 +374,8 @@ pgstat_io_snapshot_cb(void) * - Syslogger because it is not connected to shared memory * - Archiver because most relevant archiving IO is delegated to a * specialized command or module -* - WAL Receiver, WAL Writer, and WAL Summarizer IO are not tracked in -* pg_stat_io for now +* - WAL Receiver and WAL Summarizer IO are not tracked in pg_stat_io for +* now. * * Function returns true if BackendType participates in the cumulative stats * subsystem for IO and false if it does not. @@ -328,7 +396,13 @@ pgstat_tracks_io_bktype(BackendType bktype) case B_ARCHIVER: case B_LOGGER: case B_WAL_RECEIVER: - case B_WAL_WRITER: + + /* + * B_WAL_RECEIVER can do IOs but it is disabled for now to avoid + * confusion. See comment at the top of the + * pgstat_count_io_op_time() function. + */ + case B_WAL_SUMMARIZER: return false; @@ -341,6 +415,7 @@ pgstat_tracks_io_bktype(BackendType bktype) case B_STANDALONE_BACKEND: case B_STARTUP: case B_WAL_SENDER: + case B_WAL_WRITER: return true; } @@ -366,6 +441,15 @@ pgstat_tracks_io_object(BackendType bktype, IOObject io_object, if (!pgstat_tracks_io_bktype(bktype)) return false; + /* + * Currently, IO on IOOBJECT_WAL objects can only occur in the + * IOCONTEXT_NORMAL and IOCONTEXT_INIT IOContexts. + */ + if (io_object == IOOBJECT_WAL && + (io_context != IOCONTEXT_NORMAL && + io_context != IOCONTEXT_INIT)) + return false; + /* * Currently, IO on temporary relations can only occur in the * IOCONTEXT_NORMAL IOContext. @@ -440,6 +524,13 @@ pgstat_tracks_io_op(BackendType bktype, IOObject io_object, bktype == B_CHECKPOINTER) && io_op == IOOP_EXTEND) return false; + /* + * Most BackendTypes don't do reads with IOOBJECT_WAL. + */ + if (io_object == IOOBJECT_WAL && io_op == IOOP_READ && + !(bktype == B_STANDALONE_BACKEND || bktype == B_STARTUP)) + return false; + /* * Temporary tables are not logged and thus do not require fsync'ing. * Writeback is not requested for temporary tables. @@ -464,6 +555,22 @@ pgstat_tracks_io_op(BackendType bktype, IOObject io_object, if (!strategy_io_context && io_op == IOOP_REUSE) return false; + /* + * An IOCONTEXT_INIT done for an IOOBJECT_WAL io_object does writes and + * syncs. + */ + if (io_object == IOOBJECT_WAL && io_context == IOCONTEXT_INIT && + !(io_op == IOOP_WRITE || io_op == IOOP_FSYNC)) + return false; + + /* + * An IOCONTEXT_NORMAL done for an IOOBJECT_WAL io_object does writes, + * reads and syncs. + */ + if (io_object == IOOBJECT_WAL && io_context == IOCONTEXT_NORMAL && + !(io_op == IOOP_WRITE || io_op == IOOP_READ || io_op == IOOP_FSYNC)) + return false; + /* * IOOP_FSYNC IOOps done by a backend using a BufferAccessStrategy are * counted in the IOCONTEXT_NORMAL IOContext. See comment in diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index e65cbf41e9f..a77c2ce0622 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -1395,6 +1395,7 @@ pg_stat_get_io(PG_FUNCTION_ARGS) for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++) { const char *context_name = pgstat_get_io_context_name(io_context); + int op_bytes; Datum values[IO_NUM_COLUMNS] = {0}; bool nulls[IO_NUM_COLUMNS] = {0}; @@ -1413,12 +1414,11 @@ pg_stat_get_io(PG_FUNCTION_ARGS) values[IO_COL_RESET_TIME] = TimestampTzGetDatum(reset_time); /* - * Hard-code this to the value of BLCKSZ for now. Future - * values could include XLOG_BLCKSZ, once WAL IO is tracked, - * and constant multipliers, once non-block-oriented IO (e.g. - * temporary file IO) is tracked. + * op_bytes can change according to IOObject and IOContext. + * Get the correct op_bytes. */ - values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ); + op_bytes = pgstat_get_io_op_bytes(io_obj, io_context); + values[IO_COL_CONVERSION] = Int64GetDatum(op_bytes); for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++) { @@ -1470,7 +1470,7 @@ pg_stat_get_io(PG_FUNCTION_ARGS) Datum pg_stat_get_wal(PG_FUNCTION_ARGS) { -#define PG_STAT_GET_WAL_COLS 9 +#define PG_STAT_GET_WAL_COLS 7 TupleDesc tupdesc; Datum values[PG_STAT_GET_WAL_COLS] = {0}; bool nulls[PG_STAT_GET_WAL_COLS] = {0}; @@ -1491,11 +1491,7 @@ pg_stat_get_wal(PG_FUNCTION_ARGS) INT8OID, -1, 0); TupleDescInitEntry(tupdesc, (AttrNumber) 6, "wal_sync", INT8OID, -1, 0); - TupleDescInitEntry(tupdesc, (AttrNumber) 7, "wal_write_time", - FLOAT8OID, -1, 0); - TupleDescInitEntry(tupdesc, (AttrNumber) 8, "wal_sync_time", - FLOAT8OID, -1, 0); - TupleDescInitEntry(tupdesc, (AttrNumber) 9, "stats_reset", + TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stats_reset", TIMESTAMPTZOID, -1, 0); BlessTupleDesc(tupdesc); @@ -1518,11 +1514,7 @@ pg_stat_get_wal(PG_FUNCTION_ARGS) values[4] = Int64GetDatum(wal_stats->wal_write); values[5] = Int64GetDatum(wal_stats->wal_sync); - /* Convert counters from microsec to millisec for display */ - values[6] = Float8GetDatum(((double) wal_stats->wal_write_time) / 1000.0); - values[7] = Float8GetDatum(((double) wal_stats->wal_sync_time) / 1000.0); - - values[8] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp); + values[6] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp); /* Returns the record as Datum */ PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls))); diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index f645e8486bf..1b9db4e45af 100644 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -2225,16 +2225,23 @@ pg_stat_user_tables| SELECT relid, autoanalyze_count FROM pg_stat_all_tables WHERE ((schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (schemaname !~ '^pg_toast'::text)); -pg_stat_wal| SELECT wal_records, - wal_fpi, - wal_bytes, - wal_buffers_full, - wal_write, - wal_sync, - wal_write_time, - wal_sync_time, - stats_reset - FROM pg_stat_get_wal() w(wal_records, wal_fpi, wal_bytes, wal_buffers_full, wal_write, wal_sync, wal_write_time, wal_sync_time, stats_reset); +pg_stat_wal| WITH pgsio_sum_write_fsync_time AS ( + SELECT sum(pg_stat_io.write_time) AS sum_write_time, + sum(pg_stat_io.fsync_time) AS sum_fsync_time + FROM pg_stat_io + WHERE ((pg_stat_io.context = 'normal'::text) AND (pg_stat_io.object = 'wal'::text)) + ) + SELECT w.wal_records, + w.wal_fpi, + w.wal_bytes, + w.wal_buffers_full, + w.wal_write, + w.wal_sync, + p.sum_write_time AS write_time, + p.sum_fsync_time AS fsync_time, + w.stats_reset + FROM (pg_stat_get_wal() w(wal_records, wal_fpi, wal_bytes, wal_buffers_full, wal_write, wal_sync, stats_reset) + CROSS JOIN pgsio_sum_write_fsync_time p); pg_stat_wal_receiver| SELECT pid, status, receive_start_lsn, diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out index 346e10a3d2b..c566ddbdd42 100644 --- a/src/test/regress/expected/stats.out +++ b/src/test/regress/expected/stats.out @@ -862,6 +862,33 @@ WHERE pg_stat_get_backend_pid(beid) = pg_backend_pid(); t (1 row) +-- Test pg_stat_io for WAL in an init context, that should do writes +-- and syncs. +SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs + FROM pg_stat_io + WHERE context = 'init' and object = 'wal' \gset io_sum_wal_init_ +SELECT :io_sum_wal_init_writes > 0; + ?column? +---------- + t +(1 row) + +SELECT current_setting('fsync') = 'off' + OR :io_sum_wal_init_fsyncs > 0; + ?column? +---------- + t +(1 row) + +-- Test pg_stat_io for WAL in a normal context, that should do reads as well. +SELECT SUM(reads) > 0 + FROM pg_stat_io + WHERE context = 'normal' and object = 'wal'; + ?column? +---------- + t +(1 row) + ----- -- Test that resetting stats works for reset timestamp ----- @@ -1255,6 +1282,7 @@ SELECT pg_stat_get_subscription_stats(NULL); -- - extends of relations using shared buffers -- - fsyncs done to ensure the durability of data dirtying shared buffers -- - shared buffer hits +-- - WAL writes and fsyncs in IOContext IOCONTEXT_NORMAL -- There is no test for blocks evicted from shared buffers, because we cannot -- be sure of the state of shared buffers at the point the test is run. -- Create a regular table and insert some data to generate IOCONTEXT_NORMAL @@ -1264,6 +1292,9 @@ SELECT sum(extends) AS io_sum_shared_before_extends SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs FROM pg_stat_io WHERE object = 'relation' \gset io_sum_shared_before_ +SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs + FROM pg_stat_io + WHERE context = 'normal' and object = 'wal' \gset io_sum_wal_normal_before_ CREATE TABLE test_io_shared(a int); INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i; SELECT pg_stat_force_next_flush(); @@ -1301,6 +1332,28 @@ SELECT current_setting('fsync') = 'off' t (1 row) +SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs + FROM pg_stat_io + WHERE context = 'normal' and object = 'wal' \gset io_sum_wal_normal_after_ +SELECT current_setting('synchronous_commit') = 'on'; + ?column? +---------- + t +(1 row) + +SELECT :io_sum_wal_normal_after_writes > :io_sum_wal_normal_before_writes; + ?column? +---------- + t +(1 row) + +SELECT current_setting('fsync') = 'off' + OR :io_sum_wal_normal_after_fsyncs > :io_sum_wal_normal_before_fsyncs; + ?column? +---------- + t +(1 row) + -- Change the tablespace so that the table is rewritten directly, then SELECT -- from it to cause it to be read back into shared buffers. SELECT sum(reads) AS io_sum_shared_before_reads diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql index e3b4ca96e89..aa6caf7fd7c 100644 --- a/src/test/regress/sql/stats.sql +++ b/src/test/regress/sql/stats.sql @@ -442,6 +442,20 @@ SELECT (current_schemas(true))[1] = ('pg_temp_' || beid::text) AS match FROM pg_stat_get_backend_idset() beid WHERE pg_stat_get_backend_pid(beid) = pg_backend_pid(); +-- Test pg_stat_io for WAL in an init context, that should do writes +-- and syncs. +SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs + FROM pg_stat_io + WHERE context = 'init' and object = 'wal' \gset io_sum_wal_init_ +SELECT :io_sum_wal_init_writes > 0; +SELECT current_setting('fsync') = 'off' + OR :io_sum_wal_init_fsyncs > 0; + +-- Test pg_stat_io for WAL in a normal context, that should do reads as well. +SELECT SUM(reads) > 0 + FROM pg_stat_io + WHERE context = 'normal' and object = 'wal'; + ----- -- Test that resetting stats works for reset timestamp ----- @@ -601,6 +615,7 @@ SELECT pg_stat_get_subscription_stats(NULL); -- - extends of relations using shared buffers -- - fsyncs done to ensure the durability of data dirtying shared buffers -- - shared buffer hits +-- - WAL writes and fsyncs in IOContext IOCONTEXT_NORMAL -- There is no test for blocks evicted from shared buffers, because we cannot -- be sure of the state of shared buffers at the point the test is run. @@ -612,6 +627,9 @@ SELECT sum(extends) AS io_sum_shared_before_extends SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs FROM pg_stat_io WHERE object = 'relation' \gset io_sum_shared_before_ +SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs + FROM pg_stat_io + WHERE context = 'normal' and object = 'wal' \gset io_sum_wal_normal_before_ CREATE TABLE test_io_shared(a int); INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i; SELECT pg_stat_force_next_flush(); @@ -630,6 +648,13 @@ SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes; SELECT current_setting('fsync') = 'off' OR :io_sum_shared_after_fsyncs > :io_sum_shared_before_fsyncs; +SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs + FROM pg_stat_io + WHERE context = 'normal' and object = 'wal' \gset io_sum_wal_normal_after_ +SELECT current_setting('synchronous_commit') = 'on'; +SELECT :io_sum_wal_normal_after_writes > :io_sum_wal_normal_before_writes; +SELECT current_setting('fsync') = 'off' + OR :io_sum_wal_normal_after_fsyncs > :io_sum_wal_normal_before_fsyncs; -- Change the tablespace so that the table is rewritten directly, then SELECT -- from it to cause it to be read back into shared buffers. diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index b804eb8b5ef..a721f7b3459 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -2499,9 +2499,10 @@ description | Waiting for a newly initialized WAL file to reach durable storage </para> <para> - Currently, I/O on relations (e.g. tables, indexes) is tracked. However, - relation I/O which bypasses shared buffers (e.g. when moving a table from one - tablespace to another) is currently not tracked. + Currently, I/O on relations (e.g. tables, indexes) and WAL activity are + tracked. However, relation I/O which bypasses shared buffers + (e.g. when moving a table from one tablespace to another) is currently + not tracked. </para> <table id="pg-stat-io-view" xreflabel="pg_stat_io"> @@ -2554,6 +2555,11 @@ description | Waiting for a newly initialized WAL file to reach durable storage <literal>temp relation</literal>: Temporary relations. </para> </listitem> + <listitem> + <para> + <literal>wal</literal>: Write Ahead Logs. + </para> + </listitem> </itemizedlist> </para> </entry> @@ -2578,6 +2584,13 @@ description | Waiting for a newly initialized WAL file to reach durable storage <literal>normal</literal>. </para> </listitem> + <listitem> + <para> + <literal>init</literal>: I/O operations performed while creating the + WAL segments are tracked in <varname>context</varname> + <literal>init</literal>. + </para> + </listitem> <listitem> <para> <literal>vacuum</literal>: I/O operations performed outside of shared @@ -4767,7 +4780,9 @@ description | Waiting for a newly initialized WAL file to reach durable storage <listitem> <para> <literal>io</literal>: Reset all the counters shown in the - <structname>pg_stat_io</structname> view. + <structname>pg_stat_io</structname> view. This resets the timing + counter stored in the <structname>pg_stat_wal</structname> view + too. </para> </listitem> <listitem> @@ -4784,8 +4799,10 @@ description | Waiting for a newly initialized WAL file to reach durable storage </listitem> <listitem> <para> - <literal>wal</literal>: Reset all the counters shown in the - <structname>pg_stat_wal</structname> view. + <literal>wal</literal>: Reset all the counters except timings shown + in the <structname>pg_stat_wal</structname> view. The timing + counters can be reset by calling pg_stat_reset_shared set to + <literal>io</literal>. </para> </listitem> <listitem> -- 2.43.0