Hi, On 2022-07-12 12:19:06 -0400, Melanie Plageman wrote: > > > I also realized that I am not differentiating between IOPATH_SHARED and > > > IOPATH_STRATEGY for IOOP_FSYNC. But, given that we don't know what type > > > of buffer we are fsync'ing by the time we call register_dirty_segment(), > > > I'm not sure how we would fix this. > > > > I think there scarcely happens flush for strategy-loaded buffers. If > > that is sensible, IOOP_FSYNC would not make much sense for > > IOPATH_STRATEGY. > > > > Why would it be less likely for a backend to do its own fsync when > flushing a dirty strategy buffer than a regular dirty shared buffer?
We really just don't expect a backend to do many segment fsyncs at all. Otherwise there's something wrong with the forwarding mechanism. It'd be different if we tracked WAL fsyncs more granularly - which would be quite interesting - but that's something for another day^Wpatch. > > > > Wonder if it's worth making the lock specific to the backend type? > > > > > > > > > > I've added another Lock into PgStat_IOPathOps so that each BackendType > > > can be locked separately. But, I've also kept the lock in > > > PgStatShared_BackendIOPathOps so that reset_all and snapshot could be > > > done easily. > > > > Looks fine about the lock separation. > > > > Actually, I think it is not safe to use both of these locks. So for > picking one method, it is probably better to go with the locks in > PgStat_IOPathOps, it will be more efficient for flush (and not for > fetching and resetting), so that is probably the way to go, right? I think it's good to just use one kind of lock, and efficiency of snapshotting / resetting is nearly irrelevant. But I don't see why it's not safe to use both kinds of locks? > > Looks fine, but I think pgstat_flush_io_ops() need more comments like > > other pgstat_flush_* functions. > > > > + for (int i = 0; i < BACKEND_NUM_TYPES; i++) > > + stats_shmem->stats[i].stat_reset_timestamp = ts; > > > > I'm not sure we need a separate reset timestamp for each backend type > > but SLRU counter does the same thing.. > > > > Yes, I think for SLRU stats it is because you can reset individual SLRU > stats. Also there is no wrapper data structure to put it in. I could > keep it in PgStatShared_BackendIOPathOps since you have to reset all IO > operation stats at once, but I am thinking of getting rid of > PgStatShared_BackendIOPathOps since it is not needed if I only keep the > locks in PgStat_IOPathOps and make the global shared value an array of > PgStat_IOPathOps. I'm strongly against introducing super granular reset timestamps. I think that was a mistake for SLRU stats, but we can't fix that as easily. > Currently, strategy allocs count only reuses of a strategy buffer (not > initial shared buffers which are added to the ring). > strategy writes count only the writing out of dirty buffers which are > already in the ring and are being reused. That seems right to me. > Alternatively, we could also count as strategy allocs all those buffers > which are added to the ring and count as strategy writes all those > shared buffers which are dirty when initially added to the ring. I don't think that'd provide valuable information. The whole reason that strategy writes are interesting is that they can lead to writing out data a lot sooner than they would be written out without a strategy being used. > Subject: [PATCH v24 2/3] Track IO operation statistics > > Introduce "IOOp", an IO operation done by a backend, and "IOPath", the > location or type of IO done by a backend. For example, the checkpointer > may write a shared buffer out. This would be counted as an IOOp write on > an IOPath IOPATH_SHARED by BackendType "checkpointer". I'm still not 100% happy with IOPath - seems a bit too easy to confuse with the file path. What about 'origin'? > Each IOOp (alloc, fsync, extend, write) is counted per IOPath > (direct, local, shared, or strategy) through a call to > pgstat_count_io_op(). It seems we should track reads too - it's quite interesting to know whether reads happened because of a strategy, for example. You do reference reads in a later part of the commit message even :) > The primary concern of these statistics is IO operations on data blocks > during the course of normal database operations. IO done by, for > example, the archiver or syslogger is not counted in these statistics. We could extend this at a later stage, if we really want to. But I'm not sure it's interesting or fully possible. E.g. the archiver's write are largely not done by the archiver itself, but by a command (or module these days) it shells out to. > Note that this commit does not add code to increment IOPATH_DIRECT. A > future patch adding wrappers for smgrwrite(), smgrextend(), and > smgrimmedsync() would provide a good location to call > pgstat_count_io_op() for unbuffered IO and avoid regressions for future > users of these functions. Hm. Perhaps we should defer introducing IOPATH_DIRECT for now then? > Stats on IOOps for all IOPaths for a backend are initially accumulated > locally. > > Later they are flushed to shared memory and accumulated with those from > all other backends, exited and live. Perhaps mention here that this later could be extended to make per-connection stats visible? > Some BackendTypes will not execute pgstat_report_stat() and thus must > explicitly call pgstat_flush_io_ops() in order to flush their backend > local IO operation statistics to shared memory. Maybe add "flush ... during ongoing operation" or such? Because they'd all flush at commit, IIRC. > diff --git a/src/backend/bootstrap/bootstrap.c > b/src/backend/bootstrap/bootstrap.c > index 088556ab54..963b05321e 100644 > --- a/src/backend/bootstrap/bootstrap.c > +++ b/src/backend/bootstrap/bootstrap.c > @@ -33,6 +33,7 @@ > #include "miscadmin.h" > #include "nodes/makefuncs.h" > #include "pg_getopt.h" > +#include "pgstat.h" > #include "storage/bufmgr.h" > #include "storage/bufpage.h" > #include "storage/condition_variable.h" Hm? > diff --git a/src/backend/postmaster/walwriter.c > b/src/backend/postmaster/walwriter.c > index e926f8c27c..beb46dcb55 100644 > --- a/src/backend/postmaster/walwriter.c > +++ b/src/backend/postmaster/walwriter.c > @@ -293,18 +293,7 @@ HandleWalWriterInterrupts(void) > } > > if (ShutdownRequestPending) > - { > - /* > - * Force reporting remaining WAL statistics at process exit. > - * > - * Since pgstat_report_wal is invoked with 'force' is false in > main > - * loop to avoid overloading the cumulative stats system, there > may > - * exist unreported stats counters for the WAL writer. > - */ > - pgstat_report_wal(true); > - > proc_exit(0); > - } > > /* Perform logging of memory contexts of this process */ > if (LogMemoryContextPending) Let's do this in a separate commit and get it out of the way... > @@ -682,16 +694,37 @@ AddBufferToRing(BufferAccessStrategy strategy, > BufferDesc *buf) > * if this buffer should be written and re-used. > */ > bool > -StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf) > +StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool > *write_from_ring) > { > - /* We only do this in bulkread mode */ > + > + /* > + * We only reject reusing and writing out the strategy buffer this in > + * bulkread mode. > + */ > if (strategy->btype != BAS_BULKREAD) > + { > + /* > + * If the buffer was from the ring and we are not rejecting it, > consider it > + * a write of a strategy buffer. > + */ > + if (strategy->current_was_in_ring) > + *write_from_ring = true; Hm. This is set even if the buffer wasn't dirty? I guess we don't expect StrategyRejectBuffer() to be called for clean buffers... > /* > diff --git a/src/backend/utils/activity/pgstat_database.c > b/src/backend/utils/activity/pgstat_database.c > index d9275611f0..d3963f59d0 100644 > --- a/src/backend/utils/activity/pgstat_database.c > +++ b/src/backend/utils/activity/pgstat_database.c > @@ -47,7 +47,8 @@ pgstat_drop_database(Oid databaseid) > } > > /* > - * Called from autovacuum.c to report startup of an autovacuum process. > + * Called from autovacuum.c to report startup of an autovacuum process and > + * flush IO Operation statistics. > * We are called before InitPostgres is done, so can't rely on MyDatabaseId; > * the db OID must be passed in, instead. > */ > @@ -72,6 +73,11 @@ pgstat_report_autovac(Oid dboid) > dbentry->stats.last_autovac_time = GetCurrentTimestamp(); > > pgstat_unlock_entry(entry_ref); > + > + /* > + * Report IO Operation statistics > + */ > + pgstat_flush_io_ops(false); > } Hm. I suspect this will always be zero - at this point we haven't connected to a database, so there really can't have been much, if any, IO. I think I suggested doing something here, but on a second look it really doesn't make much sense. Note that that's different from doing something in pgstat_report_(vacuum|analyze) - clearly we've done something at that point. > /* > - * Report that the table was just vacuumed. > + * Report that the table was just vacuumed and flush IO Operation statistics. > */ > void > pgstat_report_vacuum(Oid tableoid, bool shared, > @@ -257,10 +257,15 @@ pgstat_report_vacuum(Oid tableoid, bool shared, > } > > pgstat_unlock_entry(entry_ref); > + > + /* > + * Report IO Operations statistics > + */ > + pgstat_flush_io_ops(false); > } > > /* > - * Report that the table was just analyzed. > + * Report that the table was just analyzed and flush IO Operation statistics. > * > * Caller must provide new live- and dead-tuples estimates, as well as a > * flag indicating whether to reset the changes_since_analyze counter. > @@ -340,6 +345,11 @@ pgstat_report_analyze(Relation rel, > } > > pgstat_unlock_entry(entry_ref); > + > + /* > + * Report IO Operations statistics > + */ > + pgstat_flush_io_ops(false); > } Think it'd be good to amend these comments to say that otherwise stats would only get flushed after a multi-relatio autovacuum cycle is done / a VACUUM/ANALYZE command processed all tables. Perhaps add the comment to one of the two functions, and just reference it in the other place? > --- a/src/include/utils/backend_status.h > +++ b/src/include/utils/backend_status.h > @@ -306,6 +306,40 @@ extern const char > *pgstat_get_crashed_backend_activity(int pid, char *buffer, > > int buflen); > extern uint64 pgstat_get_my_query_id(void); > > +/* Utility functions */ > + > +/* > + * When maintaining an array of information about all valid BackendTypes, in > + * order to avoid wasting the 0th spot, use this helper to convert a valid > + * BackendType to a valid location in the array (given that no spot is > + * maintained for B_INVALID BackendType). > + */ > +static inline int backend_type_get_idx(BackendType backend_type) > +{ > + /* > + * backend_type must be one of the valid backend types. If caller is > + * maintaining backend information in an array that includes B_INVALID, > + * this function is unnecessary. > + */ > + Assert(backend_type > B_INVALID && backend_type <= BACKEND_NUM_TYPES); > + return backend_type - 1; > +} In function definitions (vs declarations) we put the 'static inline int' in a separate line from the rest of the function signature. > +/* > + * When using a value from an array of information about all valid > + * BackendTypes, add 1 to the index before using it as a BackendType to > adjust > + * for not maintaining a spot for B_INVALID BackendType. > + */ > +static inline BackendType idx_get_backend_type(int idx) > +{ > + int backend_type = idx + 1; > + /* > + * If the array includes a spot for B_INVALID BackendType this function > is > + * not required. The comments around this seem a bit over the top, but I also don't mind them much. > Add pg_stat_io, a system view which tracks the number of IOOp (allocs, > writes, fsyncs, and extends) done through each IOPath (e.g. shared > buffers, local buffers, unbuffered IO) by each type of backend. Annoying question: pg_stat_io vs pg_statio? I'd not think of suggesting the latter, except that we already have a bunch of views with that prefix. > Some of these should always be zero. For example, checkpointer does not > use a BufferAccessStrategy (currently), so the "strategy" IOPath for > checkpointer will be 0 for all IOOps. What do you think about returning NULL for the values that we except to never be non-zero? Perhaps with an assert against non-zero values? Seems like it might be helpful for understanding the view. > +/* > +* When adding a new column to the pg_stat_io view, add a new enum > +* value here above IO_NUM_COLUMNS. > +*/ > +enum > +{ > + IO_COLUMN_BACKEND_TYPE, > + IO_COLUMN_IO_PATH, > + IO_COLUMN_ALLOCS, > + IO_COLUMN_EXTENDS, > + IO_COLUMN_FSYNCS, > + IO_COLUMN_WRITES, > + IO_COLUMN_RESET_TIME, > + IO_NUM_COLUMNS, > +}; We typedef pretty much every enum so the enum can be referenced without the 'enum' prefix. I'd do that here, even if we don't need it. Greetings, Andres Freund