On Thu, Apr 15, 2021 at 7:59 PM Andres Freund <and...@anarazel.de> wrote:

> Hi,
>
> On 2021-04-12 19:49:36 -0700, Melanie Plageman wrote:
> > So, I took a stab at implementing this in PgBackendStatus.
>
> Cool!
>

Just a note on v2 of the patch -- the diff for the changes I made to
pgstatfuncs.c is pretty atrocious and hard to read. I tried using a
different diff algorithm, to no avail.


>
>
> > The attached patch is not quite on top of current master, so, alas,
> > don't try and apply it. I went to rebase today and realized I needed
> > to make some changes in light of e1025044cd4, however, I wanted to
> > share this WIP so that I could pose a few questions that I imagine
> > will still be relevant after I rewrite the patch.
>

Regarding the refactor done in e1025044cd4:
Most of the functions I've added access variables in PgBackendStatus, so
I put most of them in backend_status.h/c. However, technically, these
are stats which are aggregated over time, which e1025044cd4 says should
go in pgstat.c/h. I could move some of it, but I hadn't tried to do so,
as it made a few things inconvenient, and, I wasn't sure if it was the
right thing to do anyway.


> >
> > I removed buffers_backend and buffers_backend_fsync from
> > pg_stat_bgwriter and have created a new view which tracks
> >   - number of shared buffers the checkpointer and bgwriter write out
> >   - number of shared buffers a regular backend is forced to flush
> >   - number of extends done by a regular backend through shared buffers
> >   - number of buffers flushed by a backend or autovacuum using a
> >     BufferAccessStrategy which, were they not to use this strategy,
> >     could perhaps have been avoided if a clean shared buffer was
> >     available
> >   - number of fsyncs done by a backend which could have been done by
> >     checkpointer if sync queue had not been full
>
> I wonder if leaving buffers_alloc in pg_stat_bgwriter makes sense after
> this? I'm tempted to move that to pg_stat_buffers or such...
>
>
I've gone ahead and moved buffers_alloc out of pg_stat_bgwriter and into
pg_stat_buffer_actions (I've renamed it from pg_stat_buffers_written).


> I'm not quite convinced by having separate columns for checkpointer,
> bgwriter, etc. That doesn't seem to scale all that well. What if we
> instead made it a view that has one row for each BackendType?
>
>
I've changed the view to have one row for each backend type for which we
would like to report stats and one column for each buffer action type.

To make the code easier to write, I record buffer actions for all
backend types -- even if we don't have any buffer actions we care about
for that backend type. I thought it was okay because when I actually
aggregate the counters across backends, I only do so for the backend
types we care about -- thus there shouldn't be much accessing of shared
memory by multiple different processes.

Also, I copy-pasted most of the code in pg_stat_get_buffer_actions() to
set up the result tuplestore from pg_stat_get_activity() without totally
understanding all the parts of it, so I'm not sure if all of it is
required here.


>
> > In implementing this counting per backend, it is easy for all types of
> > backends to keep track of the number of writes, extends, fsyncs, and
> > strategy writes they are doing. So, as recommended upthread, I have
> > added columns in the view for the number of writes for checkpointer and
> > bgwriter and others. Thus, this view becomes more than just stats on
> > "avoidable I/O done by backends".
> >
> > So, my question is, does it makes sense to track all extends -- those to
> > extend the fsm and visimap and when making a new relation or index? Is
> > that information useful? If so, is it different than the extends done
> > through shared buffers? Should it be tracked separately?
>
> I don't fully understand what you mean with "extends done through shared
> buffers"?
>
>
By "extends done through shared buffers", I just mean when an extend of
a relation is done and the data that will be written to the new block is
written into a shared buffer (as opposed to a local one or local memory
or a strategy buffer).

Random note:
I added a length member to the BackendType enum (BACKEND_NUM_TYPES),
which led to this compiler warning:

    miscinit.c: In function ‘GetBackendTypeDesc’:
    miscinit.c:236:2: warning: enumeration value ‘BACKEND_NUM_TYPES’ not
handled in switch [-Wswitch]
      236 |  switch (backendType)
          |  ^~~~~~

I tried using pg_attribute_unused() for BACKEND_NUM_TYPES, but, it
didn't seem to have the desired effect. As such, I just threw a case
into GetBackendTypeDesc() which does nothing (as opposed to erroring
out), since the backendDesc already is initialized to "unknown process
type", erroring out doesn't seem to be expected.

- Melanie
From 8bfacd947641554c6f1e8a24df4950e915c2c264 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Fri, 4 Jun 2021 17:07:48 -0400
Subject: [PATCH v2] Add system view tracking shared buffer actions

Add a system view which tracks
- number of shared buffers the checkpointer and bgwriter write out
- number of shared buffers a regular backend is forced to flush
- number of extends done by a regular backend through shared buffers
- number of buffers flushed by a backend or autovacuum using a
  BufferAccessStrategy which, were they not to use this strategy, could
  perhaps have been avoided if a clean shared buffer was available
- number of fsyncs done by a backend which could have been done by
  checkpointer if sync queue had not been full
- number of buffers allocated by a regular backend or autovacuum worker
  for either a new block or an existing block of a relation which is not
  currently in a buffer

All of these stats which were in the system view pg_stat_bgwriter have
been removed from that view.

All backends, on exit, will update a shared memory array with the
buffers they wrote or extended.

When the view is queried, add all live backend's statuses
to the totals in the shared memory array and return that as the full
total.

Each row of the view is for a particular backend type and each column is
the number of a particular kind of buffer action taken by the various
backends.

TODO:
- Some kind of test?
- Docs change
---
 src/backend/catalog/system_views.sql        |  14 +-
 src/backend/postmaster/checkpointer.c       |  27 +---
 src/backend/postmaster/pgstat.c             |   3 -
 src/backend/storage/buffer/bufmgr.c         |  73 +++++++++-
 src/backend/storage/buffer/freelist.c       |  37 +++--
 src/backend/storage/ipc/ipci.c              |   1 +
 src/backend/utils/activity/backend_status.c | 144 ++++++++++++++++++++
 src/backend/utils/adt/pgstatfuncs.c         |  53 +++++--
 src/backend/utils/init/miscinit.c           |   2 +
 src/include/catalog/pg_proc.dat             |  21 ++-
 src/include/miscadmin.h                     |  12 ++
 src/include/pgstat.h                        |   6 -
 src/include/storage/buf_internals.h         |   3 +
 src/include/utils/backend_status.h          |   9 +-
 src/test/regress/expected/rules.out         |  10 +-
 src/test/regress/sql/stats.sql              |   1 +
 16 files changed, 341 insertions(+), 75 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 999d984068..e77ed42352 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1067,9 +1067,6 @@ CREATE VIEW pg_stat_bgwriter AS
         pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint,
         pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean,
         pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean,
-        pg_stat_get_buf_written_backend() AS buffers_backend,
-        pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
-        pg_stat_get_buf_alloc() AS buffers_alloc,
         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
 CREATE VIEW pg_stat_wal AS
@@ -1085,6 +1082,17 @@ CREATE VIEW pg_stat_wal AS
         w.stats_reset
     FROM pg_stat_get_wal() w;
 
+CREATE VIEW pg_stat_buffer_actions AS
+SELECT
+       b.backend_type,
+       b.buffers_alloc,
+       b.buffers_extend,
+       b.buffers_fsync,
+       b.buffers_write,
+       b.buffers_write_strat
+FROM pg_stat_get_buffer_actions() b;
+
+
 CREATE VIEW pg_stat_progress_analyze AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 75a95f3de7..8eecd35965 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -93,17 +93,8 @@
  * requesting backends since the last checkpoint start.  The flags are
  * chosen so that OR'ing is the correct way to combine multiple requests.
  *
- * num_backend_writes is used to count the number of buffer writes performed
- * by user backend processes.  This counter should be wide enough that it
- * can't overflow during a single processing cycle.  num_backend_fsync
- * counts the subset of those writes that also had to do their own fsync,
- * because the checkpointer failed to absorb their request.
- *
  * The requests array holds fsync requests sent by backends and not yet
  * absorbed by the checkpointer.
- *
- * Unlike the checkpoint fields, num_backend_writes, num_backend_fsync, and
- * the requests fields are protected by CheckpointerCommLock.
  *----------
  */
 typedef struct
@@ -127,9 +118,6 @@ typedef struct
 	ConditionVariable start_cv; /* signaled when ckpt_started advances */
 	ConditionVariable done_cv;	/* signaled when ckpt_done advances */
 
-	uint32		num_backend_writes; /* counts user backend buffer writes */
-	uint32		num_backend_fsync;	/* counts user backend fsync calls */
-
 	int			num_requests;	/* current # of requests */
 	int			max_requests;	/* allocated array size */
 	CheckpointerRequest requests[FLEXIBLE_ARRAY_MEMBER];
@@ -1092,10 +1080,6 @@ ForwardSyncRequest(const FileTag *ftag, SyncRequestType type)
 
 	LWLockAcquire(CheckpointerCommLock, LW_EXCLUSIVE);
 
-	/* Count all backend writes regardless of if they fit in the queue */
-	if (!AmBackgroundWriterProcess())
-		CheckpointerShmem->num_backend_writes++;
-
 	/*
 	 * If the checkpointer isn't running or the request queue is full, the
 	 * backend will have to perform its own fsync request.  But before forcing
@@ -1109,8 +1093,10 @@ ForwardSyncRequest(const FileTag *ftag, SyncRequestType type)
 		 * Count the subset of writes where backends have to do their own
 		 * fsync
 		 */
+		/* TODO: should we count fsyncs for all types of procs? */
 		if (!AmBackgroundWriterProcess())
-			CheckpointerShmem->num_backend_fsync++;
+			pgstat_increment_buffer_action(BA_Fsync);
+
 		LWLockRelease(CheckpointerCommLock);
 		return false;
 	}
@@ -1267,13 +1253,6 @@ AbsorbSyncRequests(void)
 
 	LWLockAcquire(CheckpointerCommLock, LW_EXCLUSIVE);
 
-	/* Transfer stats counts into pending pgstats message */
-	BgWriterStats.m_buf_written_backend += CheckpointerShmem->num_backend_writes;
-	BgWriterStats.m_buf_fsync_backend += CheckpointerShmem->num_backend_fsync;
-
-	CheckpointerShmem->num_backend_writes = 0;
-	CheckpointerShmem->num_backend_fsync = 0;
-
 	/*
 	 * We try to avoid holding the lock for a long time by copying the request
 	 * array, and processing the requests after releasing the lock.
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b0d07c0e0b..7396202da2 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -5352,9 +5352,6 @@ pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
 	globalStats.buf_written_checkpoints += msg->m_buf_written_checkpoints;
 	globalStats.buf_written_clean += msg->m_buf_written_clean;
 	globalStats.maxwritten_clean += msg->m_maxwritten_clean;
-	globalStats.buf_written_backend += msg->m_buf_written_backend;
-	globalStats.buf_fsync_backend += msg->m_buf_fsync_backend;
-	globalStats.buf_alloc += msg->m_buf_alloc;
 }
 
 /* ----------
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 4b296a22c4..4c753c1e02 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -969,6 +969,11 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 
 	if (isExtend)
 	{
+		/*
+		 * Extends counted here are only those that go through shared buffers
+		 */
+		pgstat_increment_buffer_action(BA_Extend);
+
 		/* new buffers are zero-filled */
 		MemSet((char *) bufBlock, 0, BLCKSZ);
 		/* don't set checksum for all-zero page */
@@ -1235,11 +1240,60 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 					if (XLogNeedsFlush(lsn) &&
 						StrategyRejectBuffer(strategy, buf))
 					{
+						/*
+						 * Unset the strat write flag, as we will not be writing
+						 * this particular buffer from our ring out and may end
+						 * up having to find a buffer from main shared buffers,
+						 * which, if it is dirty, we may have to write out, which
+						 * could have been prevented by checkpointing and background
+						 * writing
+						 */
+						StrategyUnChooseBufferFromRing(strategy);
+
 						/* Drop lock/pin and loop around for another buffer */
 						LWLockRelease(BufferDescriptorGetContentLock(buf));
 						UnpinBuffer(buf, true);
 						continue;
 					}
+
+					/*
+					 * TODO: there is certainly a better way to write this
+					 * logic
+					 */
+
+					/*
+					 * The dirty buffer that will be written out was selected
+					 * from the ring and we did not bother checking the
+					 * freelist or doing a clock sweep to look for a clean
+					 * buffer to use, thus, this write will be counted as a
+					 * strategy write -- one that may be unnecessary without a
+					 * strategy
+					 */
+					if (StrategyIsBufferFromRing(strategy))
+					{
+						pgstat_increment_buffer_action(BA_Write_Strat);
+					}
+
+						/*
+						 * If the dirty buffer was one we grabbed from the
+						 * freelist or through a clock sweep, it could have been
+						 * written out by bgwriter or checkpointer, thus, we will
+						 * count it as a regular write
+						 */
+					else
+						pgstat_increment_buffer_action(BA_Write);
+				}
+				else
+				{
+					/*
+					 * If strategy is NULL, we could only be doing a write.
+					 * Extend operations will be counted in smgrextend. That
+					 * is separate I/O than any flushing of dirty buffers. If
+					 * we add more Backend Access Types, perhaps we will need
+					 * additional checks here
+					 */
+					pgstat_increment_buffer_action(BA_Write);
+
 				}
 
 				/* OK, do the I/O */
@@ -2252,9 +2306,6 @@ BgBufferSync(WritebackContext *wb_context)
 	 */
 	strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc);
 
-	/* Report buffer alloc counts to pgstat */
-	BgWriterStats.m_buf_alloc += recent_alloc;
-
 	/*
 	 * If we're not running the LRU scan, just stop after doing the stats
 	 * stuff.  We mark the saved state invalid so that we can recover sanely
@@ -2549,6 +2600,8 @@ SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
 	 * Pin it, share-lock it, write it.  (FlushBuffer will do nothing if the
 	 * buffer is clean by the time we've locked it.)
 	 */
+	pgstat_increment_buffer_action(BA_Write);
+
 	PinBuffer_Locked(bufHdr);
 	LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
 
@@ -2901,6 +2954,20 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln)
 	/*
 	 * bufToWrite is either the shared buffer or a copy, as appropriate.
 	 */
+
+	/*
+	 * TODO: consider that if we did not need to distinguish between a buffer
+	 * flushed that was grabbed from the ring buffer and written out as part
+	 * of a strategy which was not from main Shared Buffers (and thus
+	 * preventable by bgwriter or checkpointer), then we could move all calls
+	 * to pgstat_increment_buffer_action() here except for the one for
+	 * extends, which would remain in ReadBuffer_common() before smgrextend()
+	 * (unless we decide to start counting other extends). That includes the
+	 * call to count buffers written by bgwriter and checkpointer which go
+	 * through FlushBuffer() but not BufferAlloc(). That would make it
+	 * simpler. Perhaps instead we can find somewhere else to indicate that
+	 * the buffer is from the ring of buffers to reuse.
+	 */
 	smgrwrite(reln,
 			  buf->tag.forkNum,
 			  buf->tag.blockNum,
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 6be80476db..523b024992 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -19,6 +19,7 @@
 #include "storage/buf_internals.h"
 #include "storage/bufmgr.h"
 #include "storage/proc.h"
+#include "utils/backend_status.h"
 
 #define INT_ACCESS_ONCE(var)	((int)(*((volatile int *)&(var))))
 
@@ -51,7 +52,6 @@ typedef struct
 	 * overflow during a single bgwriter cycle.
 	 */
 	uint32		completePasses; /* Complete cycles of the clock sweep */
-	pg_atomic_uint32 numBufferAllocs;	/* Buffers allocated since last reset */
 
 	/*
 	 * Bgworker process to be notified upon activity or -1 if none. See
@@ -86,6 +86,13 @@ typedef struct BufferAccessStrategyData
 	 * ring already.
 	 */
 	bool		current_was_in_ring;
+	/*
+	 * If we could chose a buffer from this list and we end up having to write
+	 * it out because it is dirty when we actually could have found a clean
+	 * buffer in either the freelist or through doing a clock sweep of shared
+	 * buffers, this flag will indicate that
+	 */
+	bool		chose_buffer_in_ring;
 
 	/*
 	 * Array of buffer numbers.  InvalidBuffer (that is, zero) indicates we
@@ -213,7 +220,10 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
 	{
 		buf = GetBufferFromRing(strategy, buf_state);
 		if (buf != NULL)
+		{
+			StrategyChooseBufferBufferFromRing(strategy);
 			return buf;
+		}
 	}
 
 	/*
@@ -247,7 +257,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
 	 * the rate of buffer consumption.  Note that buffers recycled by a
 	 * strategy object are intentionally not counted here.
 	 */
-	pg_atomic_fetch_add_u32(&StrategyControl->numBufferAllocs, 1);
+	pgstat_increment_buffer_action(BA_Alloc);
 
 	/*
 	 * First check, without acquiring the lock, whether there's buffers in the
@@ -411,11 +421,6 @@ StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc)
 		 */
 		*complete_passes += nextVictimBuffer / NBuffers;
 	}
-
-	if (num_buf_alloc)
-	{
-		*num_buf_alloc = pg_atomic_exchange_u32(&StrategyControl->numBufferAllocs, 0);
-	}
 	SpinLockRelease(&StrategyControl->buffer_strategy_lock);
 	return result;
 }
@@ -517,7 +522,6 @@ StrategyInitialize(bool init)
 
 		/* Clear statistics */
 		StrategyControl->completePasses = 0;
-		pg_atomic_init_u32(&StrategyControl->numBufferAllocs, 0);
 
 		/* No pending notification */
 		StrategyControl->bgwprocno = -1;
@@ -702,3 +706,20 @@ StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf)
 
 	return true;
 }
+void
+StrategyUnChooseBufferFromRing(BufferAccessStrategy strategy)
+{
+	strategy->chose_buffer_in_ring = false;
+}
+
+void
+StrategyChooseBufferBufferFromRing(BufferAccessStrategy strategy)
+{
+	strategy->chose_buffer_in_ring = true;
+}
+
+bool
+StrategyIsBufferFromRing(BufferAccessStrategy strategy)
+{
+	return strategy->chose_buffer_in_ring;
+}
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 3e4ec53a97..c662853423 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -240,6 +240,7 @@ CreateSharedMemoryAndSemaphores(void)
 		InitProcGlobal();
 	CreateSharedProcArray();
 	CreateSharedBackendStatus();
+	CreateBufferActionStatsCounters();
 	TwoPhaseShmemInit();
 	BackgroundWorkerShmemInit();
 
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 2901f9f5a9..97dac2d41f 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -55,6 +55,7 @@ static char *BackendAppnameBuffer = NULL;
 static char *BackendClientHostnameBuffer = NULL;
 static char *BackendActivityBuffer = NULL;
 static Size BackendActivityBufferSize = 0;
+static int *BufferActionStatsArray    = NULL;
 #ifdef USE_SSL
 static PgBackendSSLStatus *BackendSslStatusBuffer = NULL;
 #endif
@@ -75,6 +76,7 @@ static MemoryContext backendStatusSnapContext;
 static void pgstat_beshutdown_hook(int code, Datum arg);
 static void pgstat_read_current_status(void);
 static void pgstat_setup_backend_status_context(void);
+static void pgstat_record_dead_backend_buffer_actions(void);
 
 
 /*
@@ -236,6 +238,22 @@ CreateSharedBackendStatus(void)
 #endif
 }
 
+void
+CreateBufferActionStatsCounters(void)
+{
+	bool		found;
+	Size		size = 0;
+	int length;
+
+	length = BACKEND_NUM_TYPES * BUFFER_ACTION_NUM_TYPES;
+	size                   = mul_size(sizeof(int), length);
+	BufferActionStatsArray = (int *)
+		ShmemInitStruct("Buffer actions taken by each backend type", size, &found);
+	if (!found)
+		MemSet(BufferActionStatsArray, 0, size);
+}
+
+
 /*
  * Initialize pgstats backend activity state, and set up our on-proc-exit
  * hook.  Called from InitPostgres and AuxiliaryProcessMain. For auxiliary
@@ -399,6 +417,11 @@ pgstat_bestart(void)
 	lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID;
 	lbeentry.st_progress_command_target = InvalidOid;
 	lbeentry.st_query_id = UINT64CONST(0);
+	lbeentry.num_allocs = 0;
+	lbeentry.num_extends = 0;
+	lbeentry.num_fsyncs = 0;
+	lbeentry.num_writes = 0;
+	lbeentry.num_writes_strat = 0;
 
 	/*
 	 * we don't zero st_progress_param here to save cycles; nobody should
@@ -469,6 +492,16 @@ pgstat_beshutdown_hook(int code, Datum arg)
 	beentry->st_procpid = 0;	/* mark invalid */
 
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
+
+	/*
+	 * Because the stats tracking shared buffers written and extended do not
+	 * go through the stats collector, it didn't make sense to add them to
+	 * pgstat_report_stat() At least the DatabaseId should be valid. Otherwise
+	 * we can't be sure that the members were zero-initialized (TODO: is that
+	 * true?)
+	 */
+	if (OidIsValid(MyDatabaseId))
+		pgstat_record_dead_backend_buffer_actions();
 }
 
 /*
@@ -1041,6 +1074,117 @@ pgstat_get_my_query_id(void)
 	 */
 	return MyBEEntry->st_query_id;
 }
+void
+pgstat_increment_buffer_action(BufferActionType ba_type)
+{
+	volatile PgBackendStatus *beentry   = MyBEEntry;
+
+	if (!beentry || !pgstat_track_activities)
+		return;
+
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+	if (ba_type == BA_Alloc)
+		beentry->num_allocs++;
+	else if (ba_type == BA_Extend)
+		beentry->num_extends++;
+	else if (ba_type == BA_Fsync)
+		beentry->num_fsyncs++;
+	else if (ba_type == BA_Write)
+		beentry->num_writes++;
+	else if (ba_type == BA_Write_Strat)
+		beentry->num_writes_strat++;
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
+/*
+ * Called for a single backend at the time of death to persist its I/O stats
+ */
+void
+pgstat_record_dead_backend_buffer_actions(void)
+{
+	BackendType bt;
+	volatile	PgBackendStatus *beentry = MyBEEntry;
+
+	if (beentry->st_procpid != 0)
+		return;
+	bt = beentry->st_backendType;
+
+	for (;;)
+	{
+		int			before_changecount;
+		int			after_changecount;
+
+		pgstat_begin_read_activity(beentry, before_changecount);
+		BufferActionStatsArray[(bt * BUFFER_ACTION_NUM_TYPES) + BA_Alloc] += beentry->num_allocs;
+		BufferActionStatsArray[(bt * BUFFER_ACTION_NUM_TYPES) + BA_Extend] += beentry->num_extends;
+		BufferActionStatsArray[(bt * BUFFER_ACTION_NUM_TYPES) + BA_Fsync] += beentry->num_fsyncs;
+		BufferActionStatsArray[(bt * BUFFER_ACTION_NUM_TYPES) + BA_Write] += beentry->num_writes;
+		BufferActionStatsArray[(bt * BUFFER_ACTION_NUM_TYPES) + BA_Write_Strat] += beentry->num_writes_strat;
+		pgstat_end_read_activity(beentry, after_changecount);
+
+		if (pgstat_read_activity_complete(before_changecount, after_changecount))
+			break;
+
+		/* Make sure we can break out of loop if stuck... */
+		CHECK_FOR_INTERRUPTS();
+	}
+}
+
+/*
+ * Fill the provided values array with the accumulated counts of buffer actions
+ * taken by all backends of type backend_type (input parameter), both alive and
+ * dead. This is currently only used by pg_stat_get_buffer_actions() to create
+ * the rows in the pg_stat_buffer_actions system view.
+ */
+void
+pgstat_recount_all_buffer_actions(BackendType backend_type, Datum * values)
+{
+	int			beid;
+	int			tot_backends = pgstat_fetch_stat_numbackends();
+
+	/*
+	 * Add stats from all exited backends
+	 *
+	 * TODO: I thought maybe it is okay to just access this lock-free since it
+	 * is only written to when a process dies in
+	 * pgstat_record_dead_backend_buffer_actions() and is read at the time of
+	 * querying the view with the stats. It's okay if we don't have 100%
+	 * up-to-date stats. However, I was wondering about torn values and
+	 * platforms without 64bit "single copy atomicity"
+	 *
+	 * Because the values array is datums and
+	 * BufferActionStatsArray is int64s, can't do a simple memcpy
+	 *
+	 */
+	values[BA_Alloc] = BufferActionStatsArray[(backend_type * BUFFER_ACTION_NUM_TYPES) + BA_Alloc];
+	values[BA_Extend] = BufferActionStatsArray[(backend_type * BUFFER_ACTION_NUM_TYPES) + BA_Extend];
+	values[BA_Fsync] = BufferActionStatsArray[(backend_type * BUFFER_ACTION_NUM_TYPES) + BA_Fsync];
+	values[BA_Write] = BufferActionStatsArray[(backend_type * BUFFER_ACTION_NUM_TYPES) + BA_Write];
+	values[BA_Write_Strat] = BufferActionStatsArray[(backend_type * BUFFER_ACTION_NUM_TYPES) + BA_Write_Strat];
+
+	/*
+	 * Loop through all live backends and count their buffer actions
+	 */
+	// TODO: is there a more efficient to do this, since we will potentially loop
+	//  through all backends for each backend type
+	for (beid = 1; beid <= tot_backends; beid++)
+	{
+		BackendType bt;
+		PgBackendStatus *beentry = pgstat_fetch_stat_beentry(beid);
+
+		if (beentry->st_procpid == 0)
+			continue;
+		bt = beentry->st_backendType;
+		if (bt != backend_type)
+			continue;
+
+		values[BA_Alloc] += beentry->num_allocs;
+		values[BA_Extend] += beentry->num_extends;
+		values[BA_Fsync] += beentry->num_fsyncs;
+		values[BA_Write] += beentry->num_writes;
+		values[BA_Write_Strat] += beentry->num_writes_strat;
+	}
+}
 
 
 /* ----------
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 14056f5347..e0dade2c52 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1780,21 +1780,50 @@ pg_stat_get_bgwriter_stat_reset_time(PG_FUNCTION_ARGS)
 }
 
 Datum
-pg_stat_get_buf_written_backend(PG_FUNCTION_ARGS)
+pg_stat_get_buffer_actions(PG_FUNCTION_ARGS)
 {
-	PG_RETURN_INT64(pgstat_fetch_global()->buf_written_backend);
-}
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
 
-Datum
-pg_stat_get_buf_fsync_backend(PG_FUNCTION_ARGS)
-{
-	PG_RETURN_INT64(pgstat_fetch_global()->buf_fsync_backend);
-}
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
 
-Datum
-pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
-{
-	PG_RETURN_INT64(pgstat_fetch_global()->buf_alloc);
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+	for (size_t i = 0; i < BACKEND_NUM_TYPES; i++)
+	{
+		/*
+		 * Currently, the only supported backend types for stats are the following.
+		 * If this were to change, pg_proc.dat would need to be changed as well
+		 * to reflect the new expected number of rows.
+		 */
+		Datum values[BUFFER_ACTION_NUM_TYPES];
+		bool nulls[BUFFER_ACTION_NUM_TYPES];
+		if (!(i == B_BG_WRITER || i == B_CHECKPOINTER || i == B_AUTOVAC_WORKER || i == B_BACKEND))
+			continue;
+
+		MemSet(values, 0, sizeof(values));
+		MemSet(nulls, 0, sizeof(nulls));
+
+		values[0] = CStringGetTextDatum(GetBackendTypeDesc(i));
+		pgstat_recount_all_buffer_actions(i, values);
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
 }
 
 /*
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 8b73850d0d..d0923407ff 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -277,6 +277,8 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case BACKEND_NUM_TYPES:
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index acbcae4607..7fc3711d52 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5561,18 +5561,15 @@
   proname => 'pg_stat_get_checkpoint_sync_time', provolatile => 's',
   proparallel => 'r', prorettype => 'float8', proargtypes => '',
   prosrc => 'pg_stat_get_checkpoint_sync_time' },
-{ oid => '2775', descr => 'statistics: number of buffers written by backends',
-  proname => 'pg_stat_get_buf_written_backend', provolatile => 's',
-  proparallel => 'r', prorettype => 'int8', proargtypes => '',
-  prosrc => 'pg_stat_get_buf_written_backend' },
-{ oid => '3063',
-  descr => 'statistics: number of backend buffer writes that did their own fsync',
-  proname => 'pg_stat_get_buf_fsync_backend', provolatile => 's',
-  proparallel => 'r', prorettype => 'int8', proargtypes => '',
-  prosrc => 'pg_stat_get_buf_fsync_backend' },
-{ oid => '2859', descr => 'statistics: number of buffer allocations',
-  proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r',
-  prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' },
+
+  { oid => '8459', descr => 'statistics: counts of buffer actions taken by each backend type',
+  proname => 'pg_stat_get_buffer_actions', provolatile => 's', proisstrict => 'f',
+  prorows => '4', proretset => 't',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{text,int8,int8,int8,int8,int8}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{backend_type,buffers_alloc,buffers_extend,buffers_fsync,buffers_write,buffers_write_strat}',
+  prosrc => 'pg_stat_get_buffer_actions' },
 
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 4dc343cbc5..b31369bd7d 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -336,8 +336,20 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	BACKEND_NUM_TYPES,
 } BackendType;
 
+typedef enum BufferActionType
+{
+	BA_Invalid = 0,
+	BA_Alloc,
+	BA_Extend,
+	BA_Fsync,
+	BA_Write,
+	BA_Write_Strat,
+	BUFFER_ACTION_NUM_TYPES,
+}			BufferActionType;
+
 extern BackendType MyBackendType;
 
 extern const char *GetBackendTypeDesc(BackendType backendType);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 9612c0a6c2..9d0c2a5e1f 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -475,9 +475,6 @@ typedef struct PgStat_MsgBgWriter
 	PgStat_Counter m_buf_written_checkpoints;
 	PgStat_Counter m_buf_written_clean;
 	PgStat_Counter m_maxwritten_clean;
-	PgStat_Counter m_buf_written_backend;
-	PgStat_Counter m_buf_fsync_backend;
-	PgStat_Counter m_buf_alloc;
 	PgStat_Counter m_checkpoint_write_time; /* times in milliseconds */
 	PgStat_Counter m_checkpoint_sync_time;
 } PgStat_MsgBgWriter;
@@ -854,9 +851,6 @@ typedef struct PgStat_GlobalStats
 	PgStat_Counter buf_written_checkpoints;
 	PgStat_Counter buf_written_clean;
 	PgStat_Counter maxwritten_clean;
-	PgStat_Counter buf_written_backend;
-	PgStat_Counter buf_fsync_backend;
-	PgStat_Counter buf_alloc;
 	TimestampTz stat_reset_timestamp;
 } PgStat_GlobalStats;
 
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index 33fcaf5c9a..2bec2cee45 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -314,6 +314,9 @@ extern BufferDesc *StrategyGetBuffer(BufferAccessStrategy strategy,
 extern void StrategyFreeBuffer(BufferDesc *buf);
 extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
 								 BufferDesc *buf);
+extern void StrategyUnChooseBufferFromRing(BufferAccessStrategy strategy);
+extern void StrategyChooseBufferBufferFromRing(BufferAccessStrategy strategy);
+extern bool StrategyIsBufferFromRing(BufferAccessStrategy strategy);
 
 extern int	StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc);
 extern void StrategyNotifyBgWriter(int bgwprocno);
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 8042b817df..4b35b3fe15 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -168,6 +168,11 @@ typedef struct PgBackendStatus
 
 	/* query identifier, optionally computed using post_parse_analyze_hook */
 	uint64		st_query_id;
+	int num_allocs;
+	int num_extends;
+	int num_fsyncs;
+	int num_writes;
+	int num_writes_strat;
 } PgBackendStatus;
 
 
@@ -282,7 +287,7 @@ extern PGDLLIMPORT PgBackendStatus *MyBEEntry;
  */
 extern Size BackendStatusShmemSize(void);
 extern void CreateSharedBackendStatus(void);
-
+extern void CreateBufferActionStatsCounters(void);
 
 /* ----------
  * Functions called from backends
@@ -305,6 +310,8 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
 extern uint64 pgstat_get_my_query_id(void);
+extern void pgstat_increment_buffer_action(BufferActionType ba_type);
+extern void pgstat_recount_all_buffer_actions(BackendType backend_type, Datum *values);
 
 
 /* ----------
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index e5ab11275d..609ccf3b7b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1824,10 +1824,14 @@ pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints
     pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint,
     pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean,
     pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean,
-    pg_stat_get_buf_written_backend() AS buffers_backend,
-    pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
-    pg_stat_get_buf_alloc() AS buffers_alloc,
     pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
+pg_stat_buffer_actions| SELECT b.backend_type,
+    b.buffers_alloc,
+    b.buffers_extend,
+    b.buffers_fsync,
+    b.buffers_write,
+    b.buffers_write_strat
+   FROM pg_stat_get_buffer_actions() b(backend_type, buffers_alloc, buffers_extend, buffers_fsync, buffers_write, buffers_write_strat);
 pg_stat_database| SELECT d.oid AS datid,
     d.datname,
         CASE
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index feaaee6326..fb4b613d4b 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -176,4 +176,5 @@ FROM prevstats AS pr;
 
 DROP TABLE trunc_stats_test, trunc_stats_test1, trunc_stats_test2, trunc_stats_test3, trunc_stats_test4;
 DROP TABLE prevstats;
+SELECT * FROM pg_stat_buffer_actions;
 -- End of Stats Test
-- 
2.27.0

Reply via email to