On Fri, Jun 4, 2021 at 5:52 PM Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote:
>
> On 2021-Apr-12, Melanie Plageman wrote:
>
> > As for the way I have recorded strategy writes -- it is quite inelegant,
> > but, I wanted to make sure that I only counted a strategy write as one
> > in which the backend wrote out the dirty buffer from its strategy ring
> > but did not check if there was any clean buffer in shared buffers more
> > generally (so, it is *potentially* an avoidable write). I'm not sure if
> > this distinction is useful to anyone. I haven't done enough with
> > BufferAccessStrategies to know what I'd want to know about them when
> > developing or using Postgres. However, if I don't need to be so careful,
> > it will make the code much simpler (though, I'm sure I can improve the
> > code regardless).
>
> I was bitten last year by REFRESH MATERIALIZED VIEW counting its writes
> via buffers_backend, and I was very surprised/confused about it.  So it
> seems definitely worthwhile to count writes via strategy separately.
> For a DBA tuning the server configuration it is very useful.
>
> The main thing is to *not* let these writes end up regular
> buffers_backend (or whatever you call these now).  I didn't read your
> patch, but the way you have described it seems okay to me.
>

Thanks for the feedback!

I agree it makes sense to count strategy writes separately.

I thought about this some more, and I don't know if it makes sense to
only count "avoidable" strategy writes.

This would mean that a backend writing out a dirty buffer from its
strategy ring when no clean shared buffers (and no clean strategy
buffers) are available would not count that write as a strategy write,
even though it is writing out a buffer from its strategy ring. But it
obviously doesn't make sense to count it as a regular buffer write
either. So, I plan to change this code, roughly as sketched below.
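
Concretely, I think the logic in BufferAlloc() would collapse to
something like the following (just a sketch of the direction, not what
is in the attached patch):

    /*
     * Sketch only: count any flush of a buffer taken from the strategy
     * ring as a strategy write, whether or not a clean shared buffer
     * happened to be available at the time; everything else is counted
     * as a regular write.
     */
    if (strategy != NULL && StrategyIsBufferFromRing(strategy))
        pgstat_increment_buffer_action(BA_Write_Strat);
    else
        pgstat_increment_buffer_action(BA_Write);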

On another note, I've updated the patch with more correct concurrency
control mechanisms (it had some data races and other problems before).
Now I am using atomics for the buffer action counters, though the code
still includes several TODO questions about the correctness of what I
have now.

I also wrapped the buffer action types in a struct to make them easier
to work with.

The most substantial missing piece of the patch right now is persisting
the data across reboots.

The two places in the code I can see to persist the buffer action stats
data are:
1) using the stats collector code (like in
pgstat_read/write_statsfiles())
2) using a before_shmem_exit() hook which writes the data structure to a
file that is then read when the shared memory array is first created
(rough sketch below)
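
For (2), the shutdown side might look very roughly like this. The
function name and file path are placeholders, and presumably only one
process (e.g. the checkpointer or postmaster) would actually write the
file:

    /*
     * Sketch of option 2 only; the name and the file path are made up
     * for illustration.  A real version would probably want a file
     * header and version check.
     */
    static void
    buffer_action_stats_shmem_exit(int code, Datum arg)
    {
        FILE       *file = AllocateFile("pg_stat/buffer_actions.stat",
                                        PG_BINARY_W);

        if (file == NULL)
            return;
        fwrite(BufferActionStatsArray, sizeof(PgBackendBufferActionStats),
               BACKEND_NUM_TYPES, file);
        FreeFile(file);
    }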

It feels a bit weird to me to wedge the buffer action stats into the
stats collector code, since the stats collector isn't the one receiving
and aggregating the buffer action stats.

Also, I'm unsure how writing the buffer action stats out in
pgstat_write_statsfiles() will work, since backends can update their
buffer action stats after the data from the BufferActionStatsArray has
already been persisted, causing us to lose those updates.

And, I don't think I can use pgstat_read_statsfiles() since the
BufferActionStatsArray should have the data from the file as soon as the
view containing the buffer action stats can be queried. Thus, it seems
like I would need to read the file while initializing the array in
CreateBufferActionStatsCounters().
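
That part might look something like the following inside
CreateBufferActionStatsCounters() (again only a sketch, and the file
name is the same placeholder as above):

    /*
     * Sketch: when the shared array is first created, seed it from the
     * file written at the last shutdown.  A real version would probably
     * re-initialize the atomics with pg_atomic_init_u64() rather than
     * trusting the raw bytes.
     */
    if (!found)
    {
        FILE       *file = AllocateFile("pg_stat/buffer_actions.stat",
                                        PG_BINARY_R);

        MemSet(BufferActionStatsArray, 0, size);
        if (file != NULL)
        {
            if (fread(BufferActionStatsArray,
                      sizeof(PgBackendBufferActionStats),
                      BACKEND_NUM_TYPES, file) != BACKEND_NUM_TYPES)
                MemSet(BufferActionStatsArray, 0, size);
            FreeFile(file);
        }
    }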

I am registering the patch for the September commitfest but plan to add
the stats persistence (and docs, etc.) before then.

-- Melanie
From 2753bf0dc3ff54a515bc0729b51ef56b6715a703 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Mon, 2 Aug 2021 17:56:07 -0400
Subject: [PATCH v3] Add system view tracking shared buffer actions

Add a system view which tracks
- number of shared buffers the checkpointer and bgwriter write out
- number of shared buffers a regular backend is forced to flush
- number of extends done by a regular backend through shared buffers
- number of buffers flushed by a backend or autovacuum worker using a
  BufferAccessStrategy; writes which, had no strategy been used, could
  perhaps have been avoided if a clean shared buffer was available
- number of fsyncs done by a backend which could have been done by the
  checkpointer if the sync queue had not been full
- number of buffers allocated by a regular backend or autovacuum worker
  for either a new block or an existing block of a relation which is not
  currently in a buffer

All of these stats which were in the system view pg_stat_bgwriter have
been removed from that view.

All backends, on exit, will update a shared memory array with the
buffers they wrote or extended.

When the view is queried, the counters of all live backends are added
to the totals in the shared memory array and the sum is returned as the
full total.

Each row of the view is for a particular backend type, and each column
gives the number of times a particular kind of buffer action was taken
by backends of that type.

TODO:
- Some kind of test?
- Docs change
---
 src/backend/catalog/system_views.sql        |  14 ++-
 src/backend/postmaster/checkpointer.c       |  27 +---
 src/backend/postmaster/pgstat.c             |   3 -
 src/backend/storage/buffer/bufmgr.c         |  73 ++++++++++-
 src/backend/storage/buffer/freelist.c       |  37 ++++--
 src/backend/storage/ipc/ipci.c              |   1 +
 src/backend/utils/activity/backend_status.c | 131 ++++++++++++++++++++
 src/backend/utils/adt/pgstatfuncs.c         |  55 ++++++--
 src/backend/utils/init/miscinit.c           |   2 +
 src/include/catalog/pg_proc.dat             |  21 ++--
 src/include/miscadmin.h                     |  12 ++
 src/include/pgstat.h                        |   6 -
 src/include/storage/buf_internals.h         |   3 +
 src/include/utils/backend_status.h          |  16 ++-
 src/test/regress/expected/rules.out         |  10 +-
 src/test/regress/sql/stats.sql              |   1 +
 16 files changed, 337 insertions(+), 75 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 55f6e3711d..96cac0a74e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1067,9 +1067,6 @@ CREATE VIEW pg_stat_bgwriter AS
         pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint,
         pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean,
         pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean,
-        pg_stat_get_buf_written_backend() AS buffers_backend,
-        pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
-        pg_stat_get_buf_alloc() AS buffers_alloc,
         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
 CREATE VIEW pg_stat_wal AS
@@ -1085,6 +1082,17 @@ CREATE VIEW pg_stat_wal AS
         w.stats_reset
     FROM pg_stat_get_wal() w;
 
+CREATE VIEW pg_stat_buffer_actions AS
+SELECT
+       b.backend_type,
+       b.buffers_alloc,
+       b.buffers_extend,
+       b.buffers_fsync,
+       b.buffers_write,
+       b.buffers_write_strat
+FROM pg_stat_get_buffer_actions() b;
+
+
 CREATE VIEW pg_stat_progress_analyze AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index bc9ac7ccfa..cbe4889fb6 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -90,17 +90,8 @@
  * requesting backends since the last checkpoint start.  The flags are
  * chosen so that OR'ing is the correct way to combine multiple requests.
  *
- * num_backend_writes is used to count the number of buffer writes performed
- * by user backend processes.  This counter should be wide enough that it
- * can't overflow during a single processing cycle.  num_backend_fsync
- * counts the subset of those writes that also had to do their own fsync,
- * because the checkpointer failed to absorb their request.
- *
  * The requests array holds fsync requests sent by backends and not yet
  * absorbed by the checkpointer.
- *
- * Unlike the checkpoint fields, num_backend_writes, num_backend_fsync, and
- * the requests fields are protected by CheckpointerCommLock.
  *----------
  */
 typedef struct
@@ -124,9 +115,6 @@ typedef struct
 	ConditionVariable start_cv; /* signaled when ckpt_started advances */
 	ConditionVariable done_cv;	/* signaled when ckpt_done advances */
 
-	uint32		num_backend_writes; /* counts user backend buffer writes */
-	uint32		num_backend_fsync;	/* counts user backend fsync calls */
-
 	int			num_requests;	/* current # of requests */
 	int			max_requests;	/* allocated array size */
 	CheckpointerRequest requests[FLEXIBLE_ARRAY_MEMBER];
@@ -1089,10 +1077,6 @@ ForwardSyncRequest(const FileTag *ftag, SyncRequestType type)
 
 	LWLockAcquire(CheckpointerCommLock, LW_EXCLUSIVE);
 
-	/* Count all backend writes regardless of if they fit in the queue */
-	if (!AmBackgroundWriterProcess())
-		CheckpointerShmem->num_backend_writes++;
-
 	/*
 	 * If the checkpointer isn't running or the request queue is full, the
 	 * backend will have to perform its own fsync request.  But before forcing
@@ -1106,8 +1090,10 @@ ForwardSyncRequest(const FileTag *ftag, SyncRequestType type)
 		 * Count the subset of writes where backends have to do their own
 		 * fsync
 		 */
+		/* TODO: should we count fsyncs for all types of procs? */
 		if (!AmBackgroundWriterProcess())
-			CheckpointerShmem->num_backend_fsync++;
+			pgstat_increment_buffer_action(BA_Fsync);
+
 		LWLockRelease(CheckpointerCommLock);
 		return false;
 	}
@@ -1264,13 +1250,6 @@ AbsorbSyncRequests(void)
 
 	LWLockAcquire(CheckpointerCommLock, LW_EXCLUSIVE);
 
-	/* Transfer stats counts into pending pgstats message */
-	BgWriterStats.m_buf_written_backend += CheckpointerShmem->num_backend_writes;
-	BgWriterStats.m_buf_fsync_backend += CheckpointerShmem->num_backend_fsync;
-
-	CheckpointerShmem->num_backend_writes = 0;
-	CheckpointerShmem->num_backend_fsync = 0;
-
 	/*
 	 * We try to avoid holding the lock for a long time by copying the request
 	 * array, and processing the requests after releasing the lock.
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 11702f2a80..03d8e13c3a 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -5352,9 +5352,6 @@ pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
 	globalStats.buf_written_checkpoints += msg->m_buf_written_checkpoints;
 	globalStats.buf_written_clean += msg->m_buf_written_clean;
 	globalStats.maxwritten_clean += msg->m_maxwritten_clean;
-	globalStats.buf_written_backend += msg->m_buf_written_backend;
-	globalStats.buf_fsync_backend += msg->m_buf_fsync_backend;
-	globalStats.buf_alloc += msg->m_buf_alloc;
 }
 
 /* ----------
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 33d99f604a..3bfbb48b1f 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -963,6 +963,11 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 
 	if (isExtend)
 	{
+		/*
+		 * Extends counted here are only those that go through shared buffers
+		 */
+		pgstat_increment_buffer_action(BA_Extend);
+
 		/* new buffers are zero-filled */
 		MemSet((char *) bufBlock, 0, BLCKSZ);
 		/* don't set checksum for all-zero page */
@@ -1229,11 +1234,60 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 					if (XLogNeedsFlush(lsn) &&
 						StrategyRejectBuffer(strategy, buf))
 					{
+						/*
+						 * Unset the strat write flag, as we will not be writing
+						 * this particular buffer from our ring out and may end
+						 * up having to find a buffer from main shared buffers,
+						 * which, if it is dirty, we may have to write out, which
+						 * could have been prevented by checkpointing and background
+						 * writing
+						 */
+						StrategyUnChooseBufferFromRing(strategy);
+
 						/* Drop lock/pin and loop around for another buffer */
 						LWLockRelease(BufferDescriptorGetContentLock(buf));
 						UnpinBuffer(buf, true);
 						continue;
 					}
+
+					/*
+					 * TODO: there is certainly a better way to write this
+					 * logic
+					 */
+
+					/*
+					 * The dirty buffer that will be written out was selected
+					 * from the ring and we did not bother checking the
+					 * freelist or doing a clock sweep to look for a clean
+					 * buffer to use, thus, this write will be counted as a
+					 * strategy write -- one that may be unnecessary without a
+					 * strategy
+					 */
+					if (StrategyIsBufferFromRing(strategy))
+					{
+						pgstat_increment_buffer_action(BA_Write_Strat);
+					}
+
+						/*
+						 * If the dirty buffer was one we grabbed from the
+						 * freelist or through a clock sweep, it could have been
+						 * written out by bgwriter or checkpointer, thus, we will
+						 * count it as a regular write
+						 */
+					else
+						pgstat_increment_buffer_action(BA_Write);
+				}
+				else
+				{
+					/*
+					 * If strategy is NULL, we could only be doing a write.
+					 * Extend operations will be counted in smgrextend. That
+					 * is separate I/O than any flushing of dirty buffers. If
+					 * we add more Backend Access Types, perhaps we will need
+					 * additional checks here
+					 */
+					pgstat_increment_buffer_action(BA_Write);
+
 				}
 
 				/* OK, do the I/O */
@@ -2246,9 +2300,6 @@ BgBufferSync(WritebackContext *wb_context)
 	 */
 	strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc);
 
-	/* Report buffer alloc counts to pgstat */
-	BgWriterStats.m_buf_alloc += recent_alloc;
-
 	/*
 	 * If we're not running the LRU scan, just stop after doing the stats
 	 * stuff.  We mark the saved state invalid so that we can recover sanely
@@ -2543,6 +2594,8 @@ SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
 	 * Pin it, share-lock it, write it.  (FlushBuffer will do nothing if the
 	 * buffer is clean by the time we've locked it.)
 	 */
+	pgstat_increment_buffer_action(BA_Write);
+
 	PinBuffer_Locked(bufHdr);
 	LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
 
@@ -2895,6 +2948,20 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln)
 	/*
 	 * bufToWrite is either the shared buffer or a copy, as appropriate.
 	 */
+
+	/*
+	 * TODO: consider that if we did not need to distinguish between a buffer
+	 * flushed that was grabbed from the ring buffer and written out as part
+	 * of a strategy which was not from main Shared Buffers (and thus
+	 * preventable by bgwriter or checkpointer), then we could move all calls
+	 * to pgstat_increment_buffer_action() here except for the one for
+	 * extends, which would remain in ReadBuffer_common() before smgrextend()
+	 * (unless we decide to start counting other extends). That includes the
+	 * call to count buffers written by bgwriter and checkpointer which go
+	 * through FlushBuffer() but not BufferAlloc(). That would make it
+	 * simpler. Perhaps instead we can find somewhere else to indicate that
+	 * the buffer is from the ring of buffers to reuse.
+	 */
 	smgrwrite(reln,
 			  buf->tag.forkNum,
 			  buf->tag.blockNum,
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 6be80476db..523b024992 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -19,6 +19,7 @@
 #include "storage/buf_internals.h"
 #include "storage/bufmgr.h"
 #include "storage/proc.h"
+#include "utils/backend_status.h"
 
 #define INT_ACCESS_ONCE(var)	((int)(*((volatile int *)&(var))))
 
@@ -51,7 +52,6 @@ typedef struct
 	 * overflow during a single bgwriter cycle.
 	 */
 	uint32		completePasses; /* Complete cycles of the clock sweep */
-	pg_atomic_uint32 numBufferAllocs;	/* Buffers allocated since last reset */
 
 	/*
 	 * Bgworker process to be notified upon activity or -1 if none. See
@@ -86,6 +86,13 @@ typedef struct BufferAccessStrategyData
 	 * ring already.
 	 */
 	bool		current_was_in_ring;
+	/*
+	 * If we could chose a buffer from this list and we end up having to write
+	 * it out because it is dirty when we actually could have found a clean
+	 * buffer in either the freelist or through doing a clock sweep of shared
+	 * buffers, this flag will indicate that
+	 */
+	bool		chose_buffer_in_ring;
 
 	/*
 	 * Array of buffer numbers.  InvalidBuffer (that is, zero) indicates we
@@ -213,7 +220,10 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
 	{
 		buf = GetBufferFromRing(strategy, buf_state);
 		if (buf != NULL)
+		{
+			StrategyChooseBufferBufferFromRing(strategy);
 			return buf;
+		}
 	}
 
 	/*
@@ -247,7 +257,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
 	 * the rate of buffer consumption.  Note that buffers recycled by a
 	 * strategy object are intentionally not counted here.
 	 */
-	pg_atomic_fetch_add_u32(&StrategyControl->numBufferAllocs, 1);
+	pgstat_increment_buffer_action(BA_Alloc);
 
 	/*
 	 * First check, without acquiring the lock, whether there's buffers in the
@@ -411,11 +421,6 @@ StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc)
 		 */
 		*complete_passes += nextVictimBuffer / NBuffers;
 	}
-
-	if (num_buf_alloc)
-	{
-		*num_buf_alloc = pg_atomic_exchange_u32(&StrategyControl->numBufferAllocs, 0);
-	}
 	SpinLockRelease(&StrategyControl->buffer_strategy_lock);
 	return result;
 }
@@ -517,7 +522,6 @@ StrategyInitialize(bool init)
 
 		/* Clear statistics */
 		StrategyControl->completePasses = 0;
-		pg_atomic_init_u32(&StrategyControl->numBufferAllocs, 0);
 
 		/* No pending notification */
 		StrategyControl->bgwprocno = -1;
@@ -702,3 +706,20 @@ StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf)
 
 	return true;
 }
+void
+StrategyUnChooseBufferFromRing(BufferAccessStrategy strategy)
+{
+	strategy->chose_buffer_in_ring = false;
+}
+
+void
+StrategyChooseBufferBufferFromRing(BufferAccessStrategy strategy)
+{
+	strategy->chose_buffer_in_ring = true;
+}
+
+bool
+StrategyIsBufferFromRing(BufferAccessStrategy strategy)
+{
+	return strategy->chose_buffer_in_ring;
+}
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 3e4ec53a97..c662853423 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -240,6 +240,7 @@ CreateSharedMemoryAndSemaphores(void)
 		InitProcGlobal();
 	CreateSharedProcArray();
 	CreateSharedBackendStatus();
+	CreateBufferActionStatsCounters();
 	TwoPhaseShmemInit();
 	BackgroundWorkerShmemInit();
 
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 2901f9f5a9..ec1a7d4c3a 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -55,6 +55,7 @@ static char *BackendAppnameBuffer = NULL;
 static char *BackendClientHostnameBuffer = NULL;
 static char *BackendActivityBuffer = NULL;
 static Size BackendActivityBufferSize = 0;
+static PgBackendBufferActionStats *BufferActionStatsArray    = NULL;
 #ifdef USE_SSL
 static PgBackendSSLStatus *BackendSslStatusBuffer = NULL;
 #endif
@@ -75,6 +76,7 @@ static MemoryContext backendStatusSnapContext;
 static void pgstat_beshutdown_hook(int code, Datum arg);
 static void pgstat_read_current_status(void);
 static void pgstat_setup_backend_status_context(void);
+static void pgstat_record_dead_backend_buffer_actions(void);
 
 
 /*
@@ -236,6 +238,35 @@ CreateSharedBackendStatus(void)
 #endif
 }
 
+void
+CreateBufferActionStatsCounters(void)
+{
+	bool		found;
+	Size		size;
+	int i;
+	PgBackendBufferActionStats *ba_stats;
+
+	size = BACKEND_NUM_TYPES * sizeof(PgBackendBufferActionStats);
+	BufferActionStatsArray = (PgBackendBufferActionStats *)
+		ShmemInitStruct("Buffer actions taken by each backend type", size, &found);
+	if (!found)
+		MemSet(BufferActionStatsArray, 0, size);
+
+	// TODO: do I want a lock on this while initializing the members?
+	ba_stats = BufferActionStatsArray;
+	for (i = 1; i < BACKEND_NUM_TYPES; i++)
+	{
+		pg_atomic_init_u64(&ba_stats->allocs, 0);
+		pg_atomic_init_u64(&ba_stats->extends, 0);
+		pg_atomic_init_u64(&ba_stats->fsyncs, 0);
+		pg_atomic_init_u64(&ba_stats->writes, 0);
+		pg_atomic_init_u64(&ba_stats->writes_strat, 0);
+
+		ba_stats++;
+	}
+}
+
+
 /*
  * Initialize pgstats backend activity state, and set up our on-proc-exit
  * hook.  Called from InitPostgres and AuxiliaryProcessMain. For auxiliary
@@ -399,6 +430,11 @@ pgstat_bestart(void)
 	lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID;
 	lbeentry.st_progress_command_target = InvalidOid;
 	lbeentry.st_query_id = UINT64CONST(0);
+	pg_atomic_init_u64(&lbeentry.buffer_action_stats.allocs, 0);
+	pg_atomic_init_u64(&lbeentry.buffer_action_stats.extends, 0);
+	pg_atomic_init_u64(&lbeentry.buffer_action_stats.fsyncs, 0);
+	pg_atomic_init_u64(&lbeentry.buffer_action_stats.writes, 0);
+	pg_atomic_init_u64(&lbeentry.buffer_action_stats.writes_strat, 0);
 
 	/*
 	 * we don't zero st_progress_param here to save cycles; nobody should
@@ -469,6 +505,16 @@ pgstat_beshutdown_hook(int code, Datum arg)
 	beentry->st_procpid = 0;	/* mark invalid */
 
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
+
+	/*
+	 * Because the stats tracking shared buffers written and extended do not
+	 * go through the stats collector, it didn't make sense to add them to
+	 * pgstat_report_stat() At least the DatabaseId should be valid. Otherwise
+	 * we can't be sure that the members were zero-initialized (TODO: is that
+	 * true?)
+	 */
+	if (OidIsValid(MyDatabaseId))
+		pgstat_record_dead_backend_buffer_actions();
 }
 
 /*
@@ -1041,6 +1087,91 @@ pgstat_get_my_query_id(void)
 	 */
 	return MyBEEntry->st_query_id;
 }
+void
+pgstat_increment_buffer_action(BufferActionType ba_type)
+{
+	volatile PgBackendStatus *beentry   = MyBEEntry;
+
+	if (!beentry || !pgstat_track_activities)
+		return;
+
+	if (ba_type == BA_Alloc)
+		pg_atomic_add_fetch_u64(&beentry->buffer_action_stats.allocs, 1);
+	else if (ba_type == BA_Extend)
+		pg_atomic_add_fetch_u64(&beentry->buffer_action_stats.extends, 1);
+	else if (ba_type == BA_Fsync)
+		pg_atomic_add_fetch_u64(&beentry->buffer_action_stats.fsyncs, 1);
+	else if (ba_type == BA_Write)
+		pg_atomic_add_fetch_u64(&beentry->buffer_action_stats.writes, 1);
+	else if (ba_type == BA_Write_Strat)
+		pg_atomic_add_fetch_u64(&beentry->buffer_action_stats.writes_strat, 1);
+}
+
+/*
+ * Called for a single backend at the time of death to persist its I/O stats
+ */
+void
+pgstat_record_dead_backend_buffer_actions(void)
+{
+	volatile PgBackendBufferActionStats *ba_stats;
+	volatile	PgBackendStatus *beentry = MyBEEntry;
+
+	if (beentry->st_procpid != 0)
+		return;
+
+	// TODO: is this correct? could there be a data race? do I need a lock?
+	ba_stats = &BufferActionStatsArray[beentry->st_backendType];
+	pg_atomic_add_fetch_u64(&ba_stats->allocs, pg_atomic_read_u64(&beentry->buffer_action_stats.allocs));
+	pg_atomic_add_fetch_u64(&ba_stats->extends, pg_atomic_read_u64(&beentry->buffer_action_stats.extends));
+	pg_atomic_add_fetch_u64(&ba_stats->fsyncs, pg_atomic_read_u64(&beentry->buffer_action_stats.fsyncs));
+	pg_atomic_add_fetch_u64(&ba_stats->writes, pg_atomic_read_u64(&beentry->buffer_action_stats.writes));
+	pg_atomic_add_fetch_u64(&ba_stats->writes_strat, pg_atomic_read_u64(&beentry->buffer_action_stats.writes_strat));
+}
+
+/*
+ * Fill the provided values array with the accumulated counts of buffer actions
+ * taken by all backends of type backend_type (input parameter), both alive and
+ * dead. This is currently only used by pg_stat_get_buffer_actions() to create
+ * the rows in the pg_stat_buffer_actions system view.
+ */
+void
+pgstat_recount_all_buffer_actions(BackendType backend_type, Datum *values)
+{
+	int			i;
+	volatile PgBackendStatus *beentry;
+
+	/*
+	 * Add stats from all exited backends
+	 */
+	values[BA_Alloc] = pg_atomic_read_u64(&BufferActionStatsArray[backend_type].allocs);
+	values[BA_Extend] = pg_atomic_read_u64(&BufferActionStatsArray[backend_type].extends);
+	values[BA_Fsync] = pg_atomic_read_u64(&BufferActionStatsArray[backend_type].fsyncs);
+	values[BA_Write] = pg_atomic_read_u64(&BufferActionStatsArray[backend_type].writes);
+	values[BA_Write_Strat] = pg_atomic_read_u64(&BufferActionStatsArray[backend_type].writes_strat);
+
+	/*
+	 * Loop through all live backends and count their buffer actions
+	 */
+	// TODO: see note in pg_stat_get_buffer_actions() about inefficiency of this method
+
+	beentry = BackendStatusArray;
+	for (i = 1; i <= MaxBackends; i++)
+	{
+		/* Don't count dead backends. They should already be counted */
+		if (beentry->st_procpid == 0)
+			continue;
+		if (beentry->st_backendType != backend_type)
+			continue;
+
+		values[BA_Alloc] += pg_atomic_read_u64(&beentry->buffer_action_stats.allocs);
+		values[BA_Extend] += pg_atomic_read_u64(&beentry->buffer_action_stats.extends);
+		values[BA_Fsync] += pg_atomic_read_u64(&beentry->buffer_action_stats.fsyncs);
+		values[BA_Write] += pg_atomic_read_u64(&beentry->buffer_action_stats.writes);
+		values[BA_Write_Strat] += pg_atomic_read_u64(&beentry->buffer_action_stats.writes_strat);
+
+		beentry++;
+	}
+}
 
 
 /* ----------
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index f0e09eae4d..ce4d97e5a4 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1780,21 +1780,52 @@ pg_stat_get_bgwriter_stat_reset_time(PG_FUNCTION_ARGS)
 }
 
 Datum
-pg_stat_get_buf_written_backend(PG_FUNCTION_ARGS)
+pg_stat_get_buffer_actions(PG_FUNCTION_ARGS)
 {
-	PG_RETURN_INT64(pgstat_fetch_global()->buf_written_backend);
-}
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
 
-Datum
-pg_stat_get_buf_fsync_backend(PG_FUNCTION_ARGS)
-{
-	PG_RETURN_INT64(pgstat_fetch_global()->buf_fsync_backend);
-}
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
 
-Datum
-pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
-{
-	PG_RETURN_INT64(pgstat_fetch_global()->buf_alloc);
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+	// TODO: doing the loop like this means we will loop through all backends up to BACKEND_NUM_TYPES times
+	// could preallocate the values arrays and then loop through the backends once, filling in the appropriate values array
+	for (size_t i = 0; i < BACKEND_NUM_TYPES; i++)
+	{
+		/*
+		 * Currently, the only supported backend types for stats are the following.
+		 * If this were to change, pg_proc.dat would need to be changed as well
+		 * to reflect the new expected number of rows.
+		 */
+		Datum values[BUFFER_ACTION_NUM_TYPES];
+		bool nulls[BUFFER_ACTION_NUM_TYPES];
+		if (!(i == B_BG_WRITER || i == B_CHECKPOINTER || i == B_AUTOVAC_WORKER || i == B_BACKEND))
+			continue;
+
+		MemSet(values, 0, sizeof(values));
+		MemSet(nulls, 0, sizeof(nulls));
+
+		values[0] = CStringGetTextDatum(GetBackendTypeDesc(i));
+		pgstat_recount_all_buffer_actions(i, values);
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
 }
 
 /*
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 8b73850d0d..d0923407ff 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -277,6 +277,8 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case BACKEND_NUM_TYPES:
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 8cd0252082..3d3a0eea3f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5565,18 +5565,15 @@
   proname => 'pg_stat_get_checkpoint_sync_time', provolatile => 's',
   proparallel => 'r', prorettype => 'float8', proargtypes => '',
   prosrc => 'pg_stat_get_checkpoint_sync_time' },
-{ oid => '2775', descr => 'statistics: number of buffers written by backends',
-  proname => 'pg_stat_get_buf_written_backend', provolatile => 's',
-  proparallel => 'r', prorettype => 'int8', proargtypes => '',
-  prosrc => 'pg_stat_get_buf_written_backend' },
-{ oid => '3063',
-  descr => 'statistics: number of backend buffer writes that did their own fsync',
-  proname => 'pg_stat_get_buf_fsync_backend', provolatile => 's',
-  proparallel => 'r', prorettype => 'int8', proargtypes => '',
-  prosrc => 'pg_stat_get_buf_fsync_backend' },
-{ oid => '2859', descr => 'statistics: number of buffer allocations',
-  proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r',
-  prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' },
+
+  { oid => '8459', descr => 'statistics: counts of buffer actions taken by each backend type',
+  proname => 'pg_stat_get_buffer_actions', provolatile => 's', proisstrict => 'f',
+  prorows => '4', proretset => 't',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{text,int8,int8,int8,int8,int8}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{backend_type,buffers_alloc,buffers_extend,buffers_fsync,buffers_write,buffers_write_strat}',
+  prosrc => 'pg_stat_get_buffer_actions' },
 
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 68d840d699..24d2943d9c 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -336,8 +336,20 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	BACKEND_NUM_TYPES,
 } BackendType;
 
+typedef enum BufferActionType
+{
+	BA_Invalid = 0,
+	BA_Alloc,
+	BA_Extend,
+	BA_Fsync,
+	BA_Write,
+	BA_Write_Strat,
+	BUFFER_ACTION_NUM_TYPES,
+}			BufferActionType;
+
 extern BackendType MyBackendType;
 
 extern const char *GetBackendTypeDesc(BackendType backendType);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 9612c0a6c2..9d0c2a5e1f 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -475,9 +475,6 @@ typedef struct PgStat_MsgBgWriter
 	PgStat_Counter m_buf_written_checkpoints;
 	PgStat_Counter m_buf_written_clean;
 	PgStat_Counter m_maxwritten_clean;
-	PgStat_Counter m_buf_written_backend;
-	PgStat_Counter m_buf_fsync_backend;
-	PgStat_Counter m_buf_alloc;
 	PgStat_Counter m_checkpoint_write_time; /* times in milliseconds */
 	PgStat_Counter m_checkpoint_sync_time;
 } PgStat_MsgBgWriter;
@@ -854,9 +851,6 @@ typedef struct PgStat_GlobalStats
 	PgStat_Counter buf_written_checkpoints;
 	PgStat_Counter buf_written_clean;
 	PgStat_Counter maxwritten_clean;
-	PgStat_Counter buf_written_backend;
-	PgStat_Counter buf_fsync_backend;
-	PgStat_Counter buf_alloc;
 	TimestampTz stat_reset_timestamp;
 } PgStat_GlobalStats;
 
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index 33fcaf5c9a..2bec2cee45 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -314,6 +314,9 @@ extern BufferDesc *StrategyGetBuffer(BufferAccessStrategy strategy,
 extern void StrategyFreeBuffer(BufferDesc *buf);
 extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
 								 BufferDesc *buf);
+extern void StrategyUnChooseBufferFromRing(BufferAccessStrategy strategy);
+extern void StrategyChooseBufferBufferFromRing(BufferAccessStrategy strategy);
+extern bool StrategyIsBufferFromRing(BufferAccessStrategy strategy);
 
 extern int	StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc);
 extern void StrategyNotifyBgWriter(int bgwprocno);
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 8042b817df..f5360b5aff 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -13,6 +13,7 @@
 #include "datatype/timestamp.h"
 #include "libpq/pqcomm.h"
 #include "miscadmin.h"			/* for BackendType */
+#include "port/atomics.h"
 #include "utils/backend_progress.h"
 
 
@@ -79,6 +80,15 @@ typedef struct PgBackendGSSStatus
 
 } PgBackendGSSStatus;
 
+typedef struct PgBackendBufferActionStats
+{
+	pg_atomic_uint64 allocs;
+	pg_atomic_uint64 extends;
+	pg_atomic_uint64 fsyncs;
+	pg_atomic_uint64 writes;
+	pg_atomic_uint64 writes_strat;
+} PgBackendBufferActionStats;
+
 
 /* ----------
  * PgBackendStatus
@@ -168,6 +178,8 @@ typedef struct PgBackendStatus
 
 	/* query identifier, optionally computed using post_parse_analyze_hook */
 	uint64		st_query_id;
+	// TODO: do its members need to be atomics when in the PgBackendStatus since only this backend will write to them?
+	PgBackendBufferActionStats buffer_action_stats;
 } PgBackendStatus;
 
 
@@ -282,7 +294,7 @@ extern PGDLLIMPORT PgBackendStatus *MyBEEntry;
  */
 extern Size BackendStatusShmemSize(void);
 extern void CreateSharedBackendStatus(void);
-
+extern void CreateBufferActionStatsCounters(void);
 
 /* ----------
  * Functions called from backends
@@ -305,6 +317,8 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
 extern uint64 pgstat_get_my_query_id(void);
+extern void pgstat_increment_buffer_action(BufferActionType ba_type);
+extern void pgstat_recount_all_buffer_actions(BackendType backend_type, Datum *values);
 
 
 /* ----------
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index e5ab11275d..609ccf3b7b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1824,10 +1824,14 @@ pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints
     pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint,
     pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean,
     pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean,
-    pg_stat_get_buf_written_backend() AS buffers_backend,
-    pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
-    pg_stat_get_buf_alloc() AS buffers_alloc,
     pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
+pg_stat_buffer_actions| SELECT b.backend_type,
+    b.buffers_alloc,
+    b.buffers_extend,
+    b.buffers_fsync,
+    b.buffers_write,
+    b.buffers_write_strat
+   FROM pg_stat_get_buffer_actions() b(backend_type, buffers_alloc, buffers_extend, buffers_fsync, buffers_write, buffers_write_strat);
 pg_stat_database| SELECT d.oid AS datid,
     d.datname,
         CASE
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index feaaee6326..fb4b613d4b 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -176,4 +176,5 @@ FROM prevstats AS pr;
 
 DROP TABLE trunc_stats_test, trunc_stats_test1, trunc_stats_test2, trunc_stats_test3, trunc_stats_test4;
 DROP TABLE prevstats;
+SELECT * FROM pg_stat_buffer_actions;
 -- End of Stats Test
-- 
2.27.0
