On Thu, Oct 17, 2024 at 01:24:50PM +0900, Michael Paquier wrote:
>> Yeah, FWIW, we would be going from 32 bytes:
>>                                /* total size (bytes):   32 */
>> 
>> to 40 bytes (with the patch applied):
>>                                /* total size (bytes):   40 */
>> 
>> Due to the padding, that's a 8 bytes increase while we're adding "only" 4 
>> bytes.
> 
> I have spent some time digging into all the original pgstat threads 
> dealing with the shmem implementation or dshash, and I'm not really
> seeing anybody stressing on the size of the individual hash entries.
> This stuff was already wasting space with padding originally, so
> perhaps we're stressing too much here?  Would anybody like to comment?

I've been going through this patch again, and the failures that could
be seen because of such failures, like standbys failing in an
unpredictible way is not cool, so I am planning to apply the attached
down to 15 now that the release has colled down.  At the end I am not
really stressing about this addition in the structure for the sake of
making the stats entries safe to handle.

I did not like much the name "agecount" though, used to cross-check
how many times an entry is reused in shared memory and in the local
copy we keep in a process, so I've renamed it to "generation".
 --
Michael
From abd8b567f3cd97ca4e0282ce7657f64807f8c008 Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@paquier.xyz>
Date: Thu, 14 Nov 2024 16:53:58 +0900
Subject: [PATCH v2] Introduce "generation" counter for pgstats entries

This fixes a set of race conditions with cumulative statistics where a
shared stats entry could be dropped while it should still be valid
when it gets reused.  This can happen for the following case, causing
the same hash key to be used:
- replication slots that compute internally an index number.
- other object types, with a wraparound involved.
- As of PostgreSQL 18, custom pgstats kinds.

The "generation" is tracked in the shared entries, initialized at 0 when
an entry is created and incremented when the same entry is reused, to
avoid concurrent turbulences on drop because of other backends still
holding a reference to it.
---
 src/include/utils/pgstat_internal.h       | 14 ++++++++
 src/backend/utils/activity/pgstat_shmem.c | 40 ++++++++++++++++++++---
 2 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 61b2e1f96b..45f5901d45 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -94,6 +94,14 @@ typedef struct PgStatShared_HashEntry
 	 */
 	pg_atomic_uint32 refcount;
 
+	/*
+	 * Counter tracking the number of times the entry has been reused.
+	 *
+	 * Set to 0 when the entry is created, and incremented by one each time
+	 * the shared entry is reinitialized with pgstat_reinit_entry().
+	 */
+	pg_atomic_uint32 generation;
+
 	/*
 	 * Pointer to shared stats. The stats entry always starts with
 	 * PgStatShared_Common, embedded in a larger struct containing the
@@ -133,6 +141,12 @@ typedef struct PgStat_EntryRef
 	 */
 	PgStatShared_Common *shared_stats;
 
+	/*
+	 * Copy of PgStatShared_HashEntry->generation, keeping locally track of
+	 * the shared stats entry "generation" retrieved (number of times reused).
+	 */
+	uint32		generation;
+
 	/*
 	 * Pending statistics data that will need to be flushed to shared memory
 	 * stats eventually. Each stats kind utilizing pending data defines what
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index c1b7ff76b1..c8ee0bd0b0 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -304,6 +304,11 @@ pgstat_init_entry(PgStat_Kind kind,
 	 * further if a longer lived reference is needed.
 	 */
 	pg_atomic_init_u32(&shhashent->refcount, 1);
+
+	/*
+	 * Initialize generation to 0, as freshly initialized.
+	 */
+	pg_atomic_init_u32(&shhashent->generation, 0);
 	shhashent->dropped = false;
 
 	chunk = dsa_allocate0(pgStatLocal.dsa, pgstat_get_kind_info(kind)->shared_size);
@@ -327,6 +332,12 @@ pgstat_reinit_entry(PgStat_Kind kind, PgStatShared_HashEntry *shhashent)
 
 	/* mark as not dropped anymore */
 	pg_atomic_fetch_add_u32(&shhashent->refcount, 1);
+
+	/*
+	 * Increment generation, to let any backend with local references know
+	 * that what they point to is outdated.
+	 */
+	pg_atomic_fetch_add_u32(&shhashent->generation, 1);
 	shhashent->dropped = false;
 
 	/* reinitialize content */
@@ -367,6 +378,7 @@ pgstat_acquire_entry_ref(PgStat_EntryRef *entry_ref,
 
 	entry_ref->shared_stats = shheader;
 	entry_ref->shared_entry = shhashent;
+	entry_ref->generation = pg_atomic_read_u32(&shhashent->generation);
 }
 
 /*
@@ -532,7 +544,8 @@ pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, uint64 objid, bool create,
 			 * case are replication slot stats, where a new slot can be
 			 * created with the same index just after dropping. But oid
 			 * wraparound can lead to other cases as well. We just reset the
-			 * stats to their plain state.
+			 * stats to their plain state, while incrementing its "generation"
+			 * in the shared entry for any remaining local references.
 			 */
 			shheader = pgstat_reinit_entry(kind, shhashent);
 			pgstat_acquire_entry_ref(entry_ref, shhashent, shheader);
@@ -599,10 +612,27 @@ pgstat_release_entry_ref(PgStat_HashKey key, PgStat_EntryRef *entry_ref,
 			if (!shent)
 				elog(ERROR, "could not find just referenced shared stats entry");
 
-			Assert(pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0);
-			Assert(entry_ref->shared_entry == shent);
-
-			pgstat_free_entry(shent, NULL);
+			/*
+			 * This entry may have been reinitialized while trying to release
+			 * it, so double-check that it has not been reused while holding a
+			 * lock on its shared entry.
+			 */
+			if (pg_atomic_read_u32(&entry_ref->shared_entry->generation) ==
+				entry_ref->generation)
+			{
+				/* Same generation, so we're OK with the removal */
+				Assert(pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0);
+				Assert(entry_ref->shared_entry == shent);
+				pgstat_free_entry(shent, NULL);
+			}
+			else
+			{
+				/*
+				 * Shared stats entry has been reinitialized, so do not drop
+				 * its shared entry, only release its lock.
+				 */
+				dshash_release_lock(pgStatLocal.shared_hash, shent);
+			}
 		}
 	}
 
-- 
2.45.2

Attachment: signature.asc
Description: PGP signature

Reply via email to