On Wed, Dec 04, 2024 at 03:20:03PM +0000, Bertrand Drouvot wrote:
> I need to think more about it but it seems to me that those values make sense,
> so maybe we should drop the entry for this particular case (shmem_exit()).

The values reported in the central hash table make sense to me.  What
does not is that we could hold in the local cache of the process doing
the peeks references to stats from a different slot when doing drops
and creates that would reuse the same stats key.

I have added in the backends some logs to get into the details of the
sequence with this specific test regarding the generation markup and
the refcount (feel free to use the 0002 attached, grep for "key
4/0/0" in your logs by replaying the test), and here some details
based on the two permutations in the test sent:
1) session 1: pg_create_logical_replication_slot(), does
pgstat_init_entry() with generation=0 refcount=1.  Then does
pgstat_acquire_entry_ref, refcount++ now refcount=2.
2) session 2: pg_logical_slot_peek_binary_changes(), calls
pgstat_acquire_entry_ref(), refcount++ now refcount=3.  Note that the
session keeps a hold on the replslot stats.
3) session 1: pg_drop_replication_slot(), calls
pgstat_release_entry_ref(), refcount-- now refcount=2.  The object
cannot be dropped in pgstat_drop_entry_internal() as session 2 still
holds a reference at it has peeked at the previous stats, keeping a
local reference.
4) Second permutation begins.  session 1 calls
pg_create_logical_replication_slot() and does pgstat_reinit_entry()
generation++ refcount++, generation=1 refcount=3 as the previous entry
of session 2 is still there.  pgstat_acquire_entry_ref() is called
again, refcount++ now at 4.
5) session 2, does again pg_logical_slot_peek_binary_changes(), does
not interact with the central pgstats as it holds locally a reference
to the slot's entry, based on its previous generation=0, so its
pgstat_get_entry_ref() gets the stats reference from its local cache.
The mistake is here: at this stage session 2 should refresh its local
reference so as it points to the new generation entry.
6) session 1, pg_drop_replication_slot(), calls
pgstat_release_entry_ref() and refcount-- now at 3.  It cannot drop the
entry as refcount>0.
7) session 2 shuts down, pgstat_release_entry_ref is called,
refcount-- now refcount=2.  pgstat_release_entry_ref() is called a
second time, refcount-- now refcount=1, cannot drop the stats entry as
the generation is not the same, as the local reference comes from the
previous generation=0, current generation=1.  As it stands, the entry
should not be dropped as it's been reinitialized once.  The data in
the dshash is correct, but session 2 is doing the wrong thing and
should:
- Hold a local reference of generation 1.
- Drop the stats entry as it is the last one holding it.

> While I agree that your test case does produce a failed assertion, I don't
> think it's linked to b36fbd9f8d (which focused on retrieving consistent 
> stats).

That's unrelated to b36fbd9f8d.  The assertion can be reached after
818119afccd3.  Before this commit, based on your test, the second
session doing the peeks would try to drop the stats entry of a
replication slot that has been re-created when it shuts down, which is
incorrect, because its local reference is from the first creation, and
it tries to drop the entry of the second creation.

The assertion is right to complain here at shutdown when writing the
stats: we shouldn't try to drop this entry.  I've spent a few hours
checking the whole, and we are doing the right things with
gc_request_count, pgStatSharedRefAge and the local cache refresh, as
I've thought first that we were not aggressive with this part of the
cache resets.  The problem, as far as I can see, is that
pgstat_gc_entry_refs() did not get the call that it needs to think
about refreshing its local references when an entry updates its
generation number; it only checks if the entry has been dropped, the
generation check is equally important after a reinit because the local
reference is also outdated.
--
Michael
From 77bcf14b6c8576de416b7d01320d42755e10c665 Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@paquier.xyz>
Date: Thu, 5 Dec 2024 13:23:48 +0900
Subject: [PATCH 1/2] Fix cleanup of local cache pgstats entries for entry
 reinit

818119afccd3 has introduced the generation concept when a pgstats entry
is reinitialized, but did not count on the fact that backends still
holding local references to such entries need to be refreshed if the
cache age is outdated.  The mattered for replication slot stats, where a
concurrent drop/create could cause incorrect stats to be locally
referenced.

Backpatch-through: 15
---
 src/backend/utils/activity/pgstat_shmem.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index 041c262b9f..ccb5e5f6de 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -729,7 +729,8 @@ pgstat_gc_entry_refs(void)
 	Assert(curage != 0);
 
 	/*
-	 * Some entries have been dropped. Invalidate cache pointer to them.
+	 * Some entries have been dropped or reinitialized.  Invalidate cache
+	 * pointer to them.
 	 */
 	pgstat_entry_ref_hash_start_iterate(pgStatEntryRefHash, &i);
 	while ((ent = pgstat_entry_ref_hash_iterate(pgStatEntryRefHash, &i)) != NULL)
@@ -739,7 +740,13 @@ pgstat_gc_entry_refs(void)
 		Assert(!entry_ref->shared_stats ||
 			   entry_ref->shared_stats->magic == 0xdeadbeef);
 
-		if (!entry_ref->shared_entry->dropped)
+		/*
+		 * "generation" checks for the case of entries being reinitialized, and
+		 * "dropped" for the case where these are..  dropped.
+		 */
+		if (!entry_ref->shared_entry->dropped &&
+			pg_atomic_read_u32(&entry_ref->shared_entry->generation) ==
+			entry_ref->generation)
 			continue;
 
 		/* cannot gc shared ref that has pending data */
-- 
2.45.2

From 30fe57ed7fd72f86f233edd9bab0fd8c20099dba Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@paquier.xyz>
Date: Thu, 5 Dec 2024 13:25:35 +0900
Subject: [PATCH 2/2] Add various logs in pgstats to debug entry handling

Not for commit, nice to play with.
---
 src/backend/utils/activity/pgstat_replslot.c |  1 +
 src/backend/utils/activity/pgstat_shmem.c    | 37 ++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index ddf2ab9928..e41806d7f6 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -157,6 +157,7 @@ pgstat_drop_replslot(ReplicationSlot *slot)
 {
 	Assert(LWLockHeldByMeInMode(ReplicationSlotAllocationLock, LW_EXCLUSIVE));
 
+	elog(WARNING, "XXX pgstat_drop_replslot index %d", ReplicationSlotIndex(slot));
 	if (!pgstat_drop_entry(PGSTAT_KIND_REPLSLOT, InvalidOid,
 						   ReplicationSlotIndex(slot)))
 		pgstat_request_entry_refs_gc();
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index ccb5e5f6de..f717b52226 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -296,6 +296,7 @@ pgstat_init_entry(PgStat_Kind kind,
 	/* Create new stats entry. */
 	dsa_pointer chunk;
 	PgStatShared_Common *shheader;
+	PgStat_HashKey key = shhashent->key;
 
 	/*
 	 * Initialize refcount to 1, marking it as valid / not dropped. The entry
@@ -310,6 +311,8 @@ pgstat_init_entry(PgStat_Kind kind,
 	 */
 	pg_atomic_init_u32(&shhashent->generation, 0);
 	shhashent->dropped = false;
+	elog(WARNING, "XXX key %u/%u/%llu pgstat_init_entry generation 0",
+		 key.kind, key.dboid, (unsigned long long) key.objid);
 
 	chunk = dsa_allocate0(pgStatLocal.dsa, pgstat_get_kind_info(kind)->shared_size);
 	shheader = dsa_get_address(pgStatLocal.dsa, chunk);
@@ -327,6 +330,7 @@ static PgStatShared_Common *
 pgstat_reinit_entry(PgStat_Kind kind, PgStatShared_HashEntry *shhashent)
 {
 	PgStatShared_Common *shheader;
+	PgStat_HashKey key = shhashent->key;
 
 	shheader = dsa_get_address(pgStatLocal.dsa, shhashent->body);
 
@@ -339,6 +343,8 @@ pgstat_reinit_entry(PgStat_Kind kind, PgStatShared_HashEntry *shhashent)
 	 */
 	pg_atomic_fetch_add_u32(&shhashent->generation, 1);
 	shhashent->dropped = false;
+	elog(WARNING, "XXX key %u/%u/%llu pgstat_reinit_entry generation++ refcount++",
+		 key.kind, key.dboid, (unsigned long long) key.objid);
 
 	/* reinitialize content */
 	Assert(shheader->magic == 0xdeadbeef);
@@ -369,10 +375,13 @@ pgstat_acquire_entry_ref(PgStat_EntryRef *entry_ref,
 						 PgStatShared_HashEntry *shhashent,
 						 PgStatShared_Common *shheader)
 {
+	PgStat_HashKey key = shhashent->key;
 	Assert(shheader->magic == 0xdeadbeef);
 	Assert(pg_atomic_read_u32(&shhashent->refcount) > 0);
 
 	pg_atomic_fetch_add_u32(&shhashent->refcount, 1);
+	elog(WARNING, "XXX key %u/%u/%llu pgstat_acquire_entry_ref refcount++",
+		 key.kind, key.dboid, (unsigned long long) key.objid);
 
 	dshash_release_lock(pgStatLocal.shared_hash, shhashent);
 
@@ -594,6 +603,9 @@ pgstat_release_entry_ref(PgStat_HashKey key, PgStat_EntryRef *entry_ref,
 		 * increasing the refcount because it is not "legal" to create
 		 * additional references to dropped entries.
 		 */
+		elog(WARNING,
+			 "XXX: key %u/%u/%llu pgstat_release_entry_ref refcount--",
+			 key.kind, key.dboid, (unsigned long long) key.objid);
 		if (pg_atomic_fetch_sub_u32(&entry_ref->shared_entry->refcount, 1) == 1)
 		{
 			PgStatShared_HashEntry *shent;
@@ -620,6 +632,9 @@ pgstat_release_entry_ref(PgStat_HashKey key, PgStat_EntryRef *entry_ref,
 			if (pg_atomic_read_u32(&entry_ref->shared_entry->generation) ==
 				entry_ref->generation)
 			{
+				elog(WARNING,
+					 "XXX: key %u/%u/%llu pgstat_release_entry_ref same generation",
+					 key.kind, key.dboid, (unsigned long long) key.objid);
 				/* Same "generation", so we're OK with the removal */
 				Assert(pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0);
 				Assert(entry_ref->shared_entry == shent);
@@ -627,6 +642,12 @@ pgstat_release_entry_ref(PgStat_HashKey key, PgStat_EntryRef *entry_ref,
 			}
 			else
 			{
+				elog(WARNING, "XXX: key %u/%u/%llu pgstat_release_entry_ref different generation "
+					 "shared_ref=%u entry_ref=%u",
+					 key.kind, key.dboid, (unsigned long long) key.objid,
+					 pg_atomic_read_u32(&entry_ref->shared_entry->generation),
+					 entry_ref->generation);
+
 				/*
 				 * Shared stats entry has been reinitialized, so do not drop
 				 * its shared entry, only release its lock.
@@ -753,6 +774,14 @@ pgstat_gc_entry_refs(void)
 		if (entry_ref->pending != NULL)
 			continue;
 
+		if (entry_ref->shared_entry->key.kind == PGSTAT_KIND_REPLSLOT)
+		{
+			PgStat_HashKey key = entry_ref->shared_entry->key;
+			elog(WARNING, "XXX key %u/%u/%llu pgstat_gc_entry_refs dropped generation=%d",
+				 key.kind, key.dboid, (unsigned long long) key.objid,
+				 entry_ref->generation);
+		}
+
 		pgstat_release_entry_ref(ent->key, entry_ref, false);
 	}
 
@@ -852,6 +881,7 @@ static bool
 pgstat_drop_entry_internal(PgStatShared_HashEntry *shent,
 						   dshash_seq_status *hstat)
 {
+	PgStat_HashKey key = shent->key;
 	Assert(shent->body != InvalidDsaPointer);
 
 	/* should already have released local reference */
@@ -874,11 +904,15 @@ pgstat_drop_entry_internal(PgStatShared_HashEntry *shent,
 	/* release refcount marking entry as not dropped */
 	if (pg_atomic_sub_fetch_u32(&shent->refcount, 1) == 0)
 	{
+		elog(WARNING, "XXX key %u/%u/%llu pgstat_drop_entry_internal refcount=0",
+			 key.kind, key.dboid, (unsigned long long) key.objid);
 		pgstat_free_entry(shent, hstat);
 		return true;
 	}
 	else
 	{
+		elog(WARNING, "XXX key %u/%u/%llu pgstat_drop_entry_internal refcount>0",
+			 key.kind, key.dboid, (unsigned long long) key.objid);
 		if (!hstat)
 			dshash_release_lock(pgStatLocal.shared_hash, shent);
 		return false;
@@ -963,6 +997,9 @@ pgstat_drop_entry(PgStat_Kind kind, Oid dboid, uint64 objid)
 	key.dboid = dboid;
 	key.objid = objid;
 
+	elog(WARNING, "XXX key %u/%u/%llu pgstat_drop_entry",
+		 key.kind, key.dboid, (unsigned long long) key.objid);
+
 	/* delete local reference */
 	if (pgStatEntryRefHash)
 	{
-- 
2.45.2

Attachment: signature.asc
Description: PGP signature

Reply via email to