On Wed, Dec 04, 2024 at 03:20:03PM +0000, Bertrand Drouvot wrote: > I need to think more about it but it seems to me that those values make sense, > so maybe we should drop the entry for this particular case (shmem_exit()).
The values reported in the central hash table make sense to me. What does not is that we could hold in the local cache of the process doing the peeks references to stats from a different slot when doing drops and creates that would reuse the same stats key. I have added in the backends some logs to get into the details of the sequence with this specific test regarding the generation markup and the refcount (feel free to use the 0002 attached, grep for "key 4/0/0" in your logs by replaying the test), and here some details based on the two permutations in the test sent: 1) session 1: pg_create_logical_replication_slot(), does pgstat_init_entry() with generation=0 refcount=1. Then does pgstat_acquire_entry_ref, refcount++ now refcount=2. 2) session 2: pg_logical_slot_peek_binary_changes(), calls pgstat_acquire_entry_ref(), refcount++ now refcount=3. Note that the session keeps a hold on the replslot stats. 3) session 1: pg_drop_replication_slot(), calls pgstat_release_entry_ref(), refcount-- now refcount=2. The object cannot be dropped in pgstat_drop_entry_internal() as session 2 still holds a reference at it has peeked at the previous stats, keeping a local reference. 4) Second permutation begins. session 1 calls pg_create_logical_replication_slot() and does pgstat_reinit_entry() generation++ refcount++, generation=1 refcount=3 as the previous entry of session 2 is still there. pgstat_acquire_entry_ref() is called again, refcount++ now at 4. 5) session 2, does again pg_logical_slot_peek_binary_changes(), does not interact with the central pgstats as it holds locally a reference to the slot's entry, based on its previous generation=0, so its pgstat_get_entry_ref() gets the stats reference from its local cache. The mistake is here: at this stage session 2 should refresh its local reference so as it points to the new generation entry. 6) session 1, pg_drop_replication_slot(), calls pgstat_release_entry_ref() and refcount-- now at 3. It cannot drop the entry as refcount>0. 7) session 2 shuts down, pgstat_release_entry_ref is called, refcount-- now refcount=2. pgstat_release_entry_ref() is called a second time, refcount-- now refcount=1, cannot drop the stats entry as the generation is not the same, as the local reference comes from the previous generation=0, current generation=1. As it stands, the entry should not be dropped as it's been reinitialized once. The data in the dshash is correct, but session 2 is doing the wrong thing and should: - Hold a local reference of generation 1. - Drop the stats entry as it is the last one holding it. > While I agree that your test case does produce a failed assertion, I don't > think it's linked to b36fbd9f8d (which focused on retrieving consistent > stats). That's unrelated to b36fbd9f8d. The assertion can be reached after 818119afccd3. Before this commit, based on your test, the second session doing the peeks would try to drop the stats entry of a replication slot that has been re-created when it shuts down, which is incorrect, because its local reference is from the first creation, and it tries to drop the entry of the second creation. The assertion is right to complain here at shutdown when writing the stats: we shouldn't try to drop this entry. I've spent a few hours checking the whole, and we are doing the right things with gc_request_count, pgStatSharedRefAge and the local cache refresh, as I've thought first that we were not aggressive with this part of the cache resets. The problem, as far as I can see, is that pgstat_gc_entry_refs() did not get the call that it needs to think about refreshing its local references when an entry updates its generation number; it only checks if the entry has been dropped, the generation check is equally important after a reinit because the local reference is also outdated. -- Michael
From 77bcf14b6c8576de416b7d01320d42755e10c665 Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Thu, 5 Dec 2024 13:23:48 +0900 Subject: [PATCH 1/2] Fix cleanup of local cache pgstats entries for entry reinit 818119afccd3 has introduced the generation concept when a pgstats entry is reinitialized, but did not count on the fact that backends still holding local references to such entries need to be refreshed if the cache age is outdated. The mattered for replication slot stats, where a concurrent drop/create could cause incorrect stats to be locally referenced. Backpatch-through: 15 --- src/backend/utils/activity/pgstat_shmem.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c index 041c262b9f..ccb5e5f6de 100644 --- a/src/backend/utils/activity/pgstat_shmem.c +++ b/src/backend/utils/activity/pgstat_shmem.c @@ -729,7 +729,8 @@ pgstat_gc_entry_refs(void) Assert(curage != 0); /* - * Some entries have been dropped. Invalidate cache pointer to them. + * Some entries have been dropped or reinitialized. Invalidate cache + * pointer to them. */ pgstat_entry_ref_hash_start_iterate(pgStatEntryRefHash, &i); while ((ent = pgstat_entry_ref_hash_iterate(pgStatEntryRefHash, &i)) != NULL) @@ -739,7 +740,13 @@ pgstat_gc_entry_refs(void) Assert(!entry_ref->shared_stats || entry_ref->shared_stats->magic == 0xdeadbeef); - if (!entry_ref->shared_entry->dropped) + /* + * "generation" checks for the case of entries being reinitialized, and + * "dropped" for the case where these are.. dropped. + */ + if (!entry_ref->shared_entry->dropped && + pg_atomic_read_u32(&entry_ref->shared_entry->generation) == + entry_ref->generation) continue; /* cannot gc shared ref that has pending data */ -- 2.45.2
From 30fe57ed7fd72f86f233edd9bab0fd8c20099dba Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Thu, 5 Dec 2024 13:25:35 +0900 Subject: [PATCH 2/2] Add various logs in pgstats to debug entry handling Not for commit, nice to play with. --- src/backend/utils/activity/pgstat_replslot.c | 1 + src/backend/utils/activity/pgstat_shmem.c | 37 ++++++++++++++++++++ 2 files changed, 38 insertions(+) diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c index ddf2ab9928..e41806d7f6 100644 --- a/src/backend/utils/activity/pgstat_replslot.c +++ b/src/backend/utils/activity/pgstat_replslot.c @@ -157,6 +157,7 @@ pgstat_drop_replslot(ReplicationSlot *slot) { Assert(LWLockHeldByMeInMode(ReplicationSlotAllocationLock, LW_EXCLUSIVE)); + elog(WARNING, "XXX pgstat_drop_replslot index %d", ReplicationSlotIndex(slot)); if (!pgstat_drop_entry(PGSTAT_KIND_REPLSLOT, InvalidOid, ReplicationSlotIndex(slot))) pgstat_request_entry_refs_gc(); diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c index ccb5e5f6de..f717b52226 100644 --- a/src/backend/utils/activity/pgstat_shmem.c +++ b/src/backend/utils/activity/pgstat_shmem.c @@ -296,6 +296,7 @@ pgstat_init_entry(PgStat_Kind kind, /* Create new stats entry. */ dsa_pointer chunk; PgStatShared_Common *shheader; + PgStat_HashKey key = shhashent->key; /* * Initialize refcount to 1, marking it as valid / not dropped. The entry @@ -310,6 +311,8 @@ pgstat_init_entry(PgStat_Kind kind, */ pg_atomic_init_u32(&shhashent->generation, 0); shhashent->dropped = false; + elog(WARNING, "XXX key %u/%u/%llu pgstat_init_entry generation 0", + key.kind, key.dboid, (unsigned long long) key.objid); chunk = dsa_allocate0(pgStatLocal.dsa, pgstat_get_kind_info(kind)->shared_size); shheader = dsa_get_address(pgStatLocal.dsa, chunk); @@ -327,6 +330,7 @@ static PgStatShared_Common * pgstat_reinit_entry(PgStat_Kind kind, PgStatShared_HashEntry *shhashent) { PgStatShared_Common *shheader; + PgStat_HashKey key = shhashent->key; shheader = dsa_get_address(pgStatLocal.dsa, shhashent->body); @@ -339,6 +343,8 @@ pgstat_reinit_entry(PgStat_Kind kind, PgStatShared_HashEntry *shhashent) */ pg_atomic_fetch_add_u32(&shhashent->generation, 1); shhashent->dropped = false; + elog(WARNING, "XXX key %u/%u/%llu pgstat_reinit_entry generation++ refcount++", + key.kind, key.dboid, (unsigned long long) key.objid); /* reinitialize content */ Assert(shheader->magic == 0xdeadbeef); @@ -369,10 +375,13 @@ pgstat_acquire_entry_ref(PgStat_EntryRef *entry_ref, PgStatShared_HashEntry *shhashent, PgStatShared_Common *shheader) { + PgStat_HashKey key = shhashent->key; Assert(shheader->magic == 0xdeadbeef); Assert(pg_atomic_read_u32(&shhashent->refcount) > 0); pg_atomic_fetch_add_u32(&shhashent->refcount, 1); + elog(WARNING, "XXX key %u/%u/%llu pgstat_acquire_entry_ref refcount++", + key.kind, key.dboid, (unsigned long long) key.objid); dshash_release_lock(pgStatLocal.shared_hash, shhashent); @@ -594,6 +603,9 @@ pgstat_release_entry_ref(PgStat_HashKey key, PgStat_EntryRef *entry_ref, * increasing the refcount because it is not "legal" to create * additional references to dropped entries. */ + elog(WARNING, + "XXX: key %u/%u/%llu pgstat_release_entry_ref refcount--", + key.kind, key.dboid, (unsigned long long) key.objid); if (pg_atomic_fetch_sub_u32(&entry_ref->shared_entry->refcount, 1) == 1) { PgStatShared_HashEntry *shent; @@ -620,6 +632,9 @@ pgstat_release_entry_ref(PgStat_HashKey key, PgStat_EntryRef *entry_ref, if (pg_atomic_read_u32(&entry_ref->shared_entry->generation) == entry_ref->generation) { + elog(WARNING, + "XXX: key %u/%u/%llu pgstat_release_entry_ref same generation", + key.kind, key.dboid, (unsigned long long) key.objid); /* Same "generation", so we're OK with the removal */ Assert(pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0); Assert(entry_ref->shared_entry == shent); @@ -627,6 +642,12 @@ pgstat_release_entry_ref(PgStat_HashKey key, PgStat_EntryRef *entry_ref, } else { + elog(WARNING, "XXX: key %u/%u/%llu pgstat_release_entry_ref different generation " + "shared_ref=%u entry_ref=%u", + key.kind, key.dboid, (unsigned long long) key.objid, + pg_atomic_read_u32(&entry_ref->shared_entry->generation), + entry_ref->generation); + /* * Shared stats entry has been reinitialized, so do not drop * its shared entry, only release its lock. @@ -753,6 +774,14 @@ pgstat_gc_entry_refs(void) if (entry_ref->pending != NULL) continue; + if (entry_ref->shared_entry->key.kind == PGSTAT_KIND_REPLSLOT) + { + PgStat_HashKey key = entry_ref->shared_entry->key; + elog(WARNING, "XXX key %u/%u/%llu pgstat_gc_entry_refs dropped generation=%d", + key.kind, key.dboid, (unsigned long long) key.objid, + entry_ref->generation); + } + pgstat_release_entry_ref(ent->key, entry_ref, false); } @@ -852,6 +881,7 @@ static bool pgstat_drop_entry_internal(PgStatShared_HashEntry *shent, dshash_seq_status *hstat) { + PgStat_HashKey key = shent->key; Assert(shent->body != InvalidDsaPointer); /* should already have released local reference */ @@ -874,11 +904,15 @@ pgstat_drop_entry_internal(PgStatShared_HashEntry *shent, /* release refcount marking entry as not dropped */ if (pg_atomic_sub_fetch_u32(&shent->refcount, 1) == 0) { + elog(WARNING, "XXX key %u/%u/%llu pgstat_drop_entry_internal refcount=0", + key.kind, key.dboid, (unsigned long long) key.objid); pgstat_free_entry(shent, hstat); return true; } else { + elog(WARNING, "XXX key %u/%u/%llu pgstat_drop_entry_internal refcount>0", + key.kind, key.dboid, (unsigned long long) key.objid); if (!hstat) dshash_release_lock(pgStatLocal.shared_hash, shent); return false; @@ -963,6 +997,9 @@ pgstat_drop_entry(PgStat_Kind kind, Oid dboid, uint64 objid) key.dboid = dboid; key.objid = objid; + elog(WARNING, "XXX key %u/%u/%llu pgstat_drop_entry", + key.kind, key.dboid, (unsigned long long) key.objid); + /* delete local reference */ if (pgStatEntryRefHash) { -- 2.45.2
signature.asc
Description: PGP signature