Hi, I spent quite a bit more time working on the patch. There's are large changes:
- postmaster/pgstats.c (which is an incorrect location now that it's not a subprocess anymore) is split into utils/activity/pgstat.c and utils/activity/pgstat_kind.c. I don't love the _kind name, but I couldn't come up with anything better. - Implemented a new GUC, stats_fetch_consistency = {none, cache, snapshot}. I think the code overhead of it is pretty ok - most of the handling is entirely generalized. - Most of the "per stats kind" handling is done in pgstat_kind.c. Nearly all the rest is done through an array with per-stats-kind information (extending what was already done with pgstat_sharedentsize etc). - There is no separate "pending stats" hash anymore. If there are pending stats, they are referenced from 'pgStatSharedRefHash' (which used to be the "lookup cache" hash). All the entries with pending stats are in double linked list pgStatPending. - A stat's entry's lwlock, refcount, .. are moved into the dshash entry. There is no need for them to be separate anymore. Also allows to avoid doing some dsa lookups while holding dshash locks. - The dshash entries are not deleted until the refcount has reached 0. That's an important building block to avoid constantly re-creating stats when flushing pending stats for a dropped object. - The reference to the shared entry is established the first time stats for an object are reported. Together with the previous entry that avoids nearly all the avenues for re-creating already dropped stats (see below for the hole). - I addded a bunch of pg_regress style tests, and a larger amount of isolationtester tests. The latter are possibly due to a new pg_stat_force_next_flush() function, avoiding the need to wait until stats are submitted. - 2PC support for "precise" dropping of stats has been added, the collect_oids() based approach removed. - lots of bugfixes, comments, etc... I know of one nontrivial issue that can lead to dropped stats being revived: Within a transaction, a functions can be called even when another transaction that dropped that function has already committed. I added a spec test reproducing the issue: # FIXME: this shows the bug that stats will be revived, because the # shared stats in s2 is only referenced *after* the DROP FUNCTION # committed. That's only possible because there is no locking (and # thus no stats invalidation) around function calls. permutation "s1_track_funcs_all" "s2_track_funcs_none" "s1_func_call" "s2_begin" "s2_func_call" "s1_func_drop" "s2_track_funcs_all" "s2_func_call" "s2_commit" "s2_ff" "s1_func_stats" "s2_func_stats" I think the best fix here would be to associate an xid with the dropped stats object, and only delete the dshash entry once there's no alive with a horizon from before that xid... There's also a second issue (stats for newly created objects surviving the transaction), but that's pretty simple to resolve. Here's all the gory details of my changes happening incrementally: https://github.com/anarazel/postgres/compare/master...shmstat I'll squash and split tomorrow. Too tired for today. I think this is starting to look a lot better than what we have now. But I'm getting less confident that it's realistic to get any of this into PG14, given the state of the release cycle. > I'm impressed that the way you resolved "who should load stats". Using > static shared memory area to hold the point to existing DSA memory > resolves the "first attacher problem". However somewhat doubtful > about the "who should write the stats file", I think it is reasonable > in general. > > But the current place of calling pgstat_write_stats() is a bit to > early. Checkpointer reports some stats *after* calling > ShutdownXLOG(). Perhaps we need to move it after pg_stat_report_*() > calls in HandleCheckpointerInterrupts(). I now moved it into a before_shmem_exit(). I think that should avoid that problem? > https://github.com/anarazel/postgres/commit/03824a236597c87c99d07aa14b9af9d6fe04dd37 > > + * XXX: Why is this a good place to do this? > > Agreed. We don't need to so haste to clean up stats entries. We could > run that in pgstat_reporT_stat()? I've not changed that yet, but I had the same thought. > Agreed that it's better to move database stat entries to fixed > pointers. I actually ended up reverting that. My main motivation for it was that it was problematic that new pending database stats entries could be created at some random place in the hashtable. But with the linked list of pending entries that's not a problem anymore. And I found it nontrivial to manage the refcounts to the shared entry accurately this way. We could still add a cache for the two stats entries though... > > - consider removing PgStatTypes and replacing it with the oid of the > > table the type of stats reside in. So PGSTAT_TYPE_DB would be > > DatabaseRelationId, PGSTAT_TYPE_TABLE would be RelationRelationId, ... > > > > I think that'd make the system more cleanly extensible going forward? > > I'm not sure that works as expected. We already separated repliation > stats from the unified stats hash and pgstat_read/write_statsfile() > needs have the corresponding specific code path. I didn't quite go towards my proposal, but I think I got a lot closer towards not needing much extra code for additional types of stats. I even added an XXX to pgstat_read/write_statsfile() that show how they now could be made generic. > > - the replication slot stuff isn't quite right in my branch > > Ah, yeah. As I mentioned above I think it should be in the unified > stats and should have a special means of shotcut. And the global > stats also should be the same. The problem is that I use indexes for addressing, but that they can change between restarts. I think we can fix that fairly easily, by mapping names to indices once, pgstat_restore_stats(). At the point we call pgstat_restore_stats() StartupReplicationSlots() already was executed, so we can just inquire at that point... Greetings, Andres Freund