At Thu, 1 Apr 2021 19:44:25 -0700, Andres Freund <and...@anarazel.de> wrote in > Hi, > > I spent quite a bit more time working on the patch. There's are large > changes: > > - postmaster/pgstats.c (which is an incorrect location now that it's not > a subprocess anymore) is split into utils/activity/pgstat.c and > utils/activity/pgstat_kind.c. I don't love the _kind name, but I > couldn't come up with anything better.
The place was not changed to keep footprint smaller. I agree that the old place is not appropriate. pgstat_kind... How about changin pgstat.c to pgstat_core.c and pgstat_kind.c to pgstat.c? > - Implemented a new GUC, stats_fetch_consistency = {none, cache, > snapshot}. I think the code overhead of it is pretty ok - most of the > handling is entirely generalized. Sounds good. > - Most of the "per stats kind" handling is done in pgstat_kind.c. Nearly > all the rest is done through an array with per-stats-kind information > (extending what was already done with pgstat_sharedentsize etc). > > - There is no separate "pending stats" hash anymore. If there are > pending stats, they are referenced from 'pgStatSharedRefHash' (which > used to be the "lookup cache" hash). All the entries with pending > stats are in double linked list pgStatPending. Sounds reasonable. A bit silimar to TabStatusArray.. Pending stats and shared stats share the same key so they are naturally consolidatable. > - A stat's entry's lwlock, refcount, .. are moved into the dshash > entry. There is no need for them to be separate anymore. Also allows > to avoid doing some dsa lookups while holding dshash locks. > > - The dshash entries are not deleted until the refcount has reached > 0. That's an important building block to avoid constantly re-creating > stats when flushing pending stats for a dropped object. Does that mean the entries for a dropped object is actually dropped by the backend that has been flushd stats of the dropped object at exit? Sounds nice. > - The reference to the shared entry is established the first time stats > for an object are reported. Together with the previous entry that > avoids nearly all the avenues for re-creating already dropped stats > (see below for the hole). > > - I addded a bunch of pg_regress style tests, and a larger amount of > isolationtester tests. The latter are possibly due to a new > pg_stat_force_next_flush() function, avoiding the need to wait until > stats are submitted. > > - 2PC support for "precise" dropping of stats has been added, the > collect_oids() based approach removed. Cool! > - lots of bugfixes, comments, etc... Thanks for all of them. > I know of one nontrivial issue that can lead to dropped stats being > revived: > > Within a transaction, a functions can be called even when another > transaction that dropped that function has already committed. I added a > spec test reproducing the issue: > > # FIXME: this shows the bug that stats will be revived, because the > # shared stats in s2 is only referenced *after* the DROP FUNCTION > # committed. That's only possible because there is no locking (and > # thus no stats invalidation) around function calls. > permutation > "s1_track_funcs_all" "s2_track_funcs_none" > "s1_func_call" "s2_begin" "s2_func_call" "s1_func_drop" > "s2_track_funcs_all" "s2_func_call" "s2_commit" "s2_ff" "s1_func_stats" > "s2_func_stats" > > I think the best fix here would be to associate an xid with the dropped > stats object, and only delete the dshash entry once there's no alive > with a horizon from before that xid... I'm not sure how we do that avoiding a full scan on dshash.. > There's also a second issue (stats for newly created objects surviving > the transaction), but that's pretty simple to resolve. > > Here's all the gory details of my changes happening incrementally: > > https://github.com/anarazel/postgres/compare/master...shmstat > > I'll squash and split tomorrow. Too tired for today. Thank you very much for all of your immense effort. > I think this is starting to look a lot better than what we have now. But > I'm getting less confident that it's realistic to get any of this into > PG14, given the state of the release cycle. > > > > > I'm impressed that the way you resolved "who should load stats". Using > > static shared memory area to hold the point to existing DSA memory > > resolves the "first attacher problem". However somewhat doubtful > > about the "who should write the stats file", I think it is reasonable > > in general. > > > > But the current place of calling pgstat_write_stats() is a bit to > > early. Checkpointer reports some stats *after* calling > > ShutdownXLOG(). Perhaps we need to move it after pg_stat_report_*() > > calls in HandleCheckpointerInterrupts(). > > I now moved it into a before_shmem_exit(). I think that should avoid > that problem? I think so. > > https://github.com/anarazel/postgres/commit/03824a236597c87c99d07aa14b9af9d6fe04dd37 > > > > + * XXX: Why is this a good place to do this? > > > > Agreed. We don't need to so haste to clean up stats entries. We could > > run that in pgstat_reporT_stat()? > > I've not changed that yet, but I had the same thought. > > > > Agreed that it's better to move database stat entries to fixed > > pointers. > > I actually ended up reverting that. My main motivation for it was that > it was problematic that new pending database stats entries could be > created at some random place in the hashtable. But with the linked list > of pending entries that's not a problem anymore. And I found it > nontrivial to manage the refcounts to the shared entry accurately this > way. > > We could still add a cache for the two stats entries though... Yeah. > > > - consider removing PgStatTypes and replacing it with the oid of the > > > table the type of stats reside in. So PGSTAT_TYPE_DB would be > > > DatabaseRelationId, PGSTAT_TYPE_TABLE would be RelationRelationId, ... > > > > > > I think that'd make the system more cleanly extensible going forward? > > > > I'm not sure that works as expected. We already separated repliation > > stats from the unified stats hash and pgstat_read/write_statsfile() > > needs have the corresponding specific code path. > > I didn't quite go towards my proposal, but I think I got a lot closer > towards not needing much extra code for additional types of stats. I > even added an XXX to pgstat_read/write_statsfile() that show how they > now could be made generic. I'll check it. > > > - the replication slot stuff isn't quite right in my branch > > > > Ah, yeah. As I mentioned above I think it should be in the unified > > stats and should have a special means of shotcut. And the global > > stats also should be the same. > > The problem is that I use indexes for addressing, but that they can > change between restarts. I think we can fix that fairly easily, by > mapping names to indices once, pgstat_restore_stats(). At the point we > call pgstat_restore_stats() StartupReplicationSlots() already was > executed, so we can just inquire at that point... Does that mean the saved replslot stats is keyed by their names? regards. -- Kyotaro Horiguchi NTT Open Source Software Center