Hi, shortly after shared memory stats went in I had a conversation with Lukas about what it'd enable us going forward. I also chatted with Peter about autovacuum related stats. I started to write an email, but then somehow lost the draft and couldn't bring myself to start from scratch.
Here's a largely unordered list of ideas. I'm not planning to work on them myself, but thought it'd nevertheless be useful to have them memorialized somewhere. 1) Track some statistics based on relfilenodes rather than oids We currently track IO related statistics as part of the normal relation stats. The problem is that that prevents us from collecting stats whenever we operate on a relfilenode, rather than a Relation. We e.g. currently can't track the number of blocks written out in a relation, because we don't have a Relation at that point. Nor can't we really get hold of one, as the writeback can happen in a different database without access to pg_class. Which is also the reason why the per-relation IO stats aren't populated by the startup process, even though it'd obviously sometimes be helpful to know where the most IO time is spent on a standby. There's also quite a bit of contortions of the bufmgr interface related to this. I think the solution to this is actually fairly simple: We split the IO related statistics out from the relation statistics, and track them on a relfilenode basis instead. That'd allow us to track all the IO stats from all the places, rather than the partial job we do right now. 2) Split index and table statistics into different types of stats We track both types of statistics in the same format and rename column in views etc to make them somewhat sensible. A number of the "columns" in index stats are currently unused. If we split the stats for indexes and relations we can have reasonable names for the fields, shrink the current memory usage by halfing the set of fields we keep for indexes, and extend the stats in a more targeted fashion. This e.g. would allow us keep track of the number of index entries killed via the killtuples mechanism, which in turn would allow us to more intelligently decide whether we should vacuum indexes (often the most expensive part of vacuum). In a lot of workload killtuples takes care of most of the cleanup, but in others it doesn't do much. 3) Maintain more historical statistics about vacuuming We currently track the last time a table was vacuumed, the number of times it was vacuumed and a bunch of counters for the number of modified tuples since the last vacuum. However, none of that allows the user to identify which relations are causing autovacuum to not keep up. Even just keeping track of the the total time autovacuum has spent on certain relations would be a significant improvement, with more easily imaginable (total IO [time], autovacuum delay time, xid age). 4) Make the stats mechanism extensible Most of the work towards this has already been done, but a bit more work is necessary. The hardest likely is how to identify stats belonging to an extension across restarts. There's a bunch of extensions with their own stats mechanisms, but it's hard to get this stuff right from the outside. 5) Use extensible shared memory stats to store pg_stat_statements data pg_stat_statements current mechanism has a few issues. The top ones I know of are: - Contention on individual stats entries when the same queryid is executed concurrently. pgstats deals with this by allowing stats to be collected in backend local memory and to be flushed into shared stats at a lower frequency. - The querytext file can get huge (I've seen > 100GB) and cause massive slowdowns. It's better than the old fixed-length, fixed-shared-memory mechansism, don't get me wrong. But we can do better by storing the data in dynamic shared memory and then also support trimming based on the total size. There were some other things, but I can't remember them right now. Greetings, Andres Freund