On Thu, Aug 25, 2022 at 3:35 PM Jeremy Schneider <schnj...@amazon.com> wrote:
> We should be careful here. IIUC, the current autovac behavior helps
> bound the "spread" or range of active multixact IDs in the system, which
> directly determines the number of distinct pages that contain those
> multixacts. If the proposed change herein causes the spread/range of
> MXIDs to significantly increase, then it will increase the number of
> blocks and increase the probability of thrashing on the SLRUs for these
> data structures.

As a general rule VACUUM will tend to do more eager freezing with the
patch set compared to HEAD, though it should never do less eager
freezing. Not even in corner cases -- never.

With the patch, VACUUM pretty much uses the most aggressive possible
XID-wise/MXID-wise cutoffs in almost all cases (though only when we
actually decide to freeze a page at all, which is now a separate
question). The fourth patch in the patch series introduces a very
limited exception, where we use the same cutoffs that we'll always use
on HEAD (FreezeLimit + MultiXactCutoff) instead of the aggressive
variants (OldestXmin and OldestMxact). This isn't just *any* xmax
containing a MultiXact: it's a Multi that contains *some* XIDs that
*need* to go away during the ongoing VACUUM, and others that *cannot*
go away. Oh, and there usually has to be a need to keep two or more
XIDs for this to happen -- if there is only one XID then we can
usually swap xmax with that XID without any fuss.

> PS. see also
> https://www.postgresql.org/message-id/247e3ce4-ae81-d6ad-f54d-7d3e0409a...@ardentperf.com

I think that the problem you describe here is very real, though I
suspect that it needs to be addressed by making opportunistic cleanup
of Multis happen more reliably. Running VACUUM more often just isn't
practical once a table reaches a certain size. In general, any kind of
processing that is time sensitive probably shouldn't be happening
solely during VACUUM -- it's just too risky. VACUUM might take a
relatively long time to get to the affected page. It might not even be
that long in wall clock time or whatever -- just too long to reliably
avoid the problem.

--
Peter Geoghegan