On Sun, Feb 20, 2022 at 3:27 PM Peter Geoghegan <p...@bowt.ie> wrote:
> > I think that the idea has potential, but I don't think that I
> > understand yet what the *exact* algorithm is.
>
> The algorithm seems to exploit a natural tendency that Andres once
> described in a blog post about his snapshot scalability work [1]. To a
> surprising extent, we can usefully bucket all tuples/pages into two
> simple categories:
>
> 1. Very, very old ("infinitely old" for all practical purposes).
>
> 2. Very very new.
>
> There doesn't seem to be much need for a third "in-between" category
> in practice. This seems to be at least approximately true all of the
> time.
>
> Perhaps Andres wouldn't agree with this very general statement -- he
> actually said something more specific. I for one believe that the
> point he made generalizes surprisingly well, though. I have my own
> theories about why this appears to be true. (Executive summary: power
> laws are weird, and it seems as if the sparsity-of-effects principle
> makes it easy to bucket things at the highest level, in a way that
> generalizes well across disparate workloads.)

I think that this is not really a description of an algorithm -- and I
think that it is far from clear that the third "in-between" category
does not need to exist.

> Remember when I got excited about how my big TPC-C benchmark run
> showed a predictable, tick/tock style pattern across VACUUM operations
> against the order and order lines table [2]? It seemed very
> significant to me that the OldestXmin of VACUUM operation n
> consistently went on to become the new relfrozenxid for the same table
> in VACUUM operation n + 1. It wasn't exactly the same XID, but very
> close to it (within the range of noise). This pattern was clearly
> present, even though VACUUM operation n + 1 might happen as long as 4
> or 5 hours after VACUUM operation n (this was a big table).

I think findings like this are very unconvincing. TPC-C (or any
benchmark really) is so simple as to be a terrible proxy for what
vacuuming is going to look like on real-world systems. Like, it's nice
that it works, and it shows that something's working, but it doesn't
demonstrate that the patch is making the right trade-offs overall.

--
Robert Haas
EDB: http://www.enterprisedb.com