On Fri, Sep 16, 2022 at 12:30 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> After a quick benchmark, I've confirmed that the amount of WAL records
> for freezing 1 million tuples reduced to about one-fifth (1.2GB vs
> 250MB).

Great.
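A rough sketch of how a per-page freeze record could shrink like this. It assumes, and this message does not say so, that the change stores each distinct freeze plan once per record and lists only an offset number for every tuple that uses it. The struct, the helper names, and the 12-byte and 2-byte sizes are hypothetical. The 64-byte figure is the minimum Heap2/FREEZE_PAGE record size mentioned later in this message, treated here as a fixed per-record cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes in bytes. TOY_RECORD_HEADER reuses the 64-byte minimum record size
 * as a fixed per-record cost; the other two are hypothetical. */
#define TOY_RECORD_HEADER 64
#define TOY_PLAN_SIZE     12 /* hypothetical serialized freeze plan */
#define TOY_OFFSET_SIZE    2 /* hypothetical page offset number */

/* Hypothetical per-tuple freeze plan: what freezing writes into a header. */
typedef struct {
    uint32_t xmax;
    uint16_t infomask;
    uint16_t infomask2;
    uint8_t  frzflags;
} ToyFreezePlan;

/* Field-by-field comparison (memcmp would also compare padding bytes). */
static int toy_plan_equal(const ToyFreezePlan *a, const ToyFreezePlan *b)
{
    return a->xmax == b->xmax && a->infomask == b->infomask &&
           a->infomask2 == b->infomask2 && a->frzflags == b->frzflags;
}

/* Record size when every tuple carries its own plan plus its offset. */
static size_t toy_size_per_tuple(size_t ntuples)
{
    return TOY_RECORD_HEADER + ntuples * (TOY_PLAN_SIZE + TOY_OFFSET_SIZE);
}

/* Record size when identical plans are stored once and each tuple adds
 * only its offset number. Counts distinct plans with a quadratic scan,
 * which is acceptable for the few hundred tuples a heap page can hold. */
static size_t toy_size_deduplicated(const ToyFreezePlan *plans, size_t ntuples)
{
    size_t nunique = 0;

    for (size_t i = 0; i < ntuples; i++) {
        int seen = 0;

        for (size_t j = 0; j < i && !seen; j++)
            seen = toy_plan_equal(&plans[i], &plans[j]);
        if (!seen)
            nunique++;
    }
    assert(nunique <= ntuples);
    return TOY_RECORD_HEADER + nunique * TOY_PLAN_SIZE + ntuples * TOY_OFFSET_SIZE;
}
```

With four tuples sharing one plan, these assumed sizes give 120 bytes without deduplication and 84 bytes with it. Each further tuple that reuses an existing plan adds only 2 bytes, which is the sense in which freezing "extra" tuples on a page that already needs a record is nearly free. The quadratic scan is only for brevity; a real implementation could sort the plans instead.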
I think that the really interesting thing about the patch is how it changes the way we should think about freezing costs. It makes page-level batching seem very natural. Without the patch, the minimum possible size of a Heap2/FREEZE_PAGE record is 64 bytes, once alignment and so on are taken into account. Once we already know that we have to freeze *some* tuples on a given heap page, it becomes very reasonable to freeze as many as possible, in batch, simply because it'll be much cheaper to do it now than later on. Even if this extra freezing ends up "going to waste" due to updates against the same tuples that happen a little later on, the *added* cost of freezing "extra" tuples will have been so small that it's unlikely to matter. On the other hand, if it's not wasted, we'll be *much* better off.

It's very hard to predict the future, which is kinda what the current FreezeLimit-based approach to freezing tries to do. It's actually quite easy to understand the cost of freezing now versus freezing later, though. At a high level, it makes sense for VACUUM to focus on freezing costs (including the risk that comes with *not* freezing in larger tables), and not worry so much about making accurate predictions. Making accurate predictions about freezing/workload characteristics is overrated.

> True. I've not looked at the patch in depth yet but I think we need
> regression tests for this.

What did you have in mind? I think that the best way to test something like this is with wal_consistency_checking. That mostly works fine. However, note that heap_mask() won't always be able to preserve the state of a tuple's xmax when it's modified by freezing. We sometimes need "hint bits" to be reliably set in REDO, when replaying the records for freezing. At other times they really are just hints. We have to conservatively assume that they're just hints when masking. Not sure if I can do much about that.
Note that this optimization sits one level below lazy_scan_prune(), and one level above heap_execute_freeze_tuple(). Neither function really changes at all. That seems useful, since there are rare pg_upgrade-only paths where xvac fields need to be frozen, and those aren't tested either.

-- 
Peter Geoghegan