On Thu, 11 Mar 2021 at 17:31, Robert Haas <robertmh...@gmail.com> wrote:
>
> On Tue, Mar 9, 2021 at 3:35 PM Peter Geoghegan <p...@bowt.ie> wrote:
> > Speaking of line pointer bloat (and "irreversible" bloat), I came
> > across something relevant today. I believe that this recent patch from
> > Matthias van de Meent is a relatively easy way to improve the
> > situation:
> >
> > https://www.postgresql.org/message-id/flat/CAEze2WjgaQc55Y5f5CQd3L%3DeS5CZcff2Obxp%3DO6pto8-f0hC4w%40mail.gmail.com
>
> I agree, but all you need is one long-lived tuple toward the end of
> the array and you're stuck never being able to truncate it. It seems
> like a worthwhile improvement, but whether it actually helps will be
> workload-dependant.
>
> Maybe it'd be OK to allow a much longer array with offsets > some
> constant being usable only for HOT. HOT tuples are not indexed, so it
> might be easier to rearrange things to allow compaction of the array
> if it does happen to get fragmented. But I'm not sure it's OK to
> relocate even a HOT tuple to a different TID.
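
To make the "one long-lived tuple toward the end" problem concrete, here
is a minimal sketch in C. It is not PostgreSQL code: the bool array is a
hypothetical, simplified stand-in for a page's line pointer array, and
the names slots_after_truncation, bytes_reclaimed and LINE_POINTER_SIZE
are made up for illustration. The only figure taken from this thread is
the 4 bytes per line pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: in_use[i] is true when slot i of the line pointer
 * array still refers to something (a tuple or a redirect) and so cannot
 * be dropped. This is an assumption for illustration, not the real
 * ItemIdData layout.
 */

/* Bytes per line pointer; the 4-byte figure is the one quoted in the thread. */
enum { LINE_POINTER_SIZE = 4 };

/*
 * Truncation can only drop unused slots at the tail of the array, so the
 * array must be kept up to and including the last in-use slot. Unused
 * slots before that point stay allocated.
 */
static size_t
slots_after_truncation(const bool *in_use, size_t nslots)
{
    assert(in_use != NULL || nslots == 0);

    while (nslots > 0 && !in_use[nslots - 1])
        nslots--;
    return nslots;
}

/* Space given back to the page by truncating the tail of the array. */
static size_t
bytes_reclaimed(const bool *in_use, size_t nslots)
{
    return (nslots - slots_after_truncation(in_use, nslots)) * LINE_POINTER_SIZE;
}
```

With eight slots of which only the first two are in use, the array
shrinks to two slots and 24 bytes come back. Mark the last slot as in
use instead and nothing can be truncated, even though five slots in the
middle are free.
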

I'm currently trying to work out how to shuffle HOT tuples around as an
extension on top of my heap->pd_lower patch, and part of that will be
determining when and how HOT tuples are exposed internally. I'm
probably going to need to change how they are referenced to get that
working (current concept: HOT root TID + transaction identifier for
the places that need more than 1 item in HOT chains), but it's a very
bare-bones prototype that currently only generates the data record
necessary to shuffle the item pointers.

In that, I've noticed that moving HOT items takes a lot of memory (~ 3
OffsetNumbers per increment of MaxHeapTuplesPerPage, plus some marking
bits) to implement it in O(n), which means it would probably warrant
its own loop in heap_page_prune, separate from the current
mark-and-sweep, triggered based on new measurements included in the
current mark-and-sweep of the prune loop.

Another idea I'm considering (no real implementation ideas) to add to
this extension patch is moving HOT tuples to make space for incoming
tuples, to guarantee that non-movable items are placed early on the
page. This increases the chances for PageRepairFragmentation to
eventually reclaim space from the item pointer array.

I have nothing much worth showing yet for these additional patches,
though, and all of it might not be worth the additional CPU cycles
(it's 'only' 4 bytes per line pointer cleared, so it might be
considered too expensive when also taking WAL into account).

> Can someone, perhaps
> even just the user, still have a reference to the old one and care
> about us invalidating it? Maybe. But even if not, I'm not sure this
> helps much with the situation you're concerned about, which involves
> non-HOT tuples.

Users having references to TIDs of HOT tuples should in my opinion be
considered unknown behaviour. It might currently work, but the only
access to a HOT tuple that is guaranteed to work should be through the
chain's root.
Breaking the current guarantee of HOT tuples not moving might be worth
it if we can get enough savings in storage (which is also becoming
more likely if MaxHeapTuplesPerPage is changed to larger values).

As to who actually uses / stores these references, I think that the
only place where they are stored with some expectation of persistence
is in sequential heap scans, and that can be changed.

With regards,

Matthias van de Meent