Thanks for committing the new WAL format!

On Mon, Mar 25, 2024 at 3:33 PM Heikki Linnakangas <hlinn...@iki.fi> wrote:
>
> On 24/03/2024 18:32, Melanie Plageman wrote:
> > On Thu, Mar 21, 2024 at 9:28 AM Heikki Linnakangas <hlinn...@iki.fi> wrote:
> >>
> >> In heap_page_prune_and_freeze(), we now do some extra work on each live
> >> tuple, to set the all_visible_except_removable correctly. And also to
> >> update live_tuples, recently_dead_tuples and hastup. When we're not
> >> freezing, that's a waste of cycles, the caller doesn't care. I hope it's
> >> enough that it doesn't matter, but is it?
> >
> > Last year on an early version of the patch set I did some pgbench
> > tpcb-like benchmarks -- since there is a lot of on-access pruning in
> > that workload -- and I don't remember it being a showstopper. The code
> > has changed a fair bit since then. However, I think it might be safer
> > to pass a flag "by_vacuum" to heap_page_prune_and_freeze() and skip
> > the rest of the loop after heap_prune_satisfies_vacuum() when
> > on-access pruning invokes it. I had avoided that because it felt ugly
> > and error-prone; however, it addresses a few of your other points as
> > well.
>
> Ok. I'm not a fan of the name 'by_vacuum' though. It'd be nice if the
> argument described what it does, rather than who it's for. For example,
> 'need_all_visible'. If set to true, the function determines
> 'all_visible', otherwise it does not.
I like that way of putting it -- describing what it does instead of who
it is for. However, we now have PruneReason as an argument to
heap_page_prune(), which would be usable for this purpose (for skipping
the rest of the first loop), though it is not descriptive of how we
would use it in this scenario.

> I started to look closer at the loops in heap_prune_chain() and how they
> update all the various flags and counters. There's a lot going on there.
> We have:
>
> - live_tuples counter
> - recently_dead_tuples counter
> - all_visible[_except_removable]
> - all_frozen
> - visibility_cutoff_xid
> - hastup
> - prstate.frozen array
> - nnewlpdead
> - deadoffsets array
>
> And that doesn't even include all the local variables and the final
> dead/redirected arrays.

Yes, there are a lot of things happening. In an early version, I had
hoped for the first loop to just gather the visibility information and
then to do most of the other work as we go in heap_prune_chain(), as you
mention below. I couldn't quite get a version of that working that
looked nice. I agree that the whole thing feels a bit brittle and
error-prone. It's hard to be objective after fiddling with something
over the course of a year. I'm trying to take a step back now and
rethink it.

> Some of those are set in the first loop that initializes 'htsv' for each
> tuple on the page. Others are updated in heap_prune_chain(). Some are
> updated in both. It's hard to follow which are set where.

Yep.

> I think recently_dead_tuples is updated incorrectly, for tuples that are
> part of a completely dead HOT chain. For example, imagine a hot chain
> with two tuples: RECENTLY_DEAD -> DEAD. heap_prune_chain() would follow
> the chain, see the DEAD tuple at the end of the chain, and mark both
> tuples for pruning. However, we already updated 'recently_dead_tuples'
> in the first loop, which is wrong if we remove the tuple.
>
> Maybe that's the only bug like this, but I'm a little scared.
> Is there
> something we could do to make this simpler? Maybe move all the new work
> that we added to the first loop into heap_prune_chain()? Maybe
> introduce a few more helper heap_prune_record_*() functions, to update
> the flags and counters also for live and insert/delete-in-progress
> tuples and for dead line pointers? Something like
> heap_prune_record_live() and heap_prune_record_lp_dead().

I had discarded previous attempts to get everything done in
heap_prune_chain() because it visits the line pointers out of order,
which made it hard to be sure each one was considered once and only
once. I hadn't thought of the approach you suggested with record_live()
-- that might help. I will work on this tomorrow. I had hoped to get
something out today, but I am still in the middle of rebasing the back
20 patches from your v5 over current master and then adding in the
suggestions I made in the various diffs on the thread.

> > Note that I still don't think we have a resolution on what to
> > correctly update new_relfrozenxid and new_relminmxid to at the end
> > when presult->nfrozen == 0 and presult->all_frozen is true.
> >
> > if (presult->nfrozen > 0)
> > {
> >     presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
> >     presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
> > }
> > else
> > {
> >     presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
> >     presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
> > }
>
> One approach is to take them out of the PageFreezeResult struct again,
> and pass them as pointers:
>
> void
> heap_page_prune_and_freeze(Relation relation, Buffer buffer,
>                            ...
>                            TransactionId *new_relfrozenxid,
>                            MultiXactId *new_relminmxid,
>                            ...
>                            )

What about the question of whether we should be using
FreezePageRelfrozenXid when all_frozen is true and nfrozen == 0?
I was confused by the comment in HeapPageFreeze about whether or not we
had to do this.

> That would be natural for the caller too, as it wouldn't need to set up
> the old values in HeapPageFreeze before each call, nor copy the new
> values to 'vacrel' after the call. I'm thinking that we'd move the
> responsibility of setting up HeapPageFreeze to
> heap_page_prune_and_freeze(), instead of having the caller set it up. So
> the caller would look something like this:
>
> heap_page_prune_and_freeze(rel, buf, vacrel->vistest,
>                            &vacrel->cutoffs,
>                            &vacrel->NewRelfrozenXid,
>                            &vacrel->NewRelminMxid,
>                            &presult,
>                            PRUNE_VACUUM_SCAN, flags,
>                            &vacrel->offnum);
>
> In this setup, heap_page_prune_and_freeze() would update
> *new_relfrozenxid and *new_relminmxid when it has a new value for them,
> and leave them unchanged otherwise.

I do prefer having heap_page_prune_and_freeze() own HeapPageFreeze.

- Melanie