Hi, On 2022-12-28 15:24:28 -0800, Peter Geoghegan wrote: > I took another look at the code coverage situation around freezing > following pushing the page-level freezing patch earlier today. I > spotted an issue that I'd missed up until now: certain sanity checks > in heap_prepare_freeze_tuple() call TransactionIdDidCommit() more > often than really seems necessary. > > Theoretically this is an old issue that dates back to commit > 699bf7d05c, as opposed to an issue in the page-level freezing patch. > But that's not really true in a real practical sense. In practice the > calls to TransactionIdDidCommit() will happen far more frequently > following today's commit 1de58df4fe (since we're using OldestXmin as > the cutoff that gates performing freeze_xmin/freeze_xmax processing -- > not FreezeLimit).
Hm. But we still only do the check when we actually freeze, rather than just during the pre-check in heap_tuple_should_freeze(). So we'll only incur the increased overhead when we also do more WAL logging etc. Correct? > I took another look at the code coverage situation around freezing > I see no reason why we can't just condition the XID sanity check calls > to TransactionIdDidCommit() (for both the freeze_xmin and the > freeze_xmax callsites) on a cheap tuple hint bit precheck not being > enough. We only need an expensive call to TransactionIdDidCommit() > when the precheck doesn't immediately indicate that the tuple xmin > looks committed when that's what the sanity check expects to see (or > that the tuple's xmax looks aborted, in the case of the callsite where > that's what we expect to see). Hm. I dimply recall that we had repeated cases where the hint bits were set wrongly due to some of the multixact related bugs. I think I was trying to be paranoid about not freezing stuff in those situations, since it can lead to reviving dead tuples, which obviously is bad. > Attached patch shows what I mean. A quick run of the standard > regression tests (with custom instrumentation) shows that this patch > eliminates 100% of all relevant calls to TransactionIdDidCommit(), for > both the freeze_xmin and the freeze_xmax callsites. There's practically no tuple-level concurrent activity in a normal regression test. So I'm a bit doubtful that is an interesting metric. It'd be a tad more interesting to run tests, including isolationtester and pgbench, against a running cluster. Greetings, Andres Freund