On 29/10/2025 14:30, Joel Jacobson wrote:
On Wed, Oct 29, 2025, at 12:40, Heikki Linnakangas wrote:
On 25/10/2025 21:01, Joel Jacobson wrote:
I also want to share a new idea that Heikki Linnakangas came up with
during PgConf25 in Riga.
The idea is basically to scan the notification queue from tail to head,
and update xids that precedes newFrozenXid, to either
FrozenTransactionId if committed, or InvalidTransactionId if not.
I've implemented this by adding a new extern function
AsyncNotifyFreezeXids to async.h, called from vac_update_datfrozenxid.
This way, we don't need to hold back the advancement of newFrozenXid in
vac_update_datfrozenxid.
Attempt of implementing Heikki's idea:
- async_notify_freeze_xids.txt
Thanks!
Thanks for the idea!
For the record, Yura suggested the same approach earlier in this thread
[1]. That line of discussion led to more complicated patches but I think
the complications came from trying to set the flag as part of
commit/abort. The changes are more straightforward and backpatchable if
we only do it at vacuum.
This idea has the benefit of never holding back newFrozenXid,
however, I see some problems with it:
- If there is a bug in the code, we could accidentally overwrite
xid values we didn't intend to overwrite, and maybe we would never
find out that we did.
- We wouldn't know for sure that we've understood the cause of
this bug.
With v12-vacuum_notify_queue_cleanup, we instead have the downside of
possibly holding back newFrozenXid, but with the advantage of giving us
higher confidence in that we've correctly identified the cause of the
bug.
Meh. Robustness is good and all, and in heap tuples we don't overwrite
the xmin/xmax but just set a FROZEN flag, precisely so that we preserve
evidence if something goes wrong. But I can't get too worked up about
that for the async notification queue.
Having slept on this for some days, I'm less worried about this
approach. I like the simplicity of it, and that we don't bolt on complexity
to another subsystem, just for the sake of improved debugging
capabilities of a different subsystem.
I think a different way of looking at this, is that we wouldn't conceal
a bug in async, but rather we would let vacuum do part of its job, of
checking the commit status of the xids, when needed, and be okay with
that split responsibility.
Ok, at quick glance I think this patch is in pretty good shape. I'll
review it more thoroughly, and see if I can incorporate the test from
Matheus Alcantara's or Arseniy Mukhin's latest patches, and commit and
backpatch this.
- Heikki