AtEOXact_Snapshot timing

Robert Haas Thu, 05 Sep 2019 12:00:06 -0700

Way back in 2011, commit 57eb009092684e6e1788dd0dae641ccee1668b10
moved AbortTransaction's AtEOXact_Snapshot call to CleanupTransaction
to fix a problem when a ROLLBACK statement was prepared at the
protocol level and executed in a transaction with REPEATABLE READ or
higher isolation. RevalidateCachedQuery would attempt to obtain a
snapshot and end up failing.  At the time, it was judged that
plancache.c was behaving correctly and this logic was rejiggered to
make that coding pattern safe. However, commit
ac63dca607e8e22247defbc8fe03b6baa3628c42 subsequently taught
RevalidateCachedQuery not to obtain a snapshot for such commands after
all while fixing an unrelated bug, and there now seems to be no case
in which we obtain a snapshot in an aborted transaction.


I'd like to propose that we upgrade that practice to a formal rule.
We've already taken some steps in this direction; notably, commit
42c80c696e9c8323841180029cc62741c21bd356 added an assertion to the
effect that we never perform catcache lookups outside of a valid,
non-aborted transaction. However, right now, if you made the mistake
of trying to access the database through some means other than a
catcache lookup in an aborted transaction, it might appear to work.
Actually, it would be terribly unsafe, because (1) you might've failed
after inserting a heap tuple and before inserting all the
corresponding index tuples and (2) any random subset of the tuples
inserted by prior commands in your transaction might have been pruned
by other backends after you removed your XID from the ProcArray, while
others would remain visible.  In short, such a snapshot is not really
a snapshot at all.

The best way that I've been able to come up with to enforce this rule
after a little bit of thought is to add Assert(IsTransactionState())
to a bunch of functions in snapmgr.c, most notably
GetTransactionSnapshot and GetCatalogSnapshot. The attached patch does
that.  It also makes the comments in RevalidateCachedQuery more
explicit about this issue, and it moves the AtEOXact_Snapshot call
back to AbortTransaction, on the theory (or hope?) that it's better to
dispose of resources sooner, especially resources that might look
superficially valid but really are not.

You may (or may not) wonder why I'm poking at this apparently-obscure
topic.  The answer is "undo."  Without getting into the gory details,
it's better for undo if as much of the cleanup work as possible
happens at AbortTransaction() time and as little as possible is left
until CleanupTransaction().  That seems like a good idea on general
principle too, though, so I'm proposing this as an independent
cleanup.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

0001-Move-AtEOXact_Snapshot-back-to-AbortTransaction.patch
Description: Binary data

AtEOXact_Snapshot timing

Reply via email to