On Sun, 8 Jan 2023 at 04:09, Andres Freund <and...@anarazel.de> wrote: > > Hi, > > On 2023-01-07 16:29:23 -0800, Andres Freund wrote: > > It's probably not too hard to fix specifically in this one place - we could > > just clamp vacuum_defer_cleanup_age to be smaller than latest_completed, but > > it strikes me as as a somewhat larger issue for the 64it xid > > infrastructure. I > > suspect this might not be the only place running into problems with such > > "before the universe" xids. > > I haven't found other problematic places in HEAD, but did end up find a less > serious version of this bug in < 14: GetFullRecentGlobalXmin(). I did verify > that with vacuum_defer_cleanup_age set GetFullRecentGlobalXmin() returns > values that look likely to cause problems. Its "just" used in gist luckily. > > It's hard to find places that do this kind of arithmetic, we traditionally > haven't had a helper for it. So it's open-coded in various ways. > [...] > > The currently existing places I found, other than the ones in procarray.c, > luckily don't seem to convert the xids to 64bit xids.
That's good to know. > > For a bit I was thinking that we should introduce the notion that a > > FullTransactionId can be from the past. Specifically that when the upper > > 32bit > > are all set, we treat the lower 32bit as a value from before xid 0 using the > > normal 32bit xid arithmetic. But it sucks to add overhead for that > > everywhere. > > > > It might be a bit more palatable to designate one individual value, > > e.g. 2^32-1<<32, as a "before xid 0" indicator - it doesn't matter how far > > before the start of the universe an xid point to... > > On IM Thomas suggested we could reserve the 2^32-1 epoch for invalid values. I > hacked up a patch that converts various fxid functions to inline functions > with such assertions, and it indeed quickly catches the problem this thread > reported, close to the source of the use. Wouldn't it be enough to only fix the constructions in FullXidRelativeTo() and widen_snapshot_xid() (as attached, $topic does not occur with the patch), and (optionally) bump the first XID available for any cluster to (FirstNormalXid + 1) to retain the 'older than any running transaction' property? The change only fixes the issue for FullTransactionId, which IMO is OK - I don't think we need to keep xid->xid8->xid symmetric in cases of xid8 wraparound. > One issue with that is is that it'd reduce what can be input for the xid8 > type. But it's hard to believe that'd be a real issue? Yes, it's unlikely anyone would ever hit that with our current WAL format - we use 24 bytes /xid just to log it's use, so we'd use at most epoch 0x1000_0000 in unrealistic scenarios. In addition; technically, we already have (3*2^32 - 3) "invalid" xid8 values that can never be produced in FullXidRelativeTo - those few extra invalid values don't matter much to me except "even more special casing". Kind regards, Matthias van de Meent.
v1-0001-Prevent-underflow-of-xid8-epoch.patch
Description: Binary data