Hi,

On 2018-07-19 12:42:08 -0700, Andres Freund wrote:
> I actually think the balance of all the solutions discussed in this
> thread seem to make neutering pruning *a bit* by far the most palatable
> solution. We don't need to fully prevent removal of such tuple chains,
> it's sufficient that we can detect that a tuple has been removed. A
> large-sledgehammer approach would be to just error out when attempting
> to read such a tuple. The existing error handling logic can relatively
> easily be made to work with that.

So. I'm just back from not working for a few days.  I've not followed
this discussion in all its detail over the last months.  I've an
annoying bout of allergies.  So I might be entirely off.

I think this whole issue only exists if we actually end up doing catalog
lookups, not if there's only cached lookups (otherwise our invalidation
handling is entirely borked). And we should normally do cached lookups
for a very large percentage of the cases.  Therefore we can afford to
make the cache-miss cases a bit slower.

So what if we, at the beginning / end of cache miss handling, re-check
whether the to-be-decoded transaction is still in progress (or has
committed), and throw an error if it isn't, i.e. if it has aborted in
the meantime? That error is then caught in reorderbuffer, the
in-progress-xact aborted callback is called, and processing continues
(there are a couple of nontrivial details here, but it should be
doable).
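
To make the control flow concrete, here is a minimal standalone sketch.
It is not PostgreSQL code: every name in it is hypothetical, the
transaction status is a plain global, and setjmp/longjmp merely stands
in for the real error handling machinery. It only illustrates "check at
begin and end of the miss path, catch in the caller, run the abort
callback, continue".

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names are PostgreSQL APIs. */
typedef enum { XACT_IN_PROGRESS, XACT_COMMITTED, XACT_ABORTED } XactStatus;

XactStatus decoded_xact_status = XACT_IN_PROGRESS;
jmp_buf decode_error_ctx;     /* plays the role of the error context */
int abort_callbacks_run = 0;  /* counts simulated abort callbacks */

/* Re-check the decoded transaction; "throw" if it has aborted. */
void check_decoded_xact(void)
{
    if (decoded_xact_status == XACT_ABORTED)
        longjmp(decode_error_ctx, 1);
}

/* Cache-miss path: check before and after the (pretend) catalog read. */
int catalog_lookup_on_miss(int key, bool abort_during_lookup)
{
    int result;

    check_decoded_xact();                   /* check at the beginning */
    result = key * 2;                       /* pretend catalog read */
    if (abort_during_lookup)
        decoded_xact_status = XACT_ABORTED; /* simulate concurrent abort */
    check_decoded_xact();                   /* check at the end */
    return result;
}

/* Caller (the reorderbuffer role): catch the error, run the callback. */
bool decode_change(int key, bool abort_during_lookup)
{
    if (setjmp(decode_error_ctx) != 0)
    {
        abort_callbacks_run++;   /* in-progress-xact aborted callback */
        return false;            /* stop decoding this transaction */
    }
    (void) catalog_lookup_on_miss(key, abort_during_lookup);
    /* Reaching this point means the end check did not fire. */
    assert(decoded_xact_status != XACT_ABORTED);
    return true;
}
```

The cached path never calls check_decoded_xact(), so only cache misses
pay for the extra status checks.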

The biggest issue is what constitutes a "cache miss". It's fairly
trivial to do this for syscache / relcache, but that's not sufficient:
there are plenty of cases where catalogs are accessed without going through
either. But as far as I can tell if we declared that all historic
accesses have to go through systable_beginscan* - which'd imo not be a
crazy restriction - we could put the checks at that layer.

That'd require that an index lookup can't crash if the corresponding
heap entry doesn't exist (etc), but that's something we need to handle
anyway.  The issue that multiple separate catalog lookups need to be
coherent (say, Robert's example where the pg_class entry exists but the
pg_attribute entries don't) is solved by virtue of the pg_attribute
lookups failing if the transaction aborted.
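
The coherence argument can be illustrated with a toy model, again with
purely hypothetical names and no relation to the actual catalog code. A
single scan entry point checks the transaction status around every
read, so once the transaction has aborted and some of its rows are
gone, the second lookup reports the abort instead of returning
"relation with zero attributes".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SCAN_OK, SCAN_XACT_ABORTED } ToyScanResult;

/* Toy catalog state; field names are illustrative only. */
typedef struct
{
    bool xact_aborted;       /* has the decoded transaction aborted? */
    bool class_row_visible;  /* stands in for the pg_class row */
    int  attr_rows_visible;  /* stands in for the pg_attribute rows */
} ToyCatalog;

/* Models a concurrent abort after which only some rows were removed. */
void toy_abort_and_prune(ToyCatalog *cat)
{
    cat->xact_aborted = true;
    cat->attr_rows_visible = 0;  /* attribute rows are already gone */
}

/*
 * The single choke point for historic catalog access. The status is
 * checked before and after the read; in a real system it could change
 * concurrently in between, which this single-threaded toy cannot show.
 */
ToyScanResult toy_scan(const ToyCatalog *cat, bool want_attrs, int *rows_out)
{
    int rows;

    assert(rows_out != NULL);
    if (cat->xact_aborted)
        return SCAN_XACT_ABORTED;          /* check at scan begin */
    rows = want_attrs ? cat->attr_rows_visible
                      : (cat->class_row_visible ? 1 : 0);
    if (cat->xact_aborted)
        return SCAN_XACT_ABORTED;          /* check at scan end */
    *rows_out = rows;                      /* only set on success */
    return SCAN_OK;
}
```

Because rows_out is only written on success, a caller can never act on
a half-consistent result: either both lookups succeeded against a live
transaction, or one of them reported the abort.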


Am I missing something here?


Greetings,

Andres Freund
