Re: recovering from "found xmin ... from before relfrozenxid ..."

Andres Freund Wed, 28 Oct 2020 21:01:06 -0700

Hi,

On 2020-10-28 19:09:14 -0700, Andres Freund wrote:
> On 2020-10-28 18:13:44 -0700, Andres Freund wrote:
> > Just pushed this. Let's see what the BF says...
> 
> It says that apparently something is unstable about my new test. It
> first passed on a few animals, but then failed a lot in a row. Looking.


The differentiating factor is force_parallel_mode=regress.

Ugh, this is nasty: The problem is that we can end up computing the
horizons the first time before MyDatabaseId is even set. Which leads us
to compute a too aggressive horizon for plain tables, because we skip
over them, as MyDatabaseId still is InvalidOid:

                /*
                 * Normally queries in other databases are ignored for anything but
                 * the shared horizon. But in recovery we cannot compute an accurate
                 * per-database horizon as all xids are managed via the
                 * KnownAssignedXids machinery.
                 */
                if (in_recovery ||
                        proc->databaseId == MyDatabaseId ||
                        proc->databaseId == 0)  /* always include WalSender */
                        h->data_oldest_nonremovable =
                                TransactionIdOlder(h->data_oldest_nonremovable, xmin);

That then subsequently leads us to consider a row fully dead in
heap_hot_search_buffers(), which triggers the killtuples logic and
causes the test to fail.
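
To make the failure mode concrete, here is a minimal standalone sketch of
the check quoted above. It is not the PostgreSQL code: SketchProc,
sketch_older(), sketch_data_horizon() and sketch_example() are hypothetical
names, the database OID 16384 and the xid values are made up, and xid
wraparound is ignored for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;
typedef uint32_t TransactionId;
#define InvalidOid ((Oid) 0)

/* Hypothetical, simplified stand-in for a backend's shared-memory entry. */
typedef struct SketchProc
{
    Oid           databaseId;   /* 0 for backends not bound to a database */
    TransactionId xmin;         /* oldest xid this backend still needs */
} SketchProc;

/* Plain comparison; the real TransactionIdOlder() is wraparound-aware. */
static TransactionId
sketch_older(TransactionId a, TransactionId b)
{
    return (a < b) ? a : b;
}

/*
 * Compute a per-database "data" horizon using the same three-way condition
 * as the excerpt above. 'initial' stands in for the starting value, which
 * is the newest the horizon could possibly be.
 */
static TransactionId
sketch_data_horizon(const SketchProc *procs, size_t nprocs,
                    Oid my_database_id, bool in_recovery,
                    TransactionId initial)
{
    TransactionId horizon = initial;

    assert(procs != NULL || nprocs == 0);

    for (size_t i = 0; i < nprocs; i++)
    {
        if (in_recovery ||
            procs[i].databaseId == my_database_id ||
            procs[i].databaseId == 0)
            horizon = sketch_older(horizon, procs[i].xmin);
    }
    return horizon;
}

/* One backend in database 16384 holding xmin 100; starting value 500. */
static TransactionId
sketch_example(Oid my_database_id)
{
    const SketchProc procs[] = {{16384, 100}};

    return sketch_data_horizon(procs, 1, my_database_id, false, 500);
}
```

In this sketch, sketch_example(16384) returns 100, because the backend in
our own database holds the horizon back. sketch_example(InvalidOid) returns
500: the backend's databaseId matches neither InvalidOid nor 0, so its xmin
is skipped and the horizon ends up too new.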

With force_parallel_mode=regress we constantly start parallel workers,
which makes it much more likely that this case is hit.

It's trivial to fix luckily...

Greetings,

Andres Freund
