Re: Assertion with aborted UPDATE in subtransaction

2025-04-05 Thread Jasper Smit
Hi,

Is this assertion something that is worthwhile to fix?

Thanks,
Jasper Smit

On Wed, Mar 26, 2025 at 4:26 PM Jasper Smit wrote:

> Hi,
>
> My colleague Oleksii Kozlov ran into an assertion failure while testing
> aborted UPDATE commands in subtransactions.
> To reproduce it, run the SQL in the attached script. I tested
> this on 15.10 and 17.4.
>
> Running the script leads to the following assertion failure:
> TRAP: failed Assert("HEAP_XMAX_IS_LOCKED_ONLY(infomask_lock_old_tuple)"),
> File:
> "/usr/local/postgresql-17.4/debug-build/../src/backend/access/heap/heapam.c",
> Line: 3766, PID: 15604
>
> After analysis with Luc Vlaming, we believe that the problem is caused by
> a stale multixact member of an aborted subtransaction.
>
> At the time of the assertion, we established that the new tuple does not
> fit on the same page as the old tuple. The
> tuple lock needs to be updated while the page lock is temporarily released.
>
> One line above the assertion, compute_new_xmax_infomask() is called, which
> will in turn call MultiXactIdExpand().
> In MultiXactIdExpand() we determine that the requested txid/status is
> already a member of the current multixact, therefore skipping
> the removal of dead members further below in that function. The multixact
> has in fact an aborted transaction included in it.
> Because the aborted transaction was not removed, later in
> GetMultiXactIdHintBits(), HEAP_XMAX_LCK_ONLY is not added to the infomask.
> The absence of this bit in the infomask will eventually lead to the
> assertion failure.
>
> A possible fix is to change MultiXactIdExpand() to not skip the removal of
> dead members. See the proposed patch attached to this email.
> Another alternative is to remove the assertion, as I think that at
> relevant places the transaction statuses of multixact members get checked.
>
> Regards,
> Jasper Smit
>
>


Assertion with aborted UPDATE in subtransaction

2025-03-26 Thread Jasper Smit
Hi,

My colleague Oleksii Kozlov ran into an assertion failure while testing
aborted UPDATE commands in subtransactions.
To reproduce it, run the SQL in the attached script. I tested
this on 15.10 and 17.4.

Running the script leads to the following assertion failure:
TRAP: failed Assert("HEAP_XMAX_IS_LOCKED_ONLY(infomask_lock_old_tuple)"),
File:
"/usr/local/postgresql-17.4/debug-build/../src/backend/access/heap/heapam.c",
Line: 3766, PID: 15604

After analysis with Luc Vlaming, we believe that the problem is caused by a
stale multixact member of an aborted subtransaction.

At the time of the assertion, we established that the new tuple does not
fit on the same page as the old tuple. The
tuple lock needs to be updated while the page lock is temporarily released.

One line above the assertion, compute_new_xmax_infomask() is called, which
will in turn call MultiXactIdExpand().
In MultiXactIdExpand() we determine that the requested txid/status is
already a member of the current multixact, therefore skipping
the removal of dead members further below in that function. The multixact
has in fact an aborted transaction included in it.
Because the aborted transaction was not removed, later in
GetMultiXactIdHintBits(), HEAP_XMAX_LCK_ONLY is not added to the infomask.
The absence of this bit in the infomask will eventually lead to the
assertion failure.
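
To make the sequence above easier to follow, here is a small standalone
model of it. This is only a sketch and not PostgreSQL code: every Toy*/toy_*
name is hypothetical, a multixact is reduced to a small array, and the
"prune first" variant is just one way to picture not skipping the removal
of dead members; it is not necessarily what the attached patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real multixact structures. */
typedef enum { TOY_IN_PROGRESS, TOY_COMMITTED, TOY_ABORTED } ToyXactState;
typedef enum { TOY_STATUS_LOCK, TOY_STATUS_UPDATE } ToyStatus;

typedef struct {
    unsigned     xid;     /* member transaction id */
    ToyStatus    status;  /* locker or updater */
    ToyXactState state;   /* outcome of that transaction */
} ToyMember;

#define TOY_MAX_MEMBERS 8

typedef struct {
    ToyMember members[TOY_MAX_MEMBERS];
    size_t    n;
} ToyMulti;

/* A member stays interesting if it is still running, or is a committed updater. */
static bool toy_member_is_live(const ToyMember *m)
{
    return m->state == TOY_IN_PROGRESS ||
           (m->status == TOY_STATUS_UPDATE && m->state == TOY_COMMITTED);
}

/*
 * Model of the expand step. With early_return set, a matching xid/status
 * returns the multixact unchanged, so dead members are never pruned.
 * With early_return clear, dead members are pruned first and the new
 * member is appended only if it is not already present.
 */
static ToyMulti toy_expand(ToyMulti multi, ToyMember add, bool early_return)
{
    if (early_return)
        for (size_t i = 0; i < multi.n; i++)
            if (multi.members[i].xid == add.xid &&
                multi.members[i].status == add.status)
                return multi;

    ToyMulti out = { .n = 0 };
    bool present = false;

    for (size_t i = 0; i < multi.n; i++) {
        const ToyMember *m = &multi.members[i];
        if (!toy_member_is_live(m))
            continue;               /* drop aborted or otherwise dead members */
        out.members[out.n++] = *m;
        if (m->xid == add.xid && m->status == add.status)
            present = true;
    }
    if (!present && out.n < TOY_MAX_MEMBERS)
        out.members[out.n++] = add;
    return out;
}

/* Model of the hint-bit decision: "lock only" requires that no member is an updater. */
static bool toy_is_lock_only(const ToyMulti *multi)
{
    for (size_t i = 0; i < multi->n; i++)
        if (multi->members[i].status == TOY_STATUS_UPDATE)
            return false;
    return true;
}

/* Scenario: a live locker (xid 100) plus an aborted subtransaction's update (xid 101). */
static ToyMulti toy_scenario(void)
{
    ToyMulti m = { .n = 2 };
    m.members[0] = (ToyMember){ 100, TOY_STATUS_LOCK,   TOY_IN_PROGRESS };
    m.members[1] = (ToyMember){ 101, TOY_STATUS_UPDATE, TOY_ABORTED };
    return m;
}
```

In this model, calling toy_expand() on toy_scenario() with the already
present locker (xid 100) and early_return set leaves the aborted updater in
place, so toy_is_lock_only() reports false, which corresponds to the missing
HEAP_XMAX_LCK_ONLY bit. With early_return clear, the aborted member is
pruned and the result is lock-only.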

A possible fix is to change MultiXactIdExpand() to not skip the removal of
dead members. See the proposed patch attached to this email.
Another alternative is to remove the assertion, as I think that at relevant
places the transaction statuses of multixact members get checked.

Regards,
Jasper Smit


assert.sql
Description: Binary data


multixact.patch
Description: Binary data