ON CONFLICT (and manual row locks) cause xmax of updated tuple to unnecessarily be set

Andres Freund Wed, 24 Jul 2019 16:25:44 -0700

Hi,


Scenario is a very plain upsert:

CREATE TABLE upsert(key int primary key);
INSERT INTO upsert VALUES(1) ON CONFLICT (key) DO UPDATE SET key = excluded.key;
INSERT INTO upsert VALUES(1) ON CONFLICT (key) DO UPDATE SET key = excluded.key;
INSERT 0 1
INSERT 0 1

postgres[8755][1]=# SELECT page_items.* FROM generate_series(0, 
pg_relation_size('upsert'::regclass::text) / 8192 - 1) blkno, 
get_raw_page('upsert'::regclass::text, blkno::int4) AS raw_page, 
heap_page_items(raw_page) as page_items;
┌────┬────────┬──────────┬────────┬──────────┬──────────┬──────────┬────────┬─────────────┬────────────┬────────┬────────┬────────┬────────────┐
│ lp │ lp_off │ lp_flags │ lp_len │  t_xmin  │  t_xmax  │ t_field3 │ t_ctid │ 
t_infomask2 │ t_infomask │ t_hoff │ t_bits │ t_oid  │   t_data   │
├────┼────────┼──────────┼────────┼──────────┼──────────┼──────────┼────────┼─────────────┼────────────┼────────┼────────┼────────┼────────────┤
│  1 │   8160 │        1 │     28 │ 19742591 │ 19742592 │        0 │ (0,2)  │   
    24577 │        256 │     24 │ (null) │ (null) │ \x01000000 │
│  2 │   8128 │        1 │     28 │ 19742592 │ 19742592 │        0 │ (0,2)  │   
    32769 │       8336 │     24 │ (null) │ (null) │ \x01000000 │
└────┴────────┴──────────┴────────┴──────────┴──────────┴──────────┴────────┴─────────────┴────────────┴────────┴────────┴────────┴────────────┘
(2 rows)

as you can see the same xmax is set for both row version, with the new
infomask being  HEAP_XMAX_KEYSHR_LOCK | HEAP_XMAX_LOCK_ONLY | HEAP_UPDATED.


The reason that happens is that ExecOnConflictUpdate() needs to lock the
row to be able to reliably compute a new tuple version based on that
row. heap_update() then decides it needs to carry that xmax forward, as
it's a valid lock:


        /*
         * If the tuple we're updating is locked, we need to preserve the 
locking
         * info in the old tuple's Xmax.  Prepare a new Xmax value for this.
         */
        compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
                                                          
oldtup.t_data->t_infomask,
                                                          
oldtup.t_data->t_infomask2,
                                                          xid, *lockmode, true,
                                                          &xmax_old_tuple, 
&infomask_old_tuple,
                                                          &infomask2_old_tuple);

        /*
         * And also prepare an Xmax value for the new copy of the tuple.  If 
there
         * was no xmax previously, or there was one but all lockers are now 
gone,
         * then use InvalidXid; otherwise, get the xmax from the old tuple.  (In
         * rare cases that might also be InvalidXid and yet not have the
         * HEAP_XMAX_INVALID bit set; that's fine.)
         */
        if ((oldtup.t_data->t_infomask & HEAP_XMAX_INVALID) ||
                HEAP_LOCKED_UPGRADED(oldtup.t_data->t_infomask) ||
                (checked_lockers && !locker_remains))
                xmax_new_tuple = InvalidTransactionId;
        else
                xmax_new_tuple = HeapTupleHeaderGetRawXmax(oldtup.t_data);

but we really don't need to do any of that in this case - the only
locker is the current backend, after all.

I think this isn't great, because it'll later will cause unnecessary
hint bit writes (although ones somewhat likely combined with setting
XMIN_COMMITTED), and even extra work for freezing.

Based on a quick look this wasn't the case before the finer grained
tuple locking - which makes sense, there was no cases where locks would
need to be carried forward.


It's worthwhile to note that this *nearly* already works, because of the
following codepath in compute_new_xmax_infomask():

                 * If the lock to be acquired is for the same TransactionId as 
the
                 * existing lock, there's an optimization possible: consider 
only the
                 * strongest of both locks as the only one present, and restart.
                 */
                if (xmax == add_to_xmax)
                {
                        /*
                         * Note that it's not possible for the original tuple 
to be updated:
                         * we wouldn't be here because the tuple would have 
been invisible and
                         * we wouldn't try to update it.  As a subtlety, this 
code can also
                         * run when traversing an update chain to lock future 
versions of a
                         * tuple.  But we wouldn't be here either, because the 
add_to_xmax
                         * would be different from the original updater.
                         */
                        Assert(HEAP_XMAX_IS_LOCKED_ONLY(old_infomask));

                        /* acquire the strongest of both */
                        if (mode < old_mode)
                                mode = old_mode;
                        /* mustn't touch is_update */

                        old_infomask |= HEAP_XMAX_INVALID;
                        goto l5;
                }

which set HEAP_XMAX_INVALID for old_infomask, which would then trigger
the code above not carrying forward xmax to the new tuple - but
compute_new_xmax_infomask() operates on a copy of old_infomask.


Note that this contradict comments in heap_update() itself:

                 * If we found a valid Xmax for the new tuple, then the 
infomask bits
                 * to use on the new tuple depend on what was there on the old 
one.
                 * Note that since we're doing an update, the only possibility 
is that
                 * the lockers had FOR KEY SHARE lock.
                 */

which seems to indicate that this behaviour wasn't forseen.


I find the separation of concerns (and variable naming) between
computations happening in heap_update() itself, and
compute_new_xmax_infomask() fairly confusing and redundant.

I mean, compute_new_xmax_infomask() expands multis, builds a new one
with the updater added. Then re-expands it via GetMultiXactIdHintBits(),
to compute infomask bits for the old tuple. Then returns. Just for
heap_update() to re-re-expand the multi to compute the infomask bits for
the new tuple, again with GetMultiXactIdHintBits()?  I know that there's
a cache, but still. That's some seriously repetitive work.

Oh, and the things GetMultiXactIdHintBits() returns imo aren't really
well described with hint bits.

Greetings,

Andres Freund

ON CONFLICT (and manual row locks) cause xmax of updated tuple to unnecessarily be set

Reply via email to