Hi,
Scenario is a very plain upsert:

  CREATE TABLE upsert(key int primary key);
  INSERT INTO upsert VALUES(1) ON CONFLICT (key) DO UPDATE SET key = excluded.key;
  INSERT INTO upsert VALUES(1) ON CONFLICT (key) DO UPDATE SET key = excluded.key;
  INSERT 0 1
  INSERT 0 1

postgres[8755][1]=# SELECT page_items.*
  FROM generate_series(0, pg_relation_size('upsert'::regclass::text) / 8192 - 1) blkno,
       get_raw_page('upsert'::regclass::text, blkno::int4) AS raw_page,
       heap_page_items(raw_page) AS page_items;
┌────┬────────┬──────────┬────────┬──────────┬──────────┬──────────┬────────┬─────────────┬────────────┬────────┬────────┬────────┬────────────┐
│ lp │ lp_off │ lp_flags │ lp_len │  t_xmin  │  t_xmax  │ t_field3 │ t_ctid │ t_infomask2 │ t_infomask │ t_hoff │ t_bits │ t_oid  │   t_data   │
├────┼────────┼──────────┼────────┼──────────┼──────────┼──────────┼────────┼─────────────┼────────────┼────────┼────────┼────────┼────────────┤
│  1 │   8160 │        1 │     28 │ 19742591 │ 19742592 │        0 │ (0,2)  │       24577 │        256 │     24 │ (null) │ (null) │ \x01000000 │
│  2 │   8128 │        1 │     28 │ 19742592 │ 19742592 │        0 │ (0,2)  │       32769 │       8336 │     24 │ (null) │ (null) │ \x01000000 │
└────┴────────┴──────────┴────────┴──────────┴──────────┴──────────┴────────┴─────────────┴────────────┴────────┴────────┴────────┴────────────┘
(2 rows)

As you can see, the same xmax is set for both row versions, with the new tuple's infomask being HEAP_XMAX_KEYSHR_LOCK | HEAP_XMAX_LOCK_ONLY | HEAP_UPDATED.

The reason that happens is that ExecOnConflictUpdate() needs to lock the row to be able to reliably compute a new tuple version based on that row. heap_update() then decides it needs to carry that xmax forward, as it's a valid lock:

	/*
	 * If the tuple we're updating is locked, we need to preserve the locking
	 * info in the old tuple's Xmax.  Prepare a new Xmax value for this.
	 */
	compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
							  oldtup.t_data->t_infomask,
							  oldtup.t_data->t_infomask2,
							  xid, *lockmode, true,
							  &xmax_old_tuple, &infomask_old_tuple,
							  &infomask2_old_tuple);

	/*
	 * And also prepare an Xmax value for the new copy of the tuple.  If there
	 * was no xmax previously, or there was one but all lockers are now gone,
	 * then use InvalidXid; otherwise, get the xmax from the old tuple.  (In
	 * rare cases that might also be InvalidXid and yet not have the
	 * HEAP_XMAX_INVALID bit set; that's fine.)
	 */
	if ((oldtup.t_data->t_infomask & HEAP_XMAX_INVALID) ||
		HEAP_LOCKED_UPGRADED(oldtup.t_data->t_infomask) ||
		(checked_lockers && !locker_remains))
		xmax_new_tuple = InvalidTransactionId;
	else
		xmax_new_tuple = HeapTupleHeaderGetRawXmax(oldtup.t_data);

But we really don't need to do any of that in this case - the only locker is the current backend, after all.

I think this isn't great, because it'll later cause unnecessary hint bit writes (although ones somewhat likely to be combined with setting XMIN_COMMITTED), and even extra work for freezing.

Based on a quick look this wasn't the case before the finer-grained tuple locking - which makes sense, as there were no cases where locks would need to be carried forward.

It's worthwhile to note that this *nearly* already works, because of the following codepath in compute_new_xmax_infomask():

	/*
	 * If the lock to be acquired is for the same TransactionId as the
	 * existing lock, there's an optimization possible: consider only the
	 * strongest of both locks as the only one present, and restart.
	 */
	if (xmax == add_to_xmax)
	{
		/*
		 * Note that it's not possible for the original tuple to be updated:
		 * we wouldn't be here because the tuple would have been invisible and
		 * we wouldn't try to update it.  As a subtlety, this code can also
		 * run when traversing an update chain to lock future versions of a
		 * tuple.  But we wouldn't be here either, because the add_to_xmax
		 * would be different from the original updater.
		 */
		Assert(HEAP_XMAX_IS_LOCKED_ONLY(old_infomask));

		/* acquire the strongest of both */
		if (mode < old_mode)
			mode = old_mode;
		/* mustn't touch is_update */

		old_infomask |= HEAP_XMAX_INVALID;
		goto l5;
	}

which sets HEAP_XMAX_INVALID in old_infomask, which would then trigger the code above into not carrying xmax forward to the new tuple - but compute_new_xmax_infomask() operates on a copy of old_infomask.

Note that this contradicts comments in heap_update() itself:

	/*
	 * If we found a valid Xmax for the new tuple, then the infomask bits
	 * to use on the new tuple depend on what was there on the old one.
	 * Note that since we're doing an update, the only possibility is that
	 * the lockers had FOR KEY SHARE lock.
	 */

which seems to indicate that this behaviour wasn't foreseen.

I find the separation of concerns (and variable naming) between the computations happening in heap_update() itself and those in compute_new_xmax_infomask() fairly confusing and redundant.

I mean, compute_new_xmax_infomask() expands multis, builds a new one with the updater added, then re-expands the new multi via GetMultiXactIdHintBits() to compute infomask bits for the old tuple, and then returns - just for heap_update() to re-re-expand the multi to compute the infomask bits for the new tuple, again with GetMultiXactIdHintBits(). I know that there's a cache, but still, that's some seriously repetitive work.

Oh, and the things GetMultiXactIdHintBits() returns imo aren't really well described as hint bits.

Greetings,

Andres Freund