On 14/07/2024 20:48, Noah Misch wrote:
I've pushed the two patches for your reports.  To placate cfbot, I'm attaching
the remaining patches.

inplace090-LOCKTAG_TUPLE-eoxact-v8.patch: Makes sense. A comment would be in order, it looks pretty random as it is. Something like:

/*
 * Tuple locks are currently only held for short durations within a
 * transaction. Check that we didn't forget to release one.
 */

inplace110-successors-v8.patch: Makes sense.

The README changes would be better as part of the third patch, as this patch doesn't actually do any of the new locking described in the README, and it fixes the "inplace update updates wrong tuple" bug even without those tuple locks.

+ * ... [any slow preparation not requiring oldtup] ...
+ * heap_inplace_update_scan([...], &tup, &inplace_state);
+ * if (!HeapTupleIsValid(tup))
+ *     elog(ERROR, [...]);
+ * ... [buffer is exclusive-locked; mutate "tup"] ...
+ * if (dirty)
+ *     heap_inplace_update_finish(inplace_state, tup);
+ * else
+ *     heap_inplace_update_cancel(inplace_state);

I wonder if the functions should be called "systable_*" and placed in genam.c rather than in heapam.c. The interface looks more like the existing systable functions. It feels like a modularity violation for a function in heapam.c to take an argument like "indexId", and call back into systable_* functions.

        /*----------
         * XXX A crash here can allow datfrozenxid() to get ahead of 
relfrozenxid:
         *
         * ["D" is a VACUUM (ONLY_DATABASE_STATS)]
         * ["R" is a VACUUM tbl]
         * D: vac_update_datfrozenid() -> systable_beginscan(pg_class)
        * c * D: systable_getnext() returns pg_class tuple of tbl
         * R: memcpy() into pg_class tuple of tbl
         * D: raise pg_database.datfrozenxid, XLogInsert(), finish
         * [crash]
         * [recovery restores datfrozenxid w/o relfrozenxid]
         */

Hmm, that's a tight race, but feels bad to leave it unfixed. One approach would be to modify the tuple on the buffer only after WAL-logging it. That way, D cannot read the updated value before it has been WAL logged. Just need to make sure that the change still gets included in the WAL record. Maybe something like:

if (RelationNeedsWAL(relation))
{
    /*
     * Make a temporary copy of the page that includes the change, in
     * case the a full-page image is logged
     */
    PGAlignedBlock tmppage;

    memcpy(tmppage.data, page, BLCKSZ);

    /* copy the tuple to the temporary copy */
    memcpy(...);

    XLogRegisterBlock(0, ..., tmppage, REGBUF_STANDARD);
    XLogInsert();
}

/* copy the tuple to the buffer */
memcpy(...);


  pg_class heap_inplace_update_scan() callers: before the call, acquire
  LOCKTAG_RELATION in mode ShareLock (CREATE INDEX), ShareUpdateExclusiveLock
  (VACUUM), or a mode with strictly more conflicts.  If the update targets a
  row of RELKIND_INDEX (but not RELKIND_PARTITIONED_INDEX), that lock must be
  on the table.  Locking the index rel is optional.  (This allows VACUUM to
  overwrite per-index pg_class while holding a lock on the table alone.)  We
  could allow weaker locks, in which case the next paragraph would simply call
  for stronger locks for its class of commands.  heap_inplace_update_scan()
  acquires and releases LOCKTAG_TUPLE in InplaceUpdateTupleLock, an alias for
  ExclusiveLock, on each tuple it overwrites.

  pg_class heap_update() callers: before copying the tuple to modify, take a
  lock that conflicts with at least one of those from the preceding paragraph.
  SearchSysCacheLocked1() is one convenient way to acquire LOCKTAG_TUPLE.
  After heap_update(), release any LOCKTAG_TUPLE.  Most of these callers opt
  to acquire just the LOCKTAG_RELATION.

These rules seem complicated. Phrasing this slightly differently, if I understand correctly: for a heap_update() caller, it's always sufficient to hold LOCKTAG_TUPLE, but if you happen to hold some other lock on the relation that conflicts with those mentioned in the first paragraph, then you can skip the LOCKTAG_TUPLE lock.

Could we just stipulate that you must always hold LOCKTAG_TUPLE when you call heap_update() on pg_class or pg_database? That'd make the rule simple.

--
Heikki Linnakangas
Neon (https://neon.tech)



Reply via email to