Re: race condition in pg_class

Heikki Linnakangas Tue, 20 Aug 2024 02:00:05 -0700

On 17/08/2024 07:07, Noah Misch wrote:

On Fri, Aug 16, 2024 at 12:26:28PM +0300, Heikki Linnakangas wrote:

On 14/07/2024 20:48, Noah Misch wrote:

+ * ... [any slow preparation not requiring oldtup] ...
+ * heap_inplace_update_scan([...], &tup, &inplace_state);
+ * if (!HeapTupleIsValid(tup))
+ *     elog(ERROR, [...]);
+ * ... [buffer is exclusive-locked; mutate "tup"] ...
+ * if (dirty)
+ *     heap_inplace_update_finish(inplace_state, tup);
+ * else
+ *     heap_inplace_update_cancel(inplace_state);


I wonder if the functions should be called "systable_*" and placed in
genam.c rather than in heapam.c. The interface looks more like the existing
systable functions. It feels like a modularity violation for a function in
heapam.c to take an argument like "indexId", and call back into systable_*
functions.


Yes, _scan() and _cancel() especially are wrappers around systable.  Some API
options follow.  Any preference or other ideas?

==== direct s/heap_/systable_/ rename [option 1]

  systable_inplace_update_scan([...], &tup, &inplace_state);
  if (!HeapTupleIsValid(tup))
        elog(ERROR, [...]);
  ... [buffer is exclusive-locked; mutate "tup"] ...
  if (dirty)
        systable_inplace_update_finish(inplace_state, tup);
  else
        systable_inplace_update_cancel(inplace_state);

==== make the first and last steps more systable-like [option 2]

  systable_inplace_update_begin([...], &tup, &inplace_state);
  if (!HeapTupleIsValid(tup))
        elog(ERROR, [...]);
  ... [buffer is exclusive-locked; mutate "tup"] ...
  if (dirty)
        systable_inplace_update(inplace_state, tup);
  systable_inplace_update_end(inplace_state);

==== no systable_ wrapper for middle step, more like CatalogTupleUpdate [option 
3]

  systable_inplace_update_begin([...], &tup, &inplace_state);
  if (!HeapTupleIsValid(tup))
        elog(ERROR, [...]);
  ... [buffer is exclusive-locked; mutate "tup"] ...
  if (dirty)
        heap_inplace_update(relation,
                                                
systable_inplace_old_tuple(inplace_state),
                                                tup,
                                                
systable_inplace_buffer(inplace_state));
  systable_inplace_update_end(inplace_state);


My order of preference is: 2, 1, 3.

Could we just stipulate that you must always hold LOCKTAG_TUPLE when you
call heap_update() on pg_class or pg_database? That'd make the rule simple.


We could.  That would change more code sites.  Rough estimate:

$ git grep -E CatalogTupleUpd'.*(class|relrelation|relationRelation)' | wc -l
23

If the count were 2, I'd say let's simplify the rule like you're exploring.
(I originally had a complicated rule for pg_database, but I abandoned that
when it helped few code sites.)  If it were 100, I'd say the complicated rule
is worth it.  A count of 23 makes both choices fair.

Ok.

How many of those for RELKIND_INDEX vs tables? I'm thinking if we shouldalways require a tuple lock on indexes, if that would make a difference.

Long-term, I hope relfrozenxid gets reimplemented with storage outside
pg_class, removing the need for inplace updates.  So the additional 23 code
sites might change back at a future date.  That shouldn't be a big
consideration, though.

Another option here would be to preface that README section with a simplified
view, something like, "If a warning brought you here, take a tuple lock.  The
rest of this section is just for people needing to understand the conditions
for --enable-casserts emitting that warning."  How about that instead of
simplifying the rules?

Works for me. Or perhaps the rules could just be explained moresuccinctly. Something like:


-----

pg_class heap_inplace_update_scan() callers: before the call, acquire alock on the relation in mode ShareUpdateExclusiveLock or stricter. Ifthe update targets a row of RELKIND_INDEX (but notRELKIND_PARTITIONED_INDEX), that lock must be on the table, locking theindex rel is not necessary. (This allows VACUUM to overwrite per-indexpg_class while holding a lock on the table alone.)heap_inplace_update_scan() acquires and releases LOCKTAG_TUPLE inInplaceUpdateTupleLock, an alias for ExclusiveLock, on each tuple itoverwrites.

pg_class heap_update() callers: before copying the tuple to modify, takea lock on the tuple, or a ShareUpdateExclusiveLock or stricter on therelation.

SearchSysCacheLocked1() is one convenient way to acquire the tuple lock.Most heap_update() callers already hold a suitable lock on the relationfor other reasons, and can skip the tuple lock. If you do acquire thetuple lock, release it immediately after the update.

pg_database: before copying the tuple to modify, all updaters ofpg_database rows acquire LOCKTAG_TUPLE. (Few updaters acquireLOCKTAG_OBJECT on the database OID, so it wasn't worth extending that asa second option.)

-----

--
Heikki Linnakangas
Neon (https://neon.tech)

Re: race condition in pg_class

Reply via email to