On Fri, Jun 26, 2026 at 08:37:07PM +0100, Dean Rasheed wrote:
> Thanks. Any reviews will be very helpful.

I drove Claude through a review.  You can too, so there's nothing
special to it, but I'll go over it here, with a bit of my content added,
and hopefully y'all don't hate me for sharing LLM outputs.

1. You have an uninitialized have_gtrs in
   src/backend/commands/vacuum.c:504.

2. TRUNCATE on a GTT can cause corruption.

   Claude's description (I don't know how to evaluate this one myself):

   RelationSetNewRelfilenumber writes session-local freeze xids into
   shared pg_class for GTTs (relcache.c:4057-4076). Only relfilenode is
   split via SetEffective_relfilenode. relpages=0, reltuples=-1,
   relallvisible=0, relallfrozen=0, relfrozenxid=freezeXid,
   relminmxid=minmulti, relpersistence are written directly into
   classform and committed via CatalogTupleUpdate(pg_class, ...). Called
   for GTT TRUNCATE (tablecmds.c:2301) and REINDEX (index.c:3908). For a
   GTT, the new freezeXid reflects this session's freshly created
   storage; clobbering shared pg_class.relfrozenxid with it can advance
   the global horizon past what other sessions' local storage actually
   has.

   Exploring further Claude added:

   The failure chain:

   1. t=0: Session B opens gtt_foo, MyProc->tempfrozenxid = X0 (~old
      RecentXmin). B never inserts anything; its local file has no rows
      but is still bound to freezeXid X0.
   2. t=1: Session A opens gtt_foo, inserts rows in xids X0…X1, commits.
      A's local storage holds rows with xmin in that range. B's local
      file is untouched.
   3. t=2: Session A: TRUNCATE gtt_foo;. RelationSetNewRelfilenumber
      runs. table_relation_set_new_filelocator returns freezeXid = X2
      (~current). Code writes pg_class.relfrozenxid = X2 — no cap.
   4. t=3: Any session VACUUMs any tables. vac_update_datfrozenxid walks
      pg_class, sees gtt_foo.relfrozenxid = X2. If other pg_class rows
      all have similar or newer values, newFrozenXid ≈ X2.
   5. t=4: dbform->datfrozenxid = X2. vac_truncate_clog(X2, ...) at
      :2019 truncates CLOG for everything older than X2.
   6. t=5: Session B, whose local gtt_foo file was actually created back
      at t=0 with freezeXid X0, has no problem itself (empty file). But
      suppose C is any other backend also holding local storage with
      xids older than X2 (any GTT, or a running long snapshot's stale
      hint-bit-less pages) — reading such rows now does
      TransactionIdDidCommit(x) → CLOG lookup → "ERROR: could not access
      status of transaction x".
   7. Worse: because the whole cluster's CLOG is truncated, this is not
      just B's GTT. Any tuple anywhere the cluster still holds with an
      xid older than X2 that lacks the HEAP_XMIN_COMMITTED hint bit
      becomes unreadable. TOAST chains, index-only scan hints — all
      vulnerable to torn interpretation.

   Fix: the post-SetEffective_relfilenode block must update
   temp_classform instead of classform when temp_tuple is valid (or skip
   stats/freeze fields on classform entirely for GTT).

3. vac_get_min_tempfrozenxids() scans ProcGlobal->allProcCount without
   ProcArrayLock (vacuum.c:1769-1799).  Every comparable consumer
   (vac_update_datfrozenxid, GetSnapshotData) acquires it.

   Without this lock, a backend can exit mid-scan and a new backend
   reuse the slot; you'll read databaseId/tempfrozenxid from
   inconsistent slot generations.  This can lead to corruption.

4. eoxact_usage_list_len / eoxact_usage_list_overflowed never reset at
   end-of-xact (global_temp.c:1176-1179). Only the storage list counters
   reset.  Once a long-lived session overflows (>32 GTTs touched
   cumulatively), every subsequent xact falls back to full hash_seq
   scans.  Correctness preserved, performance silently degrades.

5. MyProc->tempfrozenxid never decays. Set on first GTT access; only
   moves forward on a session's VACUUM of its own GTT. A backend that
   touches a GTT once and never vacuums holds back pg_class.relfrozenxid
   advancement for all sessions for the connection's lifetime. No path
   clears it on GlobalTempRelationDropped or when local usage_count
   drops to zero.

   I explored this one a bit because I was surprised that the frozenxid
   in the catalog should be relevant to GTTs.  If I understand correctly
   this tempfrozenxid exists because while the GTT storage is
   per-session, there is some coupling via the CLOG, but I don't
   understand why _that_ has to be, and this is what Claude told me:

     An alternative that would have avoided the coordination

     You could have designed it so that GTT pg_class rows are skipped
     from datfrozenxid computation (just like local temp), and each
     backend's tempfrozenxid fed directly into vac_update_datfrozenxid
     as an additional constraint. That would let the shared
     pg_class.relfrozenxid be advisory/stale without a safety issue,
     removing one leg of the coordination.

     The current design chose to keep the shared row accurate — probably
     because it's user-visible (SELECT relfrozenxid FROM pg_class) and
     drives per-relation autovacuum urgency signals. That's the
     tradeoff: an accurate shared row, in exchange for needing
     cross-session horizon exchange. Same fundamental need as local
     temp; different plumbing because the pg_class row is shared.

   And, ah, well, I guess you could say that `relfrozenxid` in
   `pg_class` is not meaningful for GTTs, or that you'll get your
   session's GTT's relfrozenxid.

6. InsertPgTempIndexTuple mutates rel->rd_index->indisvalid in-place
   (global_temp.c:885-892).  Relcache fields are normally immutable
   outside relcache.c.  Works today but fragile.

7. AbortTransaction ordering — gtrs_dropped is list_freed
   unconditionally in AtEOXact (:1178) even though the in-source comment
   (:1077-1080) claims it'll be "repeated next commit". Mostly benign
   (storage is reaped at exit) but the comment is wrong.

8. DeleteRelationTuple calls GlobalTempRelationDropped after
   CatalogTupleDelete; GlobalTempRelationDropped then calls
   get_rel_relkind(relid) (global_temp.c:1038). After a CCI, the
   syscache will return '\0' and skip DeletePgTempIndexTuple. Pass
   relkind explicitly instead of re-deriving it.

Nico
-- 


Reply via email to