While reviewing what became commit fe4d022, I was surprised at the sequence of relfilenode values that RelationInitPhysicalAddr() computed for pg_class, during ParallelWorkerMain(), when running the last command of this recipe:
begin; cluster pg_class using pg_class_oid_index; set force_parallel_mode = 'regress'; values (1); There's $OLD_NODE (relfilenode in the committed relation map) and $NEW_NODE (relfilenode in this transaction's active_local_updates). The worker performs RelationInitPhysicalAddr(pg_class) four times: 1) $OLD_NODE in BackgroundWorkerInitializeConnectionByOid(). 2) $OLD_NODE in RelationCacheInvalidate() directly. 3) $OLD_NODE in RelationReloadNailed(), indirectly via RelationCacheInvalidate(). 4) $NEW_NODE indirectly as part of the executor running the query. I did expect $OLD_NODE in (1), since ParallelWorkerMain() calls BackgroundWorkerInitializeConnectionByOid() before StartParallelWorkerTransaction(). I expected $NEW_NODE in (2) and (3); that didn't happen, because ParallelWorkerMain() calls InvalidateSystemCaches() before RestoreRelationMap(). Let's move InvalidateSystemCaches() later. Invalidation should follow any worker initialization step that changes the results of relcache validation; otherwise, we'd need to ensure the InvalidateSystemCaches() will not validate any relcache entry. Invalidation should precede any step that reads from a cache; otherwise, we'd need to redo that step after inval. (Currently, no step reads from a cache.) Many steps, e.g. AttachSerializableXact(), have no effect on relcache validation, so it's arbitrary whether they happen before or after inval. I'm putting inval as late as possible, because I think it's easier to confirm that a step doesn't read from a cache than to confirm that a step doesn't affect relcache validation. An also-reasonable alternative would be to move inval and its prerequisites as early as possible. For reasons described in the attached commit message, this doesn't have user-visible consequences today. Innocent-looking relcache.c changes might upheave that, so I'm proposing this on robustness grounds. No need to back-patch.
Author: Noah Misch <n...@leadboat.com> Commit: Noah Misch <n...@leadboat.com> In parallel workers, invalidate after restoring relmapper. A relcache entry is no fresher than the relmapper data used to build it. The "rd_refcnt <= 1" in RelationReloadNailed() prevented user-visible consequences by leaving every relation !rd_isvalid at the end of this InvalidateSystemCaches(). Any subsequent relation usage first validated the relation, drawing on correct relmapper data. Even so, reduce fragility by delaying invalidation until we initialize all state essential to rebuilding a relcache entry. Reviewed by FIXME. Discussion: https://postgr.es/m/FIXME diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c index b042696..566ae87 100644 --- a/src/backend/access/transam/parallel.c +++ b/src/backend/access/transam/parallel.c @@ -1417,12 +1417,6 @@ ParallelWorkerMain(Datum main_arg) PushActiveSnapshot(RestoreSnapshot(asnapspace)); /* - * We've changed which tuples we can see, and must therefore invalidate - * system caches. - */ - InvalidateSystemCaches(); - - /* * Restore current role id. Skip verifying whether session user is * allowed to become this role and blindly restore the leader's state for * current role. @@ -1458,6 +1452,12 @@ ParallelWorkerMain(Datum main_arg) AttachSerializableXact(fps->serializable_xact_handle); /* + * We've changed tuple visibility and relmapper state, so discard all + * provisional state derived from system catalogs. + */ + InvalidateSystemCaches(); + + /* * We've initialized all of our state now; nothing should change * hereafter. */