At Mon, 07 Sep 2020 13:45:28 +0900 (JST), Kyotaro Horiguchi <horikyota....@gmail.com> wrote in > The cause is that the worker had received pending-sync entry correctly > but not never created a relcache entry for the relation using > RelationBuildDesc. So the rd_firstRelfilenodeSubid is not correctly > set. > > I'm investigating it.
Relcaches are loaded from a file with old content at parallel worker startup. The relcache entry is corrected by invalidation at taking a lock but pending syncs are not considered. Since parallel workers don't access the files so we can just ignore the assertion safely, but I want to rd_firstRelfilenodeSubid flag at invalidation, as attached PoC patch. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c index 96ecad02dd..d66e914345 100644 --- a/src/backend/utils/cache/relcache.c +++ b/src/backend/utils/cache/relcache.c @@ -1322,10 +1322,27 @@ RelationInitPhysicalAddr(Relation relation) } else { + Oid oldnode = relation->rd_node.relNode; + /* Consult the relation mapper */ relation->rd_node.relNode = RelationMapOidToFilenode(relation->rd_id, relation->rd_rel->relisshared); + + /* + * The leader process might have replaced storage before staring this + * worker. We need to set rd_firstRelfilenodeSubid according to pending + * sync hash in that case. Currently the information is not actually + * used in worker processes, but make things tidy. + */ + if (IsParallelWorker() && oldnode != relation->rd_node.relNode) + { + if (RelFileNodeSkippingWAL(relation->rd_node)) + relation->rd_firstRelfilenodeSubid = TopSubTransactionId; + else + relation->rd_firstRelfilenodeSubid = InvalidTransactionId; + } + if (!OidIsValid(relation->rd_node.relNode)) elog(ERROR, "could not find relation mapping for relation \"%s\", OID %u", RelationGetRelationName(relation), relation->rd_id);