Some thoughts: The v1-0003 patch introduced smgropen_cond() to avoid the problem of IssuePendingWritebacks(), which does desynchronised smgropen() calls and could open files after the barrier but just before they are unlinked. Makes sense, but...
1. For that to actually work, we'd better call smgrcloseall() when PROCSIGNAL_BARRIER_SMGRRELEASE is handled; currently it only calls closeAllVds(). Here's a patch for that. But... 2. The reason I used closeAllVds() in 4eb21763 is because I didn't want random CHECK_FOR_INTERRUPTS() calls to break code like this, from RelationCopyStorage(): for (blkno = 0; blkno < nblocks; blkno++) { /* If we got a cancel signal during the copy of the data, quit */ CHECK_FOR_INTERRUPTS(); smgrread(src, forkNum, blkno, buf.data); Perhaps we could change RelationCopyStorage() to take Relation, and use CreateFakeRelCacheEntry() in various places to satisfy that, and also extend that mechanism to work with temporary tables. Here's an initial sketch of that idea, to see what problems it crashes into... Fake Relations are rather unpleasant though; I wonder if there is an SMgrRelationHandle concept that could make this all nicer. There is possibly also some cleanup missing to avoid an SMgrRelation that points to a defunct fake Relation on non-local exit (?).
From 943228b5d4e0dc42ad219c3fe97adb59e7262593 Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Fri, 1 Apr 2022 17:02:51 +1300 Subject: [PATCH 1/2] WIP: Rethink PROCSIGNAL_BARRIER_SMGRRELEASE. Commit 4eb21763 closed kernel file descriptors, but left SMgrRelations in existence. Unfortunately, IssuePendingWritebacks() can reopen them at any time, including right before the a file is unlinked. In order for IssuePendingWritebacks() to be able to avoid this hazard in a later commit, go further and close the SMgrRelations. XXX We had better make sure that no code does: reln = smgropen(...); CHECK_FOR_INTERRUPTS(); smgrXXX(reln, ...); --- src/backend/storage/smgr/md.c | 6 ------ src/backend/storage/smgr/smgr.c | 10 ++-------- src/include/storage/md.h | 1 - 3 files changed, 2 insertions(+), 15 deletions(-) diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c index 879f647dbc..d26c915f90 100644 --- a/src/backend/storage/smgr/md.c +++ b/src/backend/storage/smgr/md.c @@ -549,12 +549,6 @@ mdclose(SMgrRelation reln, ForkNumber forknum) } } -void -mdrelease(void) -{ - closeAllVfds(); -} - /* * mdprefetch() -- Initiate asynchronous read of the specified block of a relation */ diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c index efa83ae394..994728db04 100644 --- a/src/backend/storage/smgr/smgr.c +++ b/src/backend/storage/smgr/smgr.c @@ -41,7 +41,6 @@ typedef struct f_smgr { void (*smgr_init) (void); /* may be NULL */ void (*smgr_shutdown) (void); /* may be NULL */ - void (*smgr_release) (void); /* may be NULL */ void (*smgr_open) (SMgrRelation reln); void (*smgr_close) (SMgrRelation reln, ForkNumber forknum); void (*smgr_create) (SMgrRelation reln, ForkNumber forknum, @@ -70,7 +69,6 @@ static const f_smgr smgrsw[] = { { .smgr_init = mdinit, .smgr_shutdown = NULL, - .smgr_release = mdrelease, .smgr_open = mdopen, .smgr_close = mdclose, .smgr_create = mdcreate, @@ -717,17 +715,13 @@ AtEOXact_SMgr(void) } /* - * This routine is called when we are ordered to release all open files by a + * This routine is called when we are ordered to close all SMgrRelations by a * ProcSignalBarrier. */ bool ProcessBarrierSmgrRelease(void) { - for (int i = 0; i < NSmgr; i++) - { - if (smgrsw[i].smgr_release) - smgrsw[i].smgr_release(); - } + smgrcloseall(); return true; } diff --git a/src/include/storage/md.h b/src/include/storage/md.h index 6e46d8d96a..ffffa40db7 100644 --- a/src/include/storage/md.h +++ b/src/include/storage/md.h @@ -23,7 +23,6 @@ extern void mdinit(void); extern void mdopen(SMgrRelation reln); extern void mdclose(SMgrRelation reln, ForkNumber forknum); -extern void mdrelease(void); extern void mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo); extern bool mdexists(SMgrRelation reln, ForkNumber forknum); extern void mdunlink(RelFileNodeBackend rnode, ForkNumber forknum, bool isRedo); -- 2.35.1
From 7070d98f0287033119f7217cdcfe88d3a2eef937 Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Fri, 1 Apr 2022 18:35:26 +1300 Subject: [PATCH 2/2] WIP: Handle invalidation in RelationCopyStorage(). Since the loop in RelationCopyStorage() processes interrupts, and since the proposal is that interrupts might invalidate SMgrRelation objects, this function had better deal in Relation objects instead. Various codepaths now need to use CreateFakeRelcacheEntry() to obtain one. That function therefore now needs to support temporary relations too, hence addition of BackendId parameter. XXX it's now extra silly to have that stuff in xlogutils.c XXX maybe what we really need here is some kind of SMgrRelationHandle, instead of fake relations XXX need to review fake relcache entry lifetime issues, in error path --- src/backend/access/heap/heapam.c | 16 +++++++-------- src/backend/access/heap/heapam_handler.c | 13 ++++++------ src/backend/access/transam/xlogutils.c | 9 ++------ src/backend/catalog/storage.c | 26 +++++++++--------------- src/backend/commands/dbcommands.c | 2 +- src/backend/commands/tablecmds.c | 13 ++++++------ src/backend/storage/buffer/bufmgr.c | 4 ++-- src/include/access/xlogutils.h | 2 +- src/include/catalog/storage.h | 2 +- 9 files changed, 39 insertions(+), 48 deletions(-) diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 74ad445e59..cd58f04322 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -8739,7 +8739,7 @@ heap_xlog_visible(XLogReaderState *record) */ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK); - reln = CreateFakeRelcacheEntry(rnode); + reln = CreateFakeRelcacheEntry(rnode, InvalidBackendId); visibilitymap_pin(reln, blkno, &vmbuffer); /* @@ -8869,7 +8869,7 @@ heap_xlog_delete(XLogReaderState *record) */ if (xlrec->flags & XLH_DELETE_ALL_VISIBLE_CLEARED) { - Relation reln = CreateFakeRelcacheEntry(target_node); + Relation reln = CreateFakeRelcacheEntry(target_node, InvalidBackendId); Buffer vmbuffer = InvalidBuffer; visibilitymap_pin(reln, blkno, &vmbuffer); @@ -8950,7 +8950,7 @@ heap_xlog_insert(XLogReaderState *record) */ if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED) { - Relation reln = CreateFakeRelcacheEntry(target_node); + Relation reln = CreateFakeRelcacheEntry(target_node, InvalidBackendId); Buffer vmbuffer = InvalidBuffer; visibilitymap_pin(reln, blkno, &vmbuffer); @@ -9078,7 +9078,7 @@ heap_xlog_multi_insert(XLogReaderState *record) */ if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED) { - Relation reln = CreateFakeRelcacheEntry(rnode); + Relation reln = CreateFakeRelcacheEntry(rnode, InvalidBackendId); Buffer vmbuffer = InvalidBuffer; visibilitymap_pin(reln, blkno, &vmbuffer); @@ -9237,7 +9237,7 @@ heap_xlog_update(XLogReaderState *record, bool hot_update) */ if (xlrec->flags & XLH_UPDATE_OLD_ALL_VISIBLE_CLEARED) { - Relation reln = CreateFakeRelcacheEntry(rnode); + Relation reln = CreateFakeRelcacheEntry(rnode, InvalidBackendId); Buffer vmbuffer = InvalidBuffer; visibilitymap_pin(reln, oldblk, &vmbuffer); @@ -9321,7 +9321,7 @@ heap_xlog_update(XLogReaderState *record, bool hot_update) */ if (xlrec->flags & XLH_UPDATE_NEW_ALL_VISIBLE_CLEARED) { - Relation reln = CreateFakeRelcacheEntry(rnode); + Relation reln = CreateFakeRelcacheEntry(rnode, InvalidBackendId); Buffer vmbuffer = InvalidBuffer; visibilitymap_pin(reln, newblk, &vmbuffer); @@ -9517,7 +9517,7 @@ heap_xlog_lock(XLogReaderState *record) Relation reln; XLogRecGetBlockTag(record, 0, &rnode, NULL, &block); - reln = CreateFakeRelcacheEntry(rnode); + reln = CreateFakeRelcacheEntry(rnode, InvalidBackendId); visibilitymap_pin(reln, block, &vmbuffer); visibilitymap_clear(reln, block, vmbuffer, VISIBILITYMAP_ALL_FROZEN); @@ -9590,7 +9590,7 @@ heap_xlog_lock_updated(XLogReaderState *record) Relation reln; XLogRecGetBlockTag(record, 0, &rnode, NULL, &block); - reln = CreateFakeRelcacheEntry(rnode); + reln = CreateFakeRelcacheEntry(rnode, InvalidBackendId); visibilitymap_pin(reln, block, &vmbuffer); visibilitymap_clear(reln, block, vmbuffer, VISIBILITYMAP_ALL_FROZEN); diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c index dee264e859..5127728f30 100644 --- a/src/backend/access/heap/heapam_handler.c +++ b/src/backend/access/heap/heapam_handler.c @@ -28,6 +28,7 @@ #include "access/tableam.h" #include "access/tsmapi.h" #include "access/xact.h" +#include "access/xlogutils.h" #include "catalog/catalog.h" #include "catalog/index.h" #include "catalog/storage.h" @@ -626,9 +627,9 @@ heapam_relation_nontransactional_truncate(Relation rel) static void heapam_relation_copy_data(Relation rel, const RelFileNode *newrnode) { - SMgrRelation dstrel; + Relation dstrel; - dstrel = smgropen(*newrnode, rel->rd_backend); + dstrel = CreateFakeRelcacheEntry(*newrnode, rel->rd_backend); /* * Since we copy the file directly without looking at the shared buffers, @@ -648,7 +649,7 @@ heapam_relation_copy_data(Relation rel, const RelFileNode *newrnode) RelationCreateStorage(*newrnode, rel->rd_rel->relpersistence, true); /* copy main fork */ - RelationCopyStorage(RelationGetSmgr(rel), dstrel, MAIN_FORKNUM, + RelationCopyStorage(rel, dstrel, MAIN_FORKNUM, rel->rd_rel->relpersistence); /* copy those extra forks that exist */ @@ -657,7 +658,7 @@ heapam_relation_copy_data(Relation rel, const RelFileNode *newrnode) { if (smgrexists(RelationGetSmgr(rel), forkNum)) { - smgrcreate(dstrel, forkNum, false); + smgrcreate(RelationGetSmgr(dstrel), forkNum, false); /* * WAL log creation if the relation is persistent, or this is the @@ -667,7 +668,7 @@ heapam_relation_copy_data(Relation rel, const RelFileNode *newrnode) (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED && forkNum == INIT_FORKNUM)) log_smgrcreate(newrnode, forkNum); - RelationCopyStorage(RelationGetSmgr(rel), dstrel, forkNum, + RelationCopyStorage(rel, dstrel, forkNum, rel->rd_rel->relpersistence); } } @@ -675,7 +676,7 @@ heapam_relation_copy_data(Relation rel, const RelFileNode *newrnode) /* drop old relation, and close new one */ RelationDropStorage(rel); - smgrclose(dstrel); + FreeFakeRelcacheEntry(dstrel); } static void diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c index a4dedc58b7..b59a6ba77a 100644 --- a/src/backend/access/transam/xlogutils.c +++ b/src/backend/access/transam/xlogutils.c @@ -571,7 +571,7 @@ typedef FakeRelCacheEntryData *FakeRelCacheEntry; * Caller must free the returned entry with FreeFakeRelcacheEntry(). */ Relation -CreateFakeRelcacheEntry(RelFileNode rnode) +CreateFakeRelcacheEntry(RelFileNode rnode, BackendId backend) { FakeRelCacheEntry fakeentry; Relation rel; @@ -582,12 +582,7 @@ CreateFakeRelcacheEntry(RelFileNode rnode) rel->rd_rel = &fakeentry->pgc; rel->rd_node = rnode; - - /* - * We will never be working with temp rels during recovery or while - * syncing WAL-skipped files. - */ - rel->rd_backend = InvalidBackendId; + rel->rd_backend = backend; /* It must be a permanent table here */ rel->rd_rel->relpersistence = RELPERSISTENCE_PERMANENT; diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c index 9898701a43..e2df4aa199 100644 --- a/src/backend/catalog/storage.c +++ b/src/backend/catalog/storage.c @@ -440,15 +440,9 @@ RelationPreTruncate(Relation rel) * Note that this requires that there is no dirty data in shared buffers. If * it's possible that there are, callers need to flush those using * e.g. FlushRelationBuffers(rel). - * - * Also note that this is frequently called via locutions such as - * RelationCopyStorage(RelationGetSmgr(rel), ...); - * That's safe only because we perform only smgr and WAL operations here. - * If we invoked anything else, a relcache flush could cause our SMgrRelation - * argument to become a dangling pointer. */ void -RelationCopyStorage(SMgrRelation src, SMgrRelation dst, +RelationCopyStorage(Relation src, Relation dst, ForkNumber forkNum, char relpersistence) { PGAlignedBlock buf; @@ -477,14 +471,14 @@ RelationCopyStorage(SMgrRelation src, SMgrRelation dst, use_wal = XLogIsNeeded() && (relpersistence == RELPERSISTENCE_PERMANENT || copying_initfork); - nblocks = smgrnblocks(src, forkNum); + nblocks = smgrnblocks(RelationGetSmgr(src), forkNum); for (blkno = 0; blkno < nblocks; blkno++) { /* If we got a cancel signal during the copy of the data, quit */ CHECK_FOR_INTERRUPTS(); - smgrread(src, forkNum, blkno, buf.data); + smgrread(RelationGetSmgr(src), forkNum, blkno, buf.data); if (!PageIsVerifiedExtended(page, blkno, PIV_LOG_WARNING | PIV_REPORT_STAT)) @@ -496,8 +490,8 @@ RelationCopyStorage(SMgrRelation src, SMgrRelation dst, * (errcontext callbacks shouldn't be risking any such thing, but * people have been known to forget that rule.) */ - char *relpath = relpathbackend(src->smgr_rnode.node, - src->smgr_rnode.backend, + char *relpath = relpathbackend(src->rd_node, + src->rd_backend, forkNum); ereport(ERROR, @@ -512,7 +506,7 @@ RelationCopyStorage(SMgrRelation src, SMgrRelation dst, * space. */ if (use_wal) - log_newpage(&dst->smgr_rnode.node, forkNum, blkno, page, false); + log_newpage(&dst->rd_node, forkNum, blkno, page, false); PageSetChecksumInplace(page, blkno); @@ -521,7 +515,7 @@ RelationCopyStorage(SMgrRelation src, SMgrRelation dst, * need for smgr to schedule an fsync for this write; we'll do it * ourselves below. */ - smgrextend(dst, forkNum, blkno, buf.data, true); + smgrextend(RelationGetSmgr(dst), forkNum, blkno, buf.data, true); } /* @@ -534,7 +528,7 @@ RelationCopyStorage(SMgrRelation src, SMgrRelation dst, * they might still not be on disk when the crash occurs. */ if (use_wal || copying_initfork) - smgrimmedsync(dst, forkNum); + smgrimmedsync(RelationGetSmgr(dst), forkNum); } /* @@ -832,7 +826,7 @@ smgrDoPendingSyncs(bool isCommit, bool isParallelWorker) * page including any unused space. ReadBufferExtended() * counts some pgstat events; unfortunately, we discard them. */ - rel = CreateFakeRelcacheEntry(srel->smgr_rnode.node); + rel = CreateFakeRelcacheEntry(srel->smgr_rnode.node, InvalidBackendId); log_newpage_range(rel, fork, 0, n, false); FreeFakeRelcacheEntry(rel); } @@ -1019,7 +1013,7 @@ smgr_redo(XLogReaderState *record) } /* Prepare for truncation of FSM and VM too */ - rel = CreateFakeRelcacheEntry(xlrec->rnode); + rel = CreateFakeRelcacheEntry(xlrec->rnode, InvalidBackendId); if ((xlrec->flags & SMGR_TRUNCATE_FSM) != 0 && smgrexists(reln, FSM_FORKNUM)) diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index df16533901..b05c82718a 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -277,7 +277,7 @@ ScanSourceDatabasePgClass(Oid tbid, Oid dbid, char *srcpath) * and used the smgr layer directly, we would have to worry about * invalidations. */ - rel = CreateFakeRelcacheEntry(rnode); + rel = CreateFakeRelcacheEntry(rnode, InvalidBackendId); nblocks = smgrnblocks(RelationGetSmgr(rel), MAIN_FORKNUM); FreeFakeRelcacheEntry(rel); diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 51b4a00d50..34fab1f15a 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -27,6 +27,7 @@ #include "access/xact.h" #include "access/xlog.h" #include "access/xloginsert.h" +#include "access/xlogutils.h" #include "catalog/catalog.h" #include "catalog/heap.h" #include "catalog/index.h" @@ -14607,9 +14608,9 @@ AlterTableMoveAll(AlterTableMoveAllStmt *stmt) static void index_copy_data(Relation rel, RelFileNode newrnode) { - SMgrRelation dstrel; + Relation dstrel; - dstrel = smgropen(newrnode, rel->rd_backend); + dstrel = CreateFakeRelcacheEntry(newrnode, rel->rd_backend); /* * Since we copy the file directly without looking at the shared buffers, @@ -14629,7 +14630,7 @@ index_copy_data(Relation rel, RelFileNode newrnode) RelationCreateStorage(newrnode, rel->rd_rel->relpersistence, true); /* copy main fork */ - RelationCopyStorage(RelationGetSmgr(rel), dstrel, MAIN_FORKNUM, + RelationCopyStorage(rel, dstrel, MAIN_FORKNUM, rel->rd_rel->relpersistence); /* copy those extra forks that exist */ @@ -14638,7 +14639,7 @@ index_copy_data(Relation rel, RelFileNode newrnode) { if (smgrexists(RelationGetSmgr(rel), forkNum)) { - smgrcreate(dstrel, forkNum, false); + smgrcreate(RelationGetSmgr(dstrel), forkNum, false); /* * WAL log creation if the relation is persistent, or this is the @@ -14648,14 +14649,14 @@ index_copy_data(Relation rel, RelFileNode newrnode) (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED && forkNum == INIT_FORKNUM)) log_smgrcreate(&newrnode, forkNum); - RelationCopyStorage(RelationGetSmgr(rel), dstrel, forkNum, + RelationCopyStorage(rel, dstrel, forkNum, rel->rd_rel->relpersistence); } } /* drop old relation, and close new one */ RelationDropStorage(rel); - smgrclose(dstrel); + FreeFakeRelcacheEntry(dstrel); } /* diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index dc94d69b22..3931c673ca 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -3792,8 +3792,8 @@ CreateAndCopyRelationData(RelFileNode src_rnode, RelFileNode dst_rnode, * and used the smgr layer directly, we would have to worry about * invalidations. */ - src_rel = CreateFakeRelcacheEntry(src_rnode); - dst_rel = CreateFakeRelcacheEntry(dst_rnode); + src_rel = CreateFakeRelcacheEntry(src_rnode, InvalidBackendId); + dst_rel = CreateFakeRelcacheEntry(dst_rnode, InvalidBackendId); /* * Create and copy all forks of the relation. During create database we diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h index 64708949db..ee13b0bd71 100644 --- a/src/include/access/xlogutils.h +++ b/src/include/access/xlogutils.h @@ -86,7 +86,7 @@ extern XLogRedoAction XLogReadBufferForRedoExtended(XLogReaderState *record, extern Buffer XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum, BlockNumber blkno, ReadBufferMode mode); -extern Relation CreateFakeRelcacheEntry(RelFileNode rnode); +extern Relation CreateFakeRelcacheEntry(RelFileNode rnode, BackendId backend); extern void FreeFakeRelcacheEntry(Relation fakerel); extern int read_local_xlog_page(XLogReaderState *state, diff --git a/src/include/catalog/storage.h b/src/include/catalog/storage.h index 844a023b2c..14931ab395 100644 --- a/src/include/catalog/storage.h +++ b/src/include/catalog/storage.h @@ -29,7 +29,7 @@ extern void RelationDropStorage(Relation rel); extern void RelationPreserveStorage(RelFileNode rnode, bool atCommit); extern void RelationPreTruncate(Relation rel); extern void RelationTruncate(Relation rel, BlockNumber nblocks); -extern void RelationCopyStorage(SMgrRelation src, SMgrRelation dst, +extern void RelationCopyStorage(Relation src, Relation dst, ForkNumber forkNum, char relpersistence); extern bool RelFileNodeSkippingWAL(RelFileNode rnode); extern Size EstimatePendingSyncsSpace(void); -- 2.35.1