On Fri, Feb 23, 2024 at 04:27:34PM +0200, Heikki Linnakangas wrote: > Committed this. Thanks everyone!
Commit 8af2565 wrote: > --- /dev/null > +++ b/src/backend/storage/smgr/bulk_write.c > +/* > + * Finish bulk write operation. > + * > + * This WAL-logs and flushes any remaining pending writes to disk, and fsyncs > + * the relation if needed. > + */ > +void > +smgr_bulk_finish(BulkWriteState *bulkstate) > +{ > + /* WAL-log and flush any remaining pages */ > + smgr_bulk_flush(bulkstate); > + > + /* > + * When we wrote out the pages, we passed skipFsync=true to avoid the > + * overhead of registering all the writes with the checkpointer. > Register > + * the whole relation now. > + * > + * There is one hole in that idea: If a checkpoint occurred while we > were > + * writing the pages, it already missed fsyncing the pages we had > written > + * before the checkpoint started. A crash later on would replay the WAL > + * starting from the checkpoint, therefore it wouldn't replay our > earlier > + * WAL records. So if a checkpoint started after the bulk write, fsync > + * the files now. > + */ > + if (!SmgrIsTemp(bulkstate->smgr)) > + { Shouldn't this be "if (bulkstate->use_wal)"? The GetRedoRecPtr()-based decision is irrelevant to the !wal case. Either we don't need fsync at all (TEMP or UNLOGGED) or smgrDoPendingSyncs() will do it (wal_level=minimal). I don't see any functional problem, but this likely arranges for an unnecessary sync when a checkpoint starts between mdcreate() and here. (The mdcreate() sync may also be unnecessary, but that's longstanding.) > + /* > + * Prevent a checkpoint from starting between the > GetRedoRecPtr() and > + * smgrregistersync() calls. > + */ > + Assert((MyProc->delayChkptFlags & DELAY_CHKPT_START) == 0); > + MyProc->delayChkptFlags |= DELAY_CHKPT_START; > + > + if (bulkstate->start_RedoRecPtr != GetRedoRecPtr()) > + { > + /* > + * A checkpoint occurred and it didn't know about our > writes, so > + * fsync() the relation ourselves. > + */ > + MyProc->delayChkptFlags &= ~DELAY_CHKPT_START; > + smgrimmedsync(bulkstate->smgr, bulkstate->forknum); > + elog(DEBUG1, "flushed relation because a checkpoint > occurred concurrently"); > + } > + else > + { > + smgrregistersync(bulkstate->smgr, bulkstate->forknum); > + MyProc->delayChkptFlags &= ~DELAY_CHKPT_START; > + } > + } > +} This is an elegant optimization.