Re: Relation bulk write facility

Noah Misch Mon, 01 Jul 2024 13:53:16 -0700

On Fri, Feb 23, 2024 at 04:27:34PM +0200, Heikki Linnakangas wrote:
> Committed this. Thanks everyone!


Commit 8af2565 wrote:
> --- /dev/null
> +++ b/src/backend/storage/smgr/bulk_write.c

> +/*
> + * Finish bulk write operation.
> + *
> + * This WAL-logs and flushes any remaining pending writes to disk, and fsyncs
> + * the relation if needed.
> + */
> +void
> +smgr_bulk_finish(BulkWriteState *bulkstate)
> +{
> +     /* WAL-log and flush any remaining pages */
> +     smgr_bulk_flush(bulkstate);
> +
> +     /*
> +      * When we wrote out the pages, we passed skipFsync=true to avoid the
> +      * overhead of registering all the writes with the checkpointer.  
> Register
> +      * the whole relation now.
> +      *
> +      * There is one hole in that idea: If a checkpoint occurred while we 
> were
> +      * writing the pages, it already missed fsyncing the pages we had 
> written
> +      * before the checkpoint started.  A crash later on would replay the WAL
> +      * starting from the checkpoint, therefore it wouldn't replay our 
> earlier
> +      * WAL records.  So if a checkpoint started after the bulk write, fsync
> +      * the files now.
> +      */
> +     if (!SmgrIsTemp(bulkstate->smgr))
> +     {

Shouldn't this be "if (bulkstate->use_wal)"?  The GetRedoRecPtr()-based
decision is irrelevant to the !wal case.  Either we don't need fsync at all
(TEMP or UNLOGGED) or smgrDoPendingSyncs() will do it (wal_level=minimal).  I
don't see any functional problem, but this likely arranges for an unnecessary
sync when a checkpoint starts between mdcreate() and here.  (The mdcreate()
sync may also be unnecessary, but that's longstanding.)

> +             /*
> +              * Prevent a checkpoint from starting between the 
> GetRedoRecPtr() and
> +              * smgrregistersync() calls.
> +              */
> +             Assert((MyProc->delayChkptFlags & DELAY_CHKPT_START) == 0);
> +             MyProc->delayChkptFlags |= DELAY_CHKPT_START;
> +
> +             if (bulkstate->start_RedoRecPtr != GetRedoRecPtr())
> +             {
> +                     /*
> +                      * A checkpoint occurred and it didn't know about our 
> writes, so
> +                      * fsync() the relation ourselves.
> +                      */
> +                     MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;
> +                     smgrimmedsync(bulkstate->smgr, bulkstate->forknum);
> +                     elog(DEBUG1, "flushed relation because a checkpoint 
> occurred concurrently");
> +             }
> +             else
> +             {
> +                     smgrregistersync(bulkstate->smgr, bulkstate->forknum);
> +                     MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;
> +             }
> +     }
> +}

This is an elegant optimization.

Re: Relation bulk write facility

Reply via email to