On 27/05/2024 21:20, Michael Paquier wrote:
> On Fri, May 17, 2024 at 04:25:15PM -0400, Robert Haas wrote:
>> On Fri, May 17, 2024 at 4:20 PM Jeff Davis <pg...@j-davis.com> wrote:
>>> Regarding this particular change: the checkpointing hook seems more
>>> like a table AM feature, so I agree with you that we should have a good
>>> idea how a real table AM might use this, rather than only
>>> pg_stat_statements.
>>
>> I would even be OK with a pg_stat_statements example that is fully
>> working and fully explained. I just don't want to have no example at
>> all. The original proposal has been changed twice because of
>> complaints that the hook wasn't quite useful enough, but I think that
>> only proves that v3 is closer to being useful than v1. If v1 is 40% of
>> the way to useful and v3 is 120% of the way to useful, wonderful! But
>> if v1 is 20% of the way to being useful and v3 is 60% of the way to
>> being useful, it's not time to commit anything yet. I don't know which
>> is the case, and I think if someone wants this to be committed, they
>> need to explain clearly why it's the first and not the second.
>
> Please note that I've been studying ways to have pg_stat_statements
> being plugged in directly with the shared pgstat APIs to get it backed
> by a dshash to give more flexibility and scaling, giving a way for
> extensions to register their own stats kind. In this case, the flush
> of the stats would be controlled with a callback in the stats
> registered by the extensions, conflicting with what's proposed here.
> pg_stat_statements is all about stats, at the end. I don't want this
> argument to act as a barrier if a checkpoint hook is an accepted
> consensus here, but a checkpoint hook used for this code path is not
> the most intuitive solution I can think of in the long-term.
On the topic of concrete uses for this API: We have a bunch of built-in
resource managers that could be refactored to use this API.
CheckPointGuts currently looks like this:
/*
 * Flush all data in shared memory to disk, and fsync
 *
 * This is the common code shared between regular checkpoints and
 * recovery restartpoints.
 */
static void
CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
{
    CheckPointRelationMap();
    CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
    CheckPointSnapBuild();
    CheckPointLogicalRewriteHeap();
    CheckPointReplicationOrigin();

    /* Write out all dirty data in SLRUs and the main buffer pool */
    TRACE_POSTGRESQL_BUFFER_CHECKPOINT_START(flags);
    CheckpointStats.ckpt_write_t = GetCurrentTimestamp();
    CheckPointCLOG();
    CheckPointCommitTs();
    CheckPointSUBTRANS();
    CheckPointMultiXact();
    CheckPointPredicate();
    RmgrCheckpoint(flags, RMGR_CHECKPOINT_BEFORE_BUFFERS);
    CheckPointBuffers(flags);
    RmgrCheckpoint(flags, RMGR_CHECKPOINT_AFTER_BUFFERS);

    /* Perform all queued up fsyncs */
    TRACE_POSTGRESQL_BUFFER_CHECKPOINT_SYNC_START();
    CheckpointStats.ckpt_sync_t = GetCurrentTimestamp();
    ProcessSyncRequests();
    CheckpointStats.ckpt_sync_end_t = GetCurrentTimestamp();
    TRACE_POSTGRESQL_BUFFER_CHECKPOINT_DONE();

    /* We deliberately delay 2PC checkpointing as long as possible */
    CheckPointTwoPhase(checkPointRedo);
}
Of these calls, CheckPointCLOG would be natural as the rmgr_callback of
the clog rmgr. Similarly for CheckPointMultiXact and maybe a few others.
However, let's look at the pg_stat_statements patch (pgss_001.v1.patch):
It's now writing a new WAL record for every call to pgss_store(),
turning even simple queries into WAL-logged operations. That's a
non-starter. It will also not work on a standby. This needs to be
redesigned so that the data is updated in memory, and written to disk
and/or WAL-logged only periodically. Perhaps at checkpoints, but you
could do it more frequently too.
I'm not convinced that the stats should be WAL-logged. Do you want them
to be replicated and included in backups? Maybe, but maybe not. It's
certainly a change to how it currently works.
If we don't WAL-log the stats, we don't really need a custom RMGR for
it. We just need a checkpoint hook to flush the stats to disk, but we
don't need a registered RMGR ID for it.
So, I have a feeling that adding this to the rmgr interface is not quite
right. The rmgr callbacks are for things that run when WAL is
*replayed*, while checkpoints are related to how WAL is generated. Let's
design this as an independent hook, separate from rmgrs.
Another data point: In Neon, we actually had to add a little code to
checkpoints, to WAL-log some extra data. That was a quick hack and might
not be the right design in the first place, but these hooks would not
have been useful for our purposes. We wanted to write a new WAL record
at shutdown, and in CheckPointGuts() it's already too late for that; it
needs to be done earlier, before starting the shutdown checkpoint.
Similar to LogStandbySnapshot(), except that LogStandbySnapshot() is not
called at shutdown as we wanted. For a table AM, the point of a
checkpoint hook is probably to fsync() data that is managed outside of
the normal buffer manager, and CheckPointGuts() is the right place for
that, but other extensions might want to hook into checkpoints for other
reasons.
--
Heikki Linnakangas
Neon (https://neon.tech)