I was afraid of the overhead of the MemoryContextReset between each WAL record since this is a very hot codepath, but it doesn't seem to be noticeable. I used the attached scripts to benchmark it. redobench-setup.sh sets up a base backup and about 5 GB of WAL. The WAL consists of just tiny logical decoding messages, no real page modifications. The idea is that replaying that WAL should make any per-record overhead stand out as much as possible, since there's no real work to do. Use redobench.sh to perform the replay. I am not seeing any measurable difference this patch, so I think we're good. But if we needed to optimize, we could e.g. have an inlined fastpath version of MemoryContextReset for the common case that the context is empty, or only reset it every 100 records or something.
This leaves no built-in rmgrs with any rm_startup or rm_clenaup hooks. Extensions might still use them, and they seem like they might be useful, so I kept them.
There was no natural place to document this, so I added a brief explanation of rm_redo in the RmgrData comment, and then tacked the information about the memory context there too. I also added a note in "Custom WAL Resource Managers" section of the docs to point out that this changed in v18.
(Why am I doing this now? I was browsing through all the global variables for the multithreading work, and these "opCtx"s caught my eye. This is in no way critical for multithreading though.)
-- Heikki Linnakangas Neon (https://neon.tech)
From 9bfc0af26dcfc0c0061b98e170bc6fa02acaf709 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas <heikki.linnakan...@iki.fi> Date: Wed, 7 Aug 2024 14:03:08 +0300 Subject: [PATCH v1 1/1] Run WAL redo functions in a temporary memory context Some resource managers already used such a temporary memory context on their own. Let's consolidate that and run all WAL redo functions in a temporary context that is reset between each record. This leaves the rm_startup and rm_cleanup hooks unused in core code, but I kept them because they might still be used by extensions, and it might come handy for builtin rmgrs again in the future too. --- doc/src/sgml/custom-rmgr.sgml | 17 +++++++++++++++++ src/backend/access/gin/ginxlog.c | 21 --------------------- src/backend/access/gist/gistxlog.c | 19 ------------------- src/backend/access/nbtree/nbtxlog.c | 21 --------------------- src/backend/access/spgist/spgxlog.c | 23 ----------------------- src/backend/access/transam/xlogrecovery.c | 12 ++++++++++++ src/include/access/ginxlog.h | 2 -- src/include/access/gistxlog.h | 2 -- src/include/access/nbtxlog.h | 2 -- src/include/access/rmgrlist.h | 8 ++++---- src/include/access/spgxlog.h | 2 -- src/include/access/xlog_internal.h | 5 +++++ 12 files changed, 38 insertions(+), 96 deletions(-) diff --git a/doc/src/sgml/custom-rmgr.sgml b/doc/src/sgml/custom-rmgr.sgml index 3032b2dc0d..bfa15e33d6 100644 --- a/doc/src/sgml/custom-rmgr.sgml +++ b/doc/src/sgml/custom-rmgr.sgml @@ -29,6 +29,11 @@ * This struct must be kept in sync with the PG_RMGR definition in * rmgr.c. * + * rm_redo is the core function that replays one WAL record. During WAL + * recovery, it is called once for every WAL record belonging to the resource + * manager. It's called in a temporary memory context that is reset between + * every record. + * * rm_identify must return a name for the record based on xl_info (without * reference to the rmid). For example, XLOG_BTREE_VACUUM would be named * "VACUUM". rm_desc can then be called to obtain additional detail for the @@ -55,6 +60,18 @@ typedef struct RmgrData </programlisting> </para> + <note> + <para> + Prior to <productname>PostgreSQL</productname> version 18.0, + the <varname>rm_redo</varname> function was called in a long-lived memory + context, and if you needed a short-lived context you had to create one + yourself e.g. in <varname>rm_startup</varname>. Starting with version + 18.0, it's called in a temporary memory context instead, and if you need + to retain data over calls, you need to explicitly allocate them in + <literal>TopMemoryContext</literal>. + </para> + </note> + <para> The <filename>src/test/modules/test_custom_rmgrs</filename> module contains a working example, which demonstrates usage of custom WAL diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c index 07ba0b559e..59afd5e4b4 100644 --- a/src/backend/access/gin/ginxlog.c +++ b/src/backend/access/gin/ginxlog.c @@ -19,8 +19,6 @@ #include "access/xlogutils.h" #include "utils/memutils.h" -static MemoryContext opCtx; /* working memory for operations */ - static void ginRedoClearIncompleteSplit(XLogReaderState *record, uint8 block_id) { @@ -726,7 +724,6 @@ void gin_redo(XLogReaderState *record) { uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK; - MemoryContext oldCtx; /* * GIN indexes do not require any conflict processing. NB: If we ever @@ -734,7 +731,6 @@ gin_redo(XLogReaderState *record) * killed tuples outside VACUUM, we'll need to handle that here. */ - oldCtx = MemoryContextSwitchTo(opCtx); switch (info) { case XLOG_GIN_CREATE_PTREE: @@ -767,23 +763,6 @@ gin_redo(XLogReaderState *record) default: elog(PANIC, "gin_redo: unknown op code %u", info); } - MemoryContextSwitchTo(oldCtx); - MemoryContextReset(opCtx); -} - -void -gin_xlog_startup(void) -{ - opCtx = AllocSetContextCreate(CurrentMemoryContext, - "GIN recovery temporary context", - ALLOCSET_DEFAULT_SIZES); -} - -void -gin_xlog_cleanup(void) -{ - MemoryContextDelete(opCtx); - opCtx = NULL; } /* diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c index 451e8d8d98..3127cffc03 100644 --- a/src/backend/access/gist/gistxlog.c +++ b/src/backend/access/gist/gistxlog.c @@ -23,8 +23,6 @@ #include "utils/memutils.h" #include "utils/rel.h" -static MemoryContext opCtx; /* working memory for operations */ - /* * Replay the clearing of F_FOLLOW_RIGHT flag on a child page. * @@ -397,7 +395,6 @@ void gist_redo(XLogReaderState *record) { uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK; - MemoryContext oldCxt; /* * GiST indexes do not require any conflict processing. NB: If we ever @@ -405,7 +402,6 @@ gist_redo(XLogReaderState *record) * tuples outside VACUUM, we'll need to handle that here. */ - oldCxt = MemoryContextSwitchTo(opCtx); switch (info) { case XLOG_GIST_PAGE_UPDATE: @@ -429,21 +425,6 @@ gist_redo(XLogReaderState *record) default: elog(PANIC, "gist_redo: unknown op code %u", info); } - - MemoryContextSwitchTo(oldCxt); - MemoryContextReset(opCtx); -} - -void -gist_xlog_startup(void) -{ - opCtx = createTempGistContext(); -} - -void -gist_xlog_cleanup(void) -{ - MemoryContextDelete(opCtx); } /* diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c index b5b0e22447..0073e8bcdb 100644 --- a/src/backend/access/nbtree/nbtxlog.c +++ b/src/backend/access/nbtree/nbtxlog.c @@ -22,8 +22,6 @@ #include "storage/standby.h" #include "utils/memutils.h" -static MemoryContext opCtx; /* working memory for operations */ - /* * _bt_restore_page -- re-enter all the index tuples on a page * @@ -1014,9 +1012,7 @@ void btree_redo(XLogReaderState *record) { uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK; - MemoryContext oldCtx; - oldCtx = MemoryContextSwitchTo(opCtx); switch (info) { case XLOG_BTREE_INSERT_LEAF: @@ -1065,23 +1061,6 @@ btree_redo(XLogReaderState *record) default: elog(PANIC, "btree_redo: unknown op code %u", info); } - MemoryContextSwitchTo(oldCtx); - MemoryContextReset(opCtx); -} - -void -btree_xlog_startup(void) -{ - opCtx = AllocSetContextCreate(CurrentMemoryContext, - "Btree recovery temporary context", - ALLOCSET_DEFAULT_SIZES); -} - -void -btree_xlog_cleanup(void) -{ - MemoryContextDelete(opCtx); - opCtx = NULL; } /* diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c index 4e31d33e70..b422049920 100644 --- a/src/backend/access/spgist/spgxlog.c +++ b/src/backend/access/spgist/spgxlog.c @@ -22,9 +22,6 @@ #include "utils/memutils.h" -static MemoryContext opCtx; /* working memory for operations */ - - /* * Prepare a dummy SpGistState, with just the minimum info needed for replay. * @@ -935,9 +932,7 @@ void spg_redo(XLogReaderState *record) { uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK; - MemoryContext oldCxt; - oldCxt = MemoryContextSwitchTo(opCtx); switch (info) { case XLOG_SPGIST_ADD_LEAF: @@ -967,24 +962,6 @@ spg_redo(XLogReaderState *record) default: elog(PANIC, "spg_redo: unknown op code %u", info); } - - MemoryContextSwitchTo(oldCxt); - MemoryContextReset(opCtx); -} - -void -spg_xlog_startup(void) -{ - opCtx = AllocSetContextCreate(CurrentMemoryContext, - "SP-GiST temporary context", - ALLOCSET_DEFAULT_SIZES); -} - -void -spg_xlog_cleanup(void) -{ - MemoryContextDelete(opCtx); - opCtx = NULL; } /* diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index ad817fbca6..b884f84974 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -61,6 +61,7 @@ #include "utils/datetime.h" #include "utils/fmgrprotos.h" #include "utils/guc_hooks.h" +#include "utils/memutils.h" #include "utils/pg_lsn.h" #include "utils/ps_status.h" #include "utils/pg_rusage.h" @@ -1728,6 +1729,8 @@ PerformWalRecovery(void) if (record != NULL) { + MemoryContext oldCtx; + MemoryContext redoCtx; TimestampTz xtime; PGRUsage ru0; @@ -1745,6 +1748,10 @@ PerformWalRecovery(void) if (!StandbyMode) begin_startup_progress_phase(); + redoCtx = AllocSetContextCreate(CurrentMemoryContext, + "recovery temporary context", + ALLOCSET_DEFAULT_SIZES); + /* * main redo apply loop */ @@ -1820,7 +1827,10 @@ PerformWalRecovery(void) /* * Apply the record */ + oldCtx = MemoryContextSwitchTo(redoCtx); ApplyWalRecord(xlogreader, record, &replayTLI); + MemoryContextSwitchTo(oldCtx); + MemoryContextReset(redoCtx); /* Exit loop if we reached inclusive recovery target */ if (recoveryStopsAfter(xlogreader)) @@ -1847,6 +1857,8 @@ PerformWalRecovery(void) * end of main redo apply loop */ + MemoryContextSwitchTo(oldCtx); + if (reachedRecoveryTarget) { if (!reachedConsistency) diff --git a/src/include/access/ginxlog.h b/src/include/access/ginxlog.h index 0c2ddbd82a..3e5185fbdc 100644 --- a/src/include/access/ginxlog.h +++ b/src/include/access/ginxlog.h @@ -209,8 +209,6 @@ typedef struct ginxlogDeleteListPages extern void gin_redo(XLogReaderState *record); extern void gin_desc(StringInfo buf, XLogReaderState *record); extern const char *gin_identify(uint8 info); -extern void gin_xlog_startup(void); -extern void gin_xlog_cleanup(void); extern void gin_mask(char *pagedata, BlockNumber blkno); #endif /* GINXLOG_H */ diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h index a0bdc5d7d7..809512acd9 100644 --- a/src/include/access/gistxlog.h +++ b/src/include/access/gistxlog.h @@ -110,8 +110,6 @@ typedef struct gistxlogPageReuse extern void gist_redo(XLogReaderState *record); extern void gist_desc(StringInfo buf, XLogReaderState *record); extern const char *gist_identify(uint8 info); -extern void gist_xlog_startup(void); -extern void gist_xlog_cleanup(void); extern void gist_mask(char *pagedata, BlockNumber blkno); #endif diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h index e42ac1611c..3922687858 100644 --- a/src/include/access/nbtxlog.h +++ b/src/include/access/nbtxlog.h @@ -354,8 +354,6 @@ typedef struct xl_btree_newroot * prototypes for functions in nbtxlog.c */ extern void btree_redo(XLogReaderState *record); -extern void btree_xlog_startup(void); -extern void btree_xlog_cleanup(void); extern void btree_mask(char *pagedata, BlockNumber blkno); /* diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h index 78e6b908c6..8bcc82cb54 100644 --- a/src/include/access/rmgrlist.h +++ b/src/include/access/rmgrlist.h @@ -36,12 +36,12 @@ PG_RMGR(RM_RELMAP_ID, "RelMap", relmap_redo, relmap_desc, relmap_identify, NULL, PG_RMGR(RM_STANDBY_ID, "Standby", standby_redo, standby_desc, standby_identify, NULL, NULL, NULL, standby_decode) PG_RMGR(RM_HEAP2_ID, "Heap2", heap2_redo, heap2_desc, heap2_identify, NULL, NULL, heap_mask, heap2_decode) PG_RMGR(RM_HEAP_ID, "Heap", heap_redo, heap_desc, heap_identify, NULL, NULL, heap_mask, heap_decode) -PG_RMGR(RM_BTREE_ID, "Btree", btree_redo, btree_desc, btree_identify, btree_xlog_startup, btree_xlog_cleanup, btree_mask, NULL) +PG_RMGR(RM_BTREE_ID, "Btree", btree_redo, btree_desc, btree_identify, NULL, NULL, btree_mask, NULL) PG_RMGR(RM_HASH_ID, "Hash", hash_redo, hash_desc, hash_identify, NULL, NULL, hash_mask, NULL) -PG_RMGR(RM_GIN_ID, "Gin", gin_redo, gin_desc, gin_identify, gin_xlog_startup, gin_xlog_cleanup, gin_mask, NULL) -PG_RMGR(RM_GIST_ID, "Gist", gist_redo, gist_desc, gist_identify, gist_xlog_startup, gist_xlog_cleanup, gist_mask, NULL) +PG_RMGR(RM_GIN_ID, "Gin", gin_redo, gin_desc, gin_identify, NULL, NULL, gin_mask, NULL) +PG_RMGR(RM_GIST_ID, "Gist", gist_redo, gist_desc, gist_identify, NULL, NULL, gist_mask, NULL) PG_RMGR(RM_SEQ_ID, "Sequence", seq_redo, seq_desc, seq_identify, NULL, NULL, seq_mask, NULL) -PG_RMGR(RM_SPGIST_ID, "SPGist", spg_redo, spg_desc, spg_identify, spg_xlog_startup, spg_xlog_cleanup, spg_mask, NULL) +PG_RMGR(RM_SPGIST_ID, "SPGist", spg_redo, spg_desc, spg_identify, NULL, NULL, spg_mask, NULL) PG_RMGR(RM_BRIN_ID, "BRIN", brin_redo, brin_desc, brin_identify, NULL, NULL, brin_mask, NULL) PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_identify, NULL, NULL, NULL, NULL) PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL, NULL) diff --git a/src/include/access/spgxlog.h b/src/include/access/spgxlog.h index 16cc735001..fcd768a1c1 100644 --- a/src/include/access/spgxlog.h +++ b/src/include/access/spgxlog.h @@ -252,8 +252,6 @@ typedef struct spgxlogVacuumRedirect extern void spg_redo(XLogReaderState *record); extern void spg_desc(StringInfo buf, XLogReaderState *record); extern const char *spg_identify(uint8 info); -extern void spg_xlog_startup(void); -extern void spg_xlog_cleanup(void); extern void spg_mask(char *pagedata, BlockNumber blkno); #endif /* SPGXLOG_H */ diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h index c6a91fb456..4df224bb16 100644 --- a/src/include/access/xlog_internal.h +++ b/src/include/access/xlog_internal.h @@ -335,6 +335,11 @@ struct XLogRecordBuffer; * This struct must be kept in sync with the PG_RMGR definition in * rmgr.c. * + * rm_redo is the core function that replays one WAL record. During WAL + * recovery, it is called once for every WAL record belonging to the resource + * manager. It's called in a temporary memory context that is reset between + * every record. + * * rm_identify must return a name for the record based on xl_info (without * reference to the rmid). For example, XLOG_BTREE_VACUUM would be named * "VACUUM". rm_desc can then be called to obtain additional detail for the -- 2.39.2
redobench.sh
Description: application/shellscript
redobench-setup.sh
Description: application/shellscript