On Wed, May 19, 2021 at 06:31:15PM +0900, Michael Paquier wrote:
> I still don't understand why XID consistency has anything to do with
> the compression of FPIs.  There is nothing preventing the testing of
> compression of FPIs, and please note this argument:
> https://www.postgresql.org/message-id/bef3b1e0-0b31-4f05-8e0a-f681cb918...@yandex-team.ru
>
> For example, I can just revert from my tree 0002 and 0003, and still
> perform tests of the various compression methods.  I do agree that we
> are going to need to do something about this problem, but let's drop
> this stuff from the set of patches of this thread and just discuss
> them where they are needed.
They are needed here - that they're included is deliberate.  Revert this and
then the tests fail.  "Make sure published XIDs are persistent"

time make -C src/test/recovery check
#   Failed test 'new xid after restart is greater'

> And you have not replaced BKPIMAGE_IS_COMPRESSED by a PGLZ-equivalent,
> so your patch set is eating more bits for BKPIMAGE_* than it needs

The goal is to support 2+ "methods" (including "none"), which takes 2 bits, so
may as well support 3 methods.

- uncompressed
- pglz
- lz4
- zlib or zstd or ??

This version:
0) repurposes the pre-existing GUC as an enum;
1) saves a bit (until zstd is included);
2) shows the compression in pg_waldump;

To support different compression levels, I think I'd change from an enum to
a string and an assign hook, which sets a pair of ints.

-- 
Justin
>From 07a9ee6809ac9da3627652391bad2a20852f6c06 Mon Sep 17 00:00:00 2001 From: Andrey Borodin <amboro...@acm.org> Date: Sat, 27 Feb 2021 09:03:50 +0500 Subject: [PATCH v8 1/9] Allow alternate compression methods for wal_compression TODO: bump XLOG_PAGE_MAGIC --- doc/src/sgml/config.sgml | 9 +- doc/src/sgml/installation.sgml | 4 +- src/backend/Makefile | 2 +- src/backend/access/transam/xlog.c | 2 +- src/backend/access/transam/xloginsert.c | 65 +++++++++++-- src/backend/access/transam/xlogreader.c | 97 ++++++++++++++++--- src/backend/utils/misc/guc.c | 21 ++-- src/backend/utils/misc/postgresql.conf.sample | 2 +- src/bin/pg_waldump/pg_waldump.c | 13 ++- src/include/access/xlog.h | 2 +- src/include/access/xlog_internal.h | 10 ++ src/include/access/xlogrecord.h | 15 ++- 12 files changed, 194 insertions(+), 48 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 7e32b0686c..218a98cfc6 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -3113,23 +3113,26 @@ include_dir 'conf.d' </varlistentry> <varlistentry id="guc-wal-compression" xreflabel="wal_compression"> - <term><varname>wal_compression</varname> (<type>boolean</type>) + <term><varname>wal_compression</varname> (<type>enum</type>) <indexterm> <primary><varname>wal_compression</varname> configuration parameter</primary> </indexterm> </term> <listitem> <para> - When this parameter is <literal>on</literal>, the <productname>PostgreSQL</productname> + This parameter enables compression of WAL using the specified + compression method. + When enabled, the <productname>PostgreSQL</productname> server compresses full page images written to WAL when <xref linkend="guc-full-page-writes"/> is on or during a base backup. A compressed page image will be decompressed during WAL replay. + The supported methods are pglz and zlib. The default value is <literal>off</literal>. Only superusers can change this setting. 
</para> <para> - Turning this parameter on can reduce the WAL volume without + Enabling compression can reduce the WAL volume without increasing the risk of unrecoverable data corruption, but at the cost of some extra CPU spent on the compression during WAL logging and on the decompression during WAL replay. diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml index 3c0aa118c7..073d5089f7 100644 --- a/doc/src/sgml/installation.sgml +++ b/doc/src/sgml/installation.sgml @@ -147,7 +147,7 @@ su - postgres specify the <option>--without-zlib</option> option to <filename>configure</filename>. Using this option disables support for compressed archives in <application>pg_dump</application> and - <application>pg_restore</application>. + <application>pg_restore</application>, and compressed WAL. </para> </listitem> </itemizedlist> @@ -1236,7 +1236,7 @@ build-postgresql: Prevents use of the <application>Zlib</application> library. This disables support for compressed archives in <application>pg_dump</application> - and <application>pg_restore</application>. + and <application>pg_restore</application> and compressed WAL. 
</para> </listitem> </varlistentry> diff --git a/src/backend/Makefile b/src/backend/Makefile index 0da848b1fd..3af216ddfc 100644 --- a/src/backend/Makefile +++ b/src/backend/Makefile @@ -48,7 +48,7 @@ OBJS = \ LIBS := $(filter-out -lpgport -lpgcommon, $(LIBS)) $(LDAP_LIBS_BE) $(ICU_LIBS) # The backend doesn't need everything that's in LIBS, however -LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS)) +LIBS := $(filter-out -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS)) ifeq ($(with_systemd),yes) LIBS += -lsystemd diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 441a9124cd..64094e7175 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -98,7 +98,7 @@ char *XLogArchiveCommand = NULL; bool EnableHotStandby = false; bool fullPageWrites = true; bool wal_log_hints = false; -bool wal_compression = false; +int wal_compression = WAL_COMPRESSION_NONE; char *wal_consistency_checking_string = NULL; bool *wal_consistency_checking = NULL; bool wal_init_zero = true; diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c index 32b4cc84e7..4f81f19c49 100644 --- a/src/backend/access/transam/xloginsert.c +++ b/src/backend/access/transam/xloginsert.c @@ -33,8 +33,18 @@ #include "storage/proc.h" #include "utils/memutils.h" +#ifdef HAVE_LIBZ +#include <zlib.h> +/* zlib compressBound is not a macro */ +#define ZLIB_MAX_BLCKSZ BLCKSZ + (BLCKSZ>>12) + (BLCKSZ>>14) + (BLCKSZ>>25) + 13 +#else +#define ZLIB_MAX_BLCKSZ 0 +#endif + /* Buffer size required to store a compressed version of backup block image */ -#define PGLZ_MAX_BLCKSZ PGLZ_MAX_OUTPUT(BLCKSZ) +#define PGLZ_MAX_BLCKSZ PGLZ_MAX_OUTPUT(BLCKSZ) + +#define COMPRESS_BUFSIZE Max(PGLZ_MAX_BLCKSZ, ZLIB_MAX_BLCKSZ) /* * For each block reference registered with XLogRegisterBuffer, we fill in @@ -58,7 +68,7 @@ typedef struct * backup block data in XLogRecordAssemble() */ /* buffer to 
store a compressed version of backup block image */ - char compressed_page[PGLZ_MAX_BLCKSZ]; + char compressed_page[COMPRESS_BUFSIZE]; } registered_buffer; static registered_buffer *registered_buffers; @@ -113,7 +123,8 @@ static XLogRecData *XLogRecordAssemble(RmgrId rmid, uint8 info, XLogRecPtr RedoRecPtr, bool doPageWrites, XLogRecPtr *fpw_lsn, int *num_fpi); static bool XLogCompressBackupBlock(char *page, uint16 hole_offset, - uint16 hole_length, char *dest, uint16 *dlen); + uint16 hole_length, char *dest, + uint16 *dlen, WalCompression compression); /* * Begin constructing a WAL record. This must be called before the @@ -628,13 +639,14 @@ XLogRecordAssemble(RmgrId rmid, uint8 info, /* * Try to compress a block image if wal_compression is enabled */ - if (wal_compression) + if (wal_compression != WAL_COMPRESSION_NONE) { is_compressed = XLogCompressBackupBlock(page, bimg.hole_offset, cbimg.hole_length, regbuf->compressed_page, - &compressed_len); + &compressed_len, + wal_compression); } /* @@ -665,8 +677,13 @@ XLogRecordAssemble(RmgrId rmid, uint8 info, if (is_compressed) { + /* The current compression is stored in the WAL record */ + wal_compression_name(wal_compression); /* Range check */ + Assert(wal_compression < (1 << BKPIMAGE_COMPRESS_BITS)); + bimg.length = compressed_len; - bimg.bimg_info |= BKPIMAGE_IS_COMPRESSED; + bimg.bimg_info |= + wal_compression << BKPIMAGE_COMPRESS_OFFSET_BITS; rdt_datas_last->data = regbuf->compressed_page; rdt_datas_last->len = compressed_len; @@ -827,7 +844,7 @@ XLogRecordAssemble(RmgrId rmid, uint8 info, */ static bool XLogCompressBackupBlock(char *page, uint16 hole_offset, uint16 hole_length, - char *dest, uint16 *dlen) + char *dest, uint16 *dlen, WalCompression compression) { int32 orig_len = BLCKSZ - hole_length; int32 len; @@ -853,12 +870,42 @@ XLogCompressBackupBlock(char *page, uint16 hole_offset, uint16 hole_length, else source = page; + switch (compression) + { + case WAL_COMPRESSION_PGLZ: + len = pglz_compress(source, 
orig_len, dest, PGLZ_strategy_default); + break; + +#ifdef HAVE_LIBZ + case WAL_COMPRESSION_ZLIB: + { + unsigned long len_l = COMPRESS_BUFSIZE; + int ret; + ret = compress2((Bytef*)dest, &len_l, (Bytef*)source, orig_len, 1); + if (ret != Z_OK) + len_l = -1; + len = len_l; + break; + } +#endif + + default: + /* + * It should be impossible to get here for unsupported algorithms, + * which cannot be assigned if they're not enabled at compile time. + */ + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("unknown compression method requested: %d/%s", + compression, wal_compression_name(compression)))); + + } + /* - * We recheck the actual size even if pglz_compress() reports success and + * We recheck the actual size even if compression reports success and * see if the number of bytes saved by compression is larger than the * length of extra data needed for the compressed version of block image. */ - len = pglz_compress(source, orig_len, dest, PGLZ_strategy_default); if (len >= 0 && len + extra_bytes < orig_len) { diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c index 42738eb940..822b2612cd 100644 --- a/src/backend/access/transam/xlogreader.c +++ b/src/backend/access/transam/xlogreader.c @@ -26,6 +26,7 @@ #include "catalog/pg_control.h" #include "common/pg_lzcompress.h" #include "replication/origin.h" +#include "utils/guc.h" #ifndef FRONTEND #include "miscadmin.h" @@ -33,6 +34,10 @@ #include "utils/memutils.h" #endif +#ifdef HAVE_LIBZ +#include <zlib.h> +#endif + static void report_invalid_record(XLogReaderState *state, const char *fmt,...) pg_attribute_printf(2, 3); static bool allocate_recordbuf(XLogReaderState *state, uint32 reclength); @@ -50,6 +55,27 @@ static void WALOpenSegmentInit(WALOpenSegment *seg, WALSegmentContext *segcxt, /* size of the buffer allocated for error message. 
*/ #define MAX_ERRORMSG_LEN 1000 +/* + * Accept the likely variants for none and pglz, for compatibility with old + * server versions where wal_compression was a boolean. + */ +const struct config_enum_entry wal_compression_options[] = { + {"off", WAL_COMPRESSION_NONE, false}, + {"none", WAL_COMPRESSION_NONE, false}, + {"false", WAL_COMPRESSION_NONE, true}, + {"no", WAL_COMPRESSION_NONE, true}, + {"0", WAL_COMPRESSION_NONE, true}, + {"pglz", WAL_COMPRESSION_PGLZ, false}, + {"true", WAL_COMPRESSION_PGLZ, true}, + {"yes", WAL_COMPRESSION_PGLZ, true}, + {"on", WAL_COMPRESSION_PGLZ, true}, + {"1", WAL_COMPRESSION_PGLZ, true}, +#ifdef HAVE_LIBZ + {"zlib", WAL_COMPRESSION_ZLIB, false}, +#endif + {NULL, 0, false} +}; + /* * Construct a string in state->errormsg_buf explaining what's wrong with * the current record being read. @@ -1290,7 +1316,7 @@ DecodeXLogRecord(XLogReaderState *state, XLogRecord *record, char **errormsg) blk->apply_image = ((blk->bimg_info & BKPIMAGE_APPLY) != 0); - if (blk->bimg_info & BKPIMAGE_IS_COMPRESSED) + if (BKPIMAGE_IS_COMPRESSED(blk->bimg_info)) { if (blk->bimg_info & BKPIMAGE_HAS_HOLE) COPY_HEADER_FIELD(&blk->hole_length, sizeof(uint16)); @@ -1335,29 +1361,28 @@ DecodeXLogRecord(XLogReaderState *state, XLogRecord *record, char **errormsg) } /* - * cross-check that bimg_len < BLCKSZ if the IS_COMPRESSED - * flag is set. + * cross-check that bimg_len < BLCKSZ if it's compressed */ - if ((blk->bimg_info & BKPIMAGE_IS_COMPRESSED) && + if (BKPIMAGE_IS_COMPRESSED(blk->bimg_info) && blk->bimg_len == BLCKSZ) { report_invalid_record(state, - "BKPIMAGE_IS_COMPRESSED set, but block image length %u at %X/%X", + "BKPIMAGE_IS_COMPRESSED, but block image length %u at %X/%X", (unsigned int) blk->bimg_len, LSN_FORMAT_ARGS(state->ReadRecPtr)); goto err; } /* - * cross-check that bimg_len = BLCKSZ if neither HAS_HOLE nor - * IS_COMPRESSED flag is set. + * cross-check that bimg_len = BLCKSZ if neither HAS_HOLE is + * set nor IS_COMPRESSED(). 
*/ if (!(blk->bimg_info & BKPIMAGE_HAS_HOLE) && - !(blk->bimg_info & BKPIMAGE_IS_COMPRESSED) && + !BKPIMAGE_IS_COMPRESSED(blk->bimg_info) && blk->bimg_len != BLCKSZ) { report_invalid_record(state, - "neither BKPIMAGE_HAS_HOLE nor BKPIMAGE_IS_COMPRESSED set, but block image length is %u at %X/%X", + "neither BKPIMAGE_HAS_HOLE nor BKPIMAGE_IS_COMPRESSED, but block image length is %u at %X/%X", (unsigned int) blk->data_len, LSN_FORMAT_ARGS(state->ReadRecPtr)); goto err; @@ -1535,6 +1560,22 @@ XLogRecGetBlockData(XLogReaderState *record, uint8 block_id, Size *len) } } +/* + * Return a statically allocated string associated with the given compression + * method. + */ +const char * +wal_compression_name(WalCompression compression) +{ + for (int i=0; wal_compression_options[i].name != NULL; ++i) + { + if (wal_compression_options[i].val == compression) + return wal_compression_options[i].name; + } + + return "???"; +} + /* * Restore a full-page image from a backup block attached to an XLOG record. 
* @@ -1555,11 +1596,43 @@ RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page) bkpb = &record->blocks[block_id]; ptr = bkpb->bkp_image; - if (bkpb->bimg_info & BKPIMAGE_IS_COMPRESSED) + if (BKPIMAGE_IS_COMPRESSED(bkpb->bimg_info)) { + int compression_method = BKPIMAGE_COMPRESSION(bkpb->bimg_info); /* If a backup block image is compressed, decompress it */ - if (pglz_decompress(ptr, bkpb->bimg_len, tmp.data, - BLCKSZ - bkpb->hole_length, true) < 0) + int32 decomp_result = -1; + switch (compression_method) + { + case WAL_COMPRESSION_PGLZ: + decomp_result = pglz_decompress(ptr, bkpb->bimg_len, tmp.data, + BLCKSZ - bkpb->hole_length, true); + break; + +#ifdef HAVE_LIBZ + case WAL_COMPRESSION_ZLIB: + { + unsigned long decomp_result_l; + decomp_result_l = BLCKSZ - bkpb->hole_length; + if (uncompress((Bytef*)tmp.data, &decomp_result_l, + (Bytef*)ptr, bkpb->bimg_len) == Z_OK) + decomp_result = decomp_result_l; + else + decomp_result = -1; + break; + } +#endif + + default: + report_invalid_record(record, "image at %X/%X is compressed with unsupported codec, block %d (%d/%s)", + (uint32) (record->ReadRecPtr >> 32), + (uint32) record->ReadRecPtr, + block_id, + compression_method, + wal_compression_name(compression_method)); + return false; + } + + if (decomp_result < 0) { report_invalid_record(record, "invalid compressed image at %X/%X, block %d", LSN_FORMAT_ARGS(record->ReadRecPtr), diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index ee731044b6..8860deda2a 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -548,6 +548,7 @@ extern const struct config_enum_entry archive_mode_options[]; extern const struct config_enum_entry recovery_target_action_options[]; extern const struct config_enum_entry sync_method_options[]; extern const struct config_enum_entry dynamic_shared_memory_options[]; +extern const struct config_enum_entry wal_compression_options[]; /* * GUC option variables that are exported from this 
module @@ -1304,16 +1305,6 @@ static struct config_bool ConfigureNamesBool[] = NULL, NULL, NULL }, - { - {"wal_compression", PGC_SUSET, WAL_SETTINGS, - gettext_noop("Compresses full-page writes written in WAL file."), - NULL - }, - &wal_compression, - false, - NULL, NULL, NULL - }, - { {"wal_init_zero", PGC_SUSET, WAL_SETTINGS, gettext_noop("Writes zeroes to new WAL files before first use."), @@ -4825,6 +4816,16 @@ static struct config_enum ConfigureNamesEnum[] = NULL, NULL, NULL }, + { + {"wal_compression", PGC_SUSET, WAL_SETTINGS, + gettext_noop("Set the method used to compress full page images in the WAL."), + NULL + }, + &wal_compression, + WAL_COMPRESSION_NONE, wal_compression_options, + NULL, NULL, NULL + }, + { {"dynamic_shared_memory_type", PGC_POSTMASTER, RESOURCES_MEM, gettext_noop("Selects the dynamic shared memory implementation used."), diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index 6e36e4c2ef..3991d35afd 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -218,7 +218,7 @@ #full_page_writes = on # recover from partial page writes #wal_log_hints = off # also do full page writes of non-critical updates # (change requires restart) -#wal_compression = off # enable compression of full-page writes +#wal_compression = off # enable compression of full-page writes: off, pglz, zlib #wal_init_zero = on # zero-fill new WAL files #wal_recycle = on # recycle WAL files #wal_buffers = -1 # min 32kB, -1 sets based on shared_buffers diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c index f8b8afe4a7..1cd71ac2f7 100644 --- a/src/bin/pg_waldump/pg_waldump.c +++ b/src/bin/pg_waldump/pg_waldump.c @@ -537,18 +537,21 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record) blk); if (XLogRecHasBlockImage(record, block_id)) { - if (record->blocks[block_id].bimg_info & - BKPIMAGE_IS_COMPRESSED) + if 
(BKPIMAGE_IS_COMPRESSED(record->blocks[block_id].bimg_info)) { + int compression = BKPIMAGE_COMPRESSION( + record->blocks[block_id].bimg_info); + printf(" (FPW%s); hole: offset: %u, length: %u, " - "compression saved: %u", + "compression method %d/%s, saved: %u", XLogRecBlockImageApply(record, block_id) ? "" : " for WAL verification", record->blocks[block_id].hole_offset, record->blocks[block_id].hole_length, + compression, wal_compression_name(compression), BLCKSZ - - record->blocks[block_id].hole_length - - record->blocks[block_id].bimg_len); + record->blocks[block_id].hole_length - + record->blocks[block_id].bimg_len); } else { diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index 77187c12be..e8b2c53784 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -116,7 +116,7 @@ extern char *XLogArchiveCommand; extern bool EnableHotStandby; extern bool fullPageWrites; extern bool wal_log_hints; -extern bool wal_compression; +extern int wal_compression; extern bool wal_init_zero; extern bool wal_recycle; extern bool *wal_consistency_checking; diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h index 26a743b6b6..8b740af66d 100644 --- a/src/include/access/xlog_internal.h +++ b/src/include/access/xlog_internal.h @@ -324,4 +324,14 @@ extern bool InArchiveRecovery; extern bool StandbyMode; extern char *recoveryRestoreCommand; +/* These are the compression IDs written into bimg_info */ +typedef enum WalCompression +{ + WAL_COMPRESSION_NONE, + WAL_COMPRESSION_PGLZ, + WAL_COMPRESSION_ZLIB, +} WalCompression; + +extern const char *wal_compression_name(WalCompression compression); + #endif /* XLOG_INTERNAL_H */ diff --git a/src/include/access/xlogrecord.h b/src/include/access/xlogrecord.h index 80c92a2498..2a60c0fb92 100644 --- a/src/include/access/xlogrecord.h +++ b/src/include/access/xlogrecord.h @@ -114,7 +114,7 @@ typedef struct XLogRecordBlockHeader * present is (BLCKSZ - <length of "hole" bytes>). 
* * Additionally, when wal_compression is enabled, we will try to compress full - * page images using the PGLZ compression algorithm, after removing the "hole". + * page images, after removing the "hole". * This can reduce the WAL volume, but at some extra cost of CPU spent * on the compression during WAL logging. In this case, since the "hole" * length cannot be calculated by subtracting the number of page image bytes @@ -144,9 +144,18 @@ typedef struct XLogRecordBlockImageHeader /* Information stored in bimg_info */ #define BKPIMAGE_HAS_HOLE 0x01 /* page image has "hole" */ -#define BKPIMAGE_IS_COMPRESSED 0x02 /* page image is compressed */ -#define BKPIMAGE_APPLY 0x04 /* page image should be restored during +#define BKPIMAGE_APPLY 0x02 /* page image should be restored during * replay */ +#define BKPIMAGE_COMPRESS_METHOD1 0x04 /* bits to encode compression method */ +#define BKPIMAGE_COMPRESS_METHOD2 0x08 /* 0=none, 1=pglz, 2=zlib */ + +/* How many bits to shift to extract compression */ +#define BKPIMAGE_COMPRESS_OFFSET_BITS 2 +/* How many bits are for compression */ +#define BKPIMAGE_COMPRESS_BITS 2 +/* Extract the compression from the bimg_info */ +#define BKPIMAGE_COMPRESSION(info) ((info >> BKPIMAGE_COMPRESS_OFFSET_BITS) & ((1<<BKPIMAGE_COMPRESS_BITS) - 1)) +#define BKPIMAGE_IS_COMPRESSED(info) (BKPIMAGE_COMPRESSION(info) != 0) /* * Extra header information used when page image has "hole" and -- 2.17.0
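As an aside on the bimg_info layout in patch 1/9: the two compression bits sit above BKPIMAGE_HAS_HOLE and BKPIMAGE_APPLY, and a value of 0 means "not compressed". Below is a minimal standalone sketch of that packing, not code from the patch. The constants copy the values of the BKPIMAGE_* macros above, but the short names and the helpers set_compression(), get_compression() and is_compressed() are hypothetical, made up for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror the WalCompression enum in the patch (names are made up). */
enum { METHOD_NONE = 0, METHOD_PGLZ = 1, METHOD_ZLIB = 2 };

/* Same bit values as BKPIMAGE_HAS_HOLE / BKPIMAGE_APPLY in the patch. */
#define HAS_HOLE 0x01
#define APPLY    0x02

/* Same values as BKPIMAGE_COMPRESS_OFFSET_BITS / BKPIMAGE_COMPRESS_BITS. */
#define COMPRESS_OFFSET_BITS 2
#define COMPRESS_BITS        2
#define COMPRESS_MASK        ((1 << COMPRESS_BITS) - 1)

/* Hypothetical helper: store a method ID in bits 2..3, keeping other flags. */
static uint8_t
set_compression(uint8_t info, int method)
{
	/* Range check: only values that fit in COMPRESS_BITS bits are allowed. */
	assert(method >= 0 && method <= COMPRESS_MASK);

	info &= (uint8_t) ~(COMPRESS_MASK << COMPRESS_OFFSET_BITS);
	return (uint8_t) (info | (method << COMPRESS_OFFSET_BITS));
}

/* Hypothetical helper: same arithmetic as BKPIMAGE_COMPRESSION(info). */
static int
get_compression(uint8_t info)
{
	return (info >> COMPRESS_OFFSET_BITS) & COMPRESS_MASK;
}

/* Hypothetical helper: same test as BKPIMAGE_IS_COMPRESSED(info). */
static int
is_compressed(uint8_t info)
{
	return get_compression(info) != METHOD_NONE;
}
```

With two bits there is room for four values, i.e. "none" plus three methods, which is why adding a fourth method would need another bit.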
>From 01c937b9a6847a03d0cbb9c857f11419c6fcb89a Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota....@gmail.com> Date: Mon, 8 Mar 2021 15:32:30 +0900 Subject: [PATCH v8 2/9] Run 011_crash_recovery.pl with wal_level=minimal The test doesn't need that feature and pg_current_xact_id() is better exercised by turning off the feature. Copied from: https://www.postgresql.org/message-id/20210308.173242.463790587797836129.horikyota.ntt%40gmail.com --- src/test/recovery/t/011_crash_recovery.pl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/test/recovery/t/011_crash_recovery.pl b/src/test/recovery/t/011_crash_recovery.pl index a26e99500b..2e7e3db639 100644 --- a/src/test/recovery/t/011_crash_recovery.pl +++ b/src/test/recovery/t/011_crash_recovery.pl @@ -14,7 +14,7 @@ use Config; plan tests => 3; my $node = get_new_node('primary'); -$node->init(allows_streaming => 1); +$node->init(); $node->start; my ($stdin, $stdout, $stderr) = ('', '', ''); -- 2.17.0
>From 9e55bcf6da1d6ffda8d652f5c15060d3b0fa5c35 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota....@gmail.com> Date: Mon, 8 Mar 2021 15:43:01 +0900 Subject: [PATCH v8 3/9] Make sure published XIDs are persistent pg_xact_status() premises that XIDs obtained by pg_current_xact_id(_if_assigned)() are persistent beyond a crash. But XIDs are not guaranteed to go beyond WAL buffers before commit and thus XIDs may vanish if server crashes before commit. This patch guarantees the XID shown by the functions to be flushed out to disk. Copied from: https://www.postgresql.org/message-id/20210308.173242.463790587797836129.horikyota.ntt%40gmail.com --- src/backend/access/transam/xact.c | 55 +++++++++++++++++++++++++------ src/backend/access/transam/xlog.c | 2 +- src/backend/utils/adt/xid8funcs.c | 12 ++++++- src/include/access/xact.h | 3 +- 4 files changed, 59 insertions(+), 13 deletions(-) diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index 441445927e..da8a460722 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -201,7 +201,7 @@ typedef struct TransactionStateData int prevSecContext; /* previous SecurityRestrictionContext */ bool prevXactReadOnly; /* entry-time xact r/o state */ bool startedInRecovery; /* did we start in recovery? */ - bool didLogXid; /* has xid been included in WAL record? */ + XLogRecPtr minLSN; /* LSN needed to reach to record the xid */ int parallelModeLevel; /* Enter/ExitParallelMode counter */ bool chain; /* start a new block after this one */ bool assigned; /* assigned to top-level XID */ @@ -520,14 +520,46 @@ GetCurrentFullTransactionIdIfAny(void) * MarkCurrentTransactionIdLoggedIfAny * * Remember that the current xid - if it is assigned - now has been wal logged. + * + * upto is the LSN up to which we need to flush WAL to ensure the current xid + * to be persistent. See EnsureCurrentTransactionIdLogged(). 
*/ void -MarkCurrentTransactionIdLoggedIfAny(void) +MarkCurrentTransactionIdLoggedIfAny(XLogRecPtr upto) { - if (FullTransactionIdIsValid(CurrentTransactionState->fullTransactionId)) - CurrentTransactionState->didLogXid = true; + if (FullTransactionIdIsValid(CurrentTransactionState->fullTransactionId) && + XLogRecPtrIsInvalid(CurrentTransactionState->minLSN)) + CurrentTransactionState->minLSN = upto; } +/* + * EnsureCurrentTransactionIdLogged + * + * Make sure that the current top XID is WAL-logged. + */ +void +EnsureTopTransactionIdLogged(void) +{ + /* + * We need at least one WAL record for the current top transaction to be + * flushed out. Write one if we don't have one yet. + */ + if (XLogRecPtrIsInvalid(TopTransactionStateData.minLSN)) + { + xl_xact_assignment xlrec; + + xlrec.xtop = XidFromFullTransactionId(XactTopFullTransactionId); + Assert(TransactionIdIsValid(xlrec.xtop)); + xlrec.nsubxacts = 0; + + XLogBeginInsert(); + XLogRegisterData((char *) &xlrec, MinSizeOfXactAssignment); + TopTransactionStateData.minLSN = + XLogInsert(RM_XACT_ID, XLOG_XACT_ASSIGNMENT); + } + + XLogFlush(TopTransactionStateData.minLSN); +} /* * GetStableLatestTransactionId @@ -616,14 +648,14 @@ AssignTransactionId(TransactionState s) * When wal_level=logical, guarantee that a subtransaction's xid can only * be seen in the WAL stream if its toplevel xid has been logged before. * If necessary we log an xact_assignment record with fewer than - * PGPROC_MAX_CACHED_SUBXIDS. Note that it is fine if didLogXid isn't set + * PGPROC_MAX_CACHED_SUBXIDS. Note that it is fine if minLSN isn't set * for a transaction even though it appears in a WAL record, we just might * superfluously log something. That can happen when an xid is included * somewhere inside a wal record, but not in XLogRecord->xl_xid, like in * xl_standby_locks. 
*/ if (isSubXact && XLogLogicalInfoActive() && - !TopTransactionStateData.didLogXid) + XLogRecPtrIsInvalid(TopTransactionStateData.minLSN)) log_unknown_top = true; /* @@ -693,6 +725,7 @@ AssignTransactionId(TransactionState s) log_unknown_top) { xl_xact_assignment xlrec; + XLogRecPtr endptr; /* * xtop is always set by now because we recurse up transaction @@ -707,11 +740,13 @@ AssignTransactionId(TransactionState s) XLogRegisterData((char *) unreportedXids, nUnreportedXids * sizeof(TransactionId)); - (void) XLogInsert(RM_XACT_ID, XLOG_XACT_ASSIGNMENT); + endptr = XLogInsert(RM_XACT_ID, XLOG_XACT_ASSIGNMENT); nUnreportedXids = 0; - /* mark top, not current xact as having been logged */ - TopTransactionStateData.didLogXid = true; + + /* set minLSN of top, not of current xact if not yet */ + if (XLogRecPtrIsInvalid(TopTransactionStateData.minLSN)) + TopTransactionStateData.minLSN = endptr; } } } @@ -1996,7 +2031,7 @@ StartTransaction(void) * initialize reported xid accounting */ nUnreportedXids = 0; - s->didLogXid = false; + s->minLSN = InvalidXLogRecPtr; /* * must initialize resource-management stuff first diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 64094e7175..18d8743715 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -1162,7 +1162,7 @@ XLogInsertRecord(XLogRecData *rdata, */ WALInsertLockRelease(); - MarkCurrentTransactionIdLoggedIfAny(); + MarkCurrentTransactionIdLoggedIfAny(EndPos); END_CRIT_SECTION(); diff --git a/src/backend/utils/adt/xid8funcs.c b/src/backend/utils/adt/xid8funcs.c index cc2b4ac797..992482f8c8 100644 --- a/src/backend/utils/adt/xid8funcs.c +++ b/src/backend/utils/adt/xid8funcs.c @@ -357,6 +357,8 @@ bad_format: Datum pg_current_xact_id(PG_FUNCTION_ARGS) { + FullTransactionId xid; + /* * Must prevent during recovery because if an xid is not assigned we try * to assign one, which would fail. 
Programs already rely on this function @@ -365,7 +367,12 @@ pg_current_xact_id(PG_FUNCTION_ARGS) */ PreventCommandDuringRecovery("pg_current_xact_id()"); - PG_RETURN_FULLTRANSACTIONID(GetTopFullTransactionId()); + xid = GetTopFullTransactionId(); + + /* the XID is going to be published, make sure it is psersistent */ + EnsureTopTransactionIdLogged(); + + PG_RETURN_FULLTRANSACTIONID(xid); } /* @@ -380,6 +387,9 @@ pg_current_xact_id_if_assigned(PG_FUNCTION_ARGS) if (!FullTransactionIdIsValid(topfxid)) PG_RETURN_NULL(); + /* the XID is going to be published, make sure it is psersistent */ + EnsureTopTransactionIdLogged(); + PG_RETURN_FULLTRANSACTIONID(topfxid); } diff --git a/src/include/access/xact.h b/src/include/access/xact.h index 134f6862da..593a4140df 100644 --- a/src/include/access/xact.h +++ b/src/include/access/xact.h @@ -386,7 +386,8 @@ extern FullTransactionId GetTopFullTransactionId(void); extern FullTransactionId GetTopFullTransactionIdIfAny(void); extern FullTransactionId GetCurrentFullTransactionId(void); extern FullTransactionId GetCurrentFullTransactionIdIfAny(void); -extern void MarkCurrentTransactionIdLoggedIfAny(void); +extern void MarkCurrentTransactionIdLoggedIfAny(XLogRecPtr upto); +extern void EnsureTopTransactionIdLogged(void); extern bool SubTransactionIsActive(SubTransactionId subxid); extern CommandId GetCurrentCommandId(bool used); extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts); -- 2.17.0
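As an aside on patch 3/9: the idea is to remember the end LSN of the first WAL record that carries the top-level XID (minLSN), and to flush WAL up to that point before the XID is shown to a client. The following is a toy model of that bookkeeping under loud assumptions: the WAL is reduced to two counters, and every name in it (ToyXact, toy_wal_insert, toy_wal_flush, toy_mark_logged, toy_ensure_logged) is hypothetical and does not exist in PostgreSQL.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Lsn;
#define INVALID_LSN ((Lsn) 0)

/* Toy WAL: positions only, no data (assumption for illustration). */
static Lsn wal_insert_pos = 0;	/* end of the last inserted record */
static Lsn wal_flushed_pos = 0; /* everything up to here is durable */

/* Toy transaction state: stands in for minLSN in TransactionStateData. */
typedef struct
{
	Lsn			min_lsn;
} ToyXact;

/* Append a record of the given length; return its end position. */
static Lsn
toy_wal_insert(Lsn record_len)
{
	wal_insert_pos += record_len;
	return wal_insert_pos;
}

/* Make WAL durable up to 'upto'; a no-op if already flushed that far. */
static void
toy_wal_flush(Lsn upto)
{
	assert(upto <= wal_insert_pos);
	if (upto > wal_flushed_pos)
		wal_flushed_pos = upto;
}

/* Like MarkCurrentTransactionIdLoggedIfAny(upto): only the first record counts. */
static void
toy_mark_logged(ToyXact *x, Lsn end_pos)
{
	if (x->min_lsn == INVALID_LSN)
		x->min_lsn = end_pos;
}

/* Like EnsureTopTransactionIdLogged(): write a record if none exists, then flush. */
static void
toy_ensure_logged(ToyXact *x)
{
	if (x->min_lsn == INVALID_LSN)
		x->min_lsn = toy_wal_insert(16);	/* stand-in for an assignment record */

	toy_wal_flush(x->min_lsn);
}
```

Note that the flush only goes up to min_lsn, not to the current insert position, so a transaction that has already written a lot of WAL pays for flushing just enough to make its XID durable.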
>From d002f2c66fea54e04bc8c75ccf39eb1bf3c0b5a8 Mon Sep 17 00:00:00 2001 From: Justin Pryzby <pryz...@telsasoft.com> Date: Thu, 11 Mar 2021 17:36:24 -0600 Subject: [PATCH v8 4/9] wal_compression_method: default to zlib.. this is meant to exercise the CIs, and not meant to be merged --- src/backend/access/transam/xlog.c | 2 +- src/backend/utils/misc/guc.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 18d8743715..bbac9ac882 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -98,7 +98,7 @@ char *XLogArchiveCommand = NULL; bool EnableHotStandby = false; bool fullPageWrites = true; bool wal_log_hints = false; -int wal_compression = WAL_COMPRESSION_NONE; +int wal_compression = WAL_COMPRESSION_ZLIB; char *wal_consistency_checking_string = NULL; bool *wal_consistency_checking = NULL; bool wal_init_zero = true; diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 8860deda2a..f1d73c8b61 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -4822,7 +4822,7 @@ static struct config_enum ConfigureNamesEnum[] = NULL }, &wal_compression, - WAL_COMPRESSION_NONE, wal_compression_options, + WAL_COMPRESSION_ZLIB, wal_compression_options, NULL, NULL, NULL }, -- 2.17.0
From cfbf78b2fa5f478b4699f881c9e9d6cfb627b976 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryz...@telsasoft.com>
Date: Mon, 24 May 2021 23:32:30 -0500
Subject: [PATCH v8 5/9] (re)add wal_compression_method: lz4

---
 doc/src/sgml/config.sgml                      |  3 ++-
 doc/src/sgml/install-windows.sgml             |  2 +-
 doc/src/sgml/installation.sgml                |  5 +++--
 src/backend/access/transam/xloginsert.c       | 17 ++++++++++++++++-
 src/backend/access/transam/xlogreader.c       | 14 ++++++++++++++
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/include/access/xlog_internal.h            |  1 +
 src/include/access/xlogrecord.h               |  2 +-
 8 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 218a98cfc6..352b31fa81 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3126,7 +3126,8 @@ include_dir 'conf.d'
         server compresses full page images written to WAL when
         <xref linkend="guc-full-page-writes"/> is on or during a base backup.
         A compressed page image will be decompressed during WAL replay.
-        The supported methods are pglz and zlib.
+        The supported methods are pglz, zlib, and (if configured when
+        <productname>PostgreSQL</productname> was built) lz4.
         The default value is <literal>off</literal>.
         Only superusers can change this setting.
        </para>
diff --git a/doc/src/sgml/install-windows.sgml b/doc/src/sgml/install-windows.sgml
index db53ee85a8..a023584722 100644
--- a/doc/src/sgml/install-windows.sgml
+++ b/doc/src/sgml/install-windows.sgml
@@ -299,7 +299,7 @@ $ENV{MSBFLAGS}="/m";
      <term><productname>LZ4</productname></term>
      <listitem><para>
       Required for supporting <productname>LZ4</productname> compression
-      method for compressing the table data. Binaries and source can be
+      method for compressing table or WAL data. Binaries and source can be
       downloaded from
       <ulink url="https://github.com/lz4/lz4/releases"></ulink>.
      </para></listitem>
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 073d5089f7..c7673a4dc8 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -270,7 +270,8 @@ su - postgres
      <para>
       You need <productname>LZ4</productname>, if you want to support
       compression of data with this method; see
-      <xref linkend="guc-default-toast-compression"/>.
+      <xref linkend="guc-default-toast-compression"/> and
+      <xref linkend="guc-wal-compression"/>.
      </para>
     </listitem>
 
@@ -980,7 +981,7 @@ build-postgresql:
        <para>
         Build with <productname>LZ4</productname> compression support.
         This allows the use of <productname>LZ4</productname> for
-        compression of table data.
+        compression of table and WAL data.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 4f81f19c49..a8794a941a 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -41,10 +41,17 @@
 #define ZLIB_MAX_BLCKSZ 0
 #endif
 
+#ifdef USE_LZ4
+#include "lz4.h"
+#define LZ4_MAX_BLCKSZ LZ4_COMPRESSBOUND(BLCKSZ)
+#else
+#define LZ4_MAX_BLCKSZ 0
+#endif
+
 /* Buffer size required to store a compressed version of backup block image */
 #define PGLZ_MAX_BLCKSZ PGLZ_MAX_OUTPUT(BLCKSZ)
 
-#define COMPRESS_BUFSIZE Max(PGLZ_MAX_BLCKSZ, ZLIB_MAX_BLCKSZ)
+#define COMPRESS_BUFSIZE Max(Max(PGLZ_MAX_BLCKSZ, ZLIB_MAX_BLCKSZ), LZ4_MAX_BLCKSZ)
 
 /*
  * For each block reference registered with XLogRegisterBuffer, we fill in
@@ -889,6 +896,14 @@ XLogCompressBackupBlock(char *page, uint16 hole_offset, uint16 hole_length,
 			}
 #endif
 
+#ifdef USE_LZ4
+		case WAL_COMPRESSION_LZ4:
+			len = LZ4_compress_fast(source, dest, orig_len, COMPRESS_BUFSIZE, 1);
+			if (len == 0)
+				len = -1;
+			break;
+#endif
+
 		default:
 			/*
 			 * It should be impossible to get here for unsupported algorithms,
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 822b2612cd..753b38e58a 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -38,6 +38,10 @@
 #include <zlib.h>
 #endif
 
+#ifdef USE_LZ4
+#include "lz4.h"
+#endif
+
 static void report_invalid_record(XLogReaderState *state, const char *fmt,...)
 			pg_attribute_printf(2, 3);
 static bool allocate_recordbuf(XLogReaderState *state, uint32 reclength);
@@ -72,6 +76,9 @@ const struct config_enum_entry wal_compression_options[] = {
 	{"1", WAL_COMPRESSION_PGLZ, true},
 #ifdef HAVE_LIBZ
 	{"zlib", WAL_COMPRESSION_ZLIB, false},
+#endif
+#ifdef USE_LZ4
+	{"lz4", WAL_COMPRESSION_LZ4, false},
 #endif
 	{NULL, 0, false}
 };
@@ -1622,6 +1629,13 @@ RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page)
 			}
 #endif
 
+#ifdef USE_LZ4
+		case WAL_COMPRESSION_LZ4:
+			decomp_result = LZ4_decompress_safe(ptr, tmp.data,
+					bkpb->bimg_len, BLCKSZ-bkpb->hole_length);
+			break;
+#endif
+
 		default:
 			report_invalid_record(record, "image at %X/%X is compressed with unsupported codec, block %d (%d/%s)",
 								  (uint32) (record->ReadRecPtr >> 32),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 3991d35afd..dbf3155502 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -218,7 +218,7 @@
 #full_page_writes = on			# recover from partial page writes
 #wal_log_hints = off			# also do full page writes of non-critical updates
 					# (change requires restart)
-#wal_compression = off			# enable compression of full-page writes: off, pglz, zlib
+#wal_compression = off			# enable compression of full-page writes: off, pglz, zlib, lz4
 #wal_init_zero = on			# zero-fill new WAL files
 #wal_recycle = on			# recycle WAL files
 #wal_buffers = -1			# min 32kB, -1 sets based on shared_buffers
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 8b740af66d..0287592cd4 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -330,6 +330,7 @@ typedef enum WalCompression
 	WAL_COMPRESSION_NONE,
 	WAL_COMPRESSION_PGLZ,
 	WAL_COMPRESSION_ZLIB,
+	WAL_COMPRESSION_LZ4,
 } WalCompression;
 
 extern const char *wal_compression_name(WalCompression compression);
diff --git a/src/include/access/xlogrecord.h b/src/include/access/xlogrecord.h
index 2a60c0fb92..abb42b364d 100644
--- a/src/include/access/xlogrecord.h
+++ b/src/include/access/xlogrecord.h
@@ -147,7 +147,7 @@ typedef struct XLogRecordBlockImageHeader
 #define BKPIMAGE_APPLY		0x02	/* page image should be restored during
 									 * replay */
 #define BKPIMAGE_COMPRESS_METHOD1	0x04	/* bits to encode compression method */
-#define BKPIMAGE_COMPRESS_METHOD2	0x08	/* 0=none, 1=pglz, 2=zlib */
+#define BKPIMAGE_COMPRESS_METHOD2	0x08	/* 0=none, 1=pglz, 2=zlib, 3=lz4 */
 
 /* How many bits to shift to extract compression */
 #define BKPIMAGE_COMPRESS_OFFSET_BITS	2
-- 
2.17.0
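For readers following the bit layout in the xlogrecord.h hunk above, here is a minimal standalone sketch of how the two method bits (0x04 and 0x08, shifted by BKPIMAGE_COMPRESS_OFFSET_BITS) could carry the values 0=none, 1=pglz, 2=zlib, 3=lz4 alongside the other bimg_info flags. The flag values and the enum order are copied from the patch; BKPIMAGE_COMPRESS_MASK, bkpimage_set_method() and bkpimage_get_method() are hypothetical names used only for illustration, and the macros the patch set itself uses for this are not shown in this part of the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values copied from the xlogrecord.h hunk in patch 5/9. */
#define BKPIMAGE_APPLY                0x02
#define BKPIMAGE_COMPRESS_METHOD1     0x04
#define BKPIMAGE_COMPRESS_METHOD2     0x08
#define BKPIMAGE_COMPRESS_OFFSET_BITS 2

/* Enum order copied from the xlog_internal.h hunk in patch 5/9. */
typedef enum WalCompression
{
	WAL_COMPRESSION_NONE,
	WAL_COMPRESSION_PGLZ,
	WAL_COMPRESSION_ZLIB,
	WAL_COMPRESSION_LZ4
} WalCompression;

/* Hypothetical helper: mask covering both compression-method bits. */
#define BKPIMAGE_COMPRESS_MASK \
	(BKPIMAGE_COMPRESS_METHOD1 | BKPIMAGE_COMPRESS_METHOD2)

/* Hypothetical helper: store a method in bimg_info, keeping other flags. */
static uint8_t
bkpimage_set_method(uint8_t bimg_info, WalCompression method)
{
	bimg_info &= (uint8_t) ~BKPIMAGE_COMPRESS_MASK;
	bimg_info |= (uint8_t) ((method << BKPIMAGE_COMPRESS_OFFSET_BITS) &
							BKPIMAGE_COMPRESS_MASK);
	return bimg_info;
}

/* Hypothetical helper: read the method back out of bimg_info. */
static WalCompression
bkpimage_get_method(uint8_t bimg_info)
{
	return (WalCompression) ((bimg_info & BKPIMAGE_COMPRESS_MASK) >>
							 BKPIMAGE_COMPRESS_OFFSET_BITS);
}
```

With this layout, two bits encode four values, so a fifth method such as zstd would need one more bit in bimg_info.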
From 11ff52f20a3079e60186eab6c749b30ba3f5fe87 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryz...@telsasoft.com>
Date: Fri, 12 Mar 2021 15:35:40 -0600
Subject: [PATCH v8 6/9] Default to LZ4..

this is meant to exercise in the CIs, and not meant to be merged
---
 configure                         | 6 ++++--
 configure.ac                      | 4 ++--
 src/backend/access/transam/xlog.c | 2 +-
 src/backend/utils/misc/guc.c      | 2 +-
 4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/configure b/configure
index e9b98f442f..7038b0727c 100755
--- a/configure
+++ b/configure
@@ -1575,7 +1575,7 @@ Optional Packages:
   --with-system-tzdata=DIR
                           use system time zone data in DIR
   --without-zlib          do not use Zlib
-  --with-lz4              build with LZ4 support
+  --without-lz4           build without LZ4 support
   --with-gnu-ld           assume the C compiler uses GNU ld [default=no]
   --with-ssl=LIB          use LIB for SSL/TLS support (openssl)
   --with-openssl          obsolete spelling of --with-ssl=openssl
@@ -8598,7 +8598,9 @@ $as_echo "#define USE_LZ4 1" >>confdefs.h
 esac
 
 else
-  with_lz4=no
+  with_lz4=yes
+
+$as_echo "#define USE_LZ4 1" >>confdefs.h
 
 fi
 
diff --git a/configure.ac b/configure.ac
index 3b42d8bdc9..cb0261f179 100644
--- a/configure.ac
+++ b/configure.ac
@@ -990,8 +990,8 @@ AC_SUBST(with_zlib)
 # LZ4
 #
 AC_MSG_CHECKING([whether to build with LZ4 support])
-PGAC_ARG_BOOL(with, lz4, no, [build with LZ4 support],
-              [AC_DEFINE([USE_LZ4], 1, [Define to 1 to build with LZ4 support. (--with-lz4)])])
+PGAC_ARG_BOOL(with, lz4, yes, [build without LZ4 support],
+              [AC_DEFINE([USE_LZ4], 1, [Define to 1 to build without LZ4 support. (--without-lz4)])])
 AC_MSG_RESULT([$with_lz4])
 AC_SUBST(with_lz4)
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index bbac9ac882..75fbf11229 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -98,7 +98,7 @@ char *XLogArchiveCommand = NULL;
 bool EnableHotStandby = false;
 bool fullPageWrites = true;
 bool wal_log_hints = false;
-int wal_compression = WAL_COMPRESSION_ZLIB;
+int wal_compression = WAL_COMPRESSION_LZ4;
 char *wal_consistency_checking_string = NULL;
 bool *wal_consistency_checking = NULL;
 bool wal_init_zero = true;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f1d73c8b61..89ae829719 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -4822,7 +4822,7 @@ static struct config_enum ConfigureNamesEnum[] =
 			NULL
 		},
 		&wal_compression,
-		WAL_COMPRESSION_ZLIB, wal_compression_options,
+		WAL_COMPRESSION_LZ4, wal_compression_options,
 		NULL, NULL, NULL
 	},
 
-- 
2.17.0