On Sat, Mar 05, 2022 at 07:26:39PM +0900, Michael Paquier wrote:
> Repeatability and randomness of data counts, we could have for example
> one case with a set of 5~7 int attributes, a second with text values
> that include random data, up to 10~12 bytes each to count on the tuple
> header to be able to compress some data, and a third with more
> repeatable data, like one attribute with an int column populate 
> with generate_series().  Just to give an idea.

And that's what I did as of the attached set of test:
- Cluster on tmpfs.
- max_wal_size, min_wal_size at 2GB and shared_buffers at 1GB, aka
large enough to include the full data set in memory.
- Rather than using Justin's full patch set, I have just patched the
code in xloginsert.c to switch the level.
- One case with table that uses one int attribute, with rather
repetitive data worth 484MB.
- One case with table using (int, text), where the text data is made
of 10~11 bytes of random data, worth ~340MB.
- Use pg_prewarm to load the data into shared buffers.  With the
cluster mounted on a tmpfs that should not matter though.
- Both tables have a fillfactor at 50 to give room to the updates.

I have measured the CPU usage with a toy extension, also attached,
called pg_rusage() that is a simple wrapper to upstream's pg_rusage.c 
to initialize a rusage snapshot and print its data with two SQL
functions that are called just before and after the FPIs are generated
(aka the UPDATE query that rewrites the whole table in the script
attached).

The quickly-hacked test script and the results are in test.tar.gz, for
reference.  The toy extension is pg_rusage.tar.gz.

Here are the results I compiled, as of results_format.sql in the
tarball attached:
             descr             | rel_size | fpi_size | time_s 
-------------------------------+----------+----------+--------
 int column no compression     | 429 MB   | 727 MB   |  13.15
 int column ztsd default level | 429 MB   | 523 MB   |  14.23
 int column zstd level 1       | 429 MB   | 524 MB   |  13.94
 int column zstd level 10      | 429 MB   | 523 MB   |  23.46
 int column zstd level 19      | 429 MB   | 523 MB   | 103.71
 int column lz4 default level  | 429 MB   | 575 MB   |  13.37
 int/text no compression       | 344 MB   | 558 MB   |  10.08
 int/text lz4 default level    | 344 MB   | 463 MB   |  10.29
 int/text zstd default level   | 344 MB   | 415 MB   |  11.48
 int/text zstd level 1         | 344 MB   | 418 MB   |  11.25
 int/text zstd level 10        | 344 MB   | 415 MB   |  20.59
 int/text zstd level 19        | 344 MB   | 413 MB   |  62.64
(12 rows)

I did not expect zstd to be this slow at a level of 10~ actually.  The
runtime (elapsed CPU time) got severely impacted at level 19, that I
ran just for fun to see how that it would become compared to a level
of 10.  There is a slight difference between the default level and a
level of 1, and the compression size does not change much, nor does
the CPU usage really change.

Attached is an updated patch, while on it, that I have tweaked before
running my own tests.

At the end, I'd still like to think that we'd better stick with the
default level for this parameter, and that's the suggestion of
upstream.  So I would like to move on with that for this patch.
--
Michael

Attachment: test.tar.gz
Description: application/gzip

Attachment: pg_rusage.tar.gz
Description: application/gzip

From 254ddbf4223c35a7990e301e53d6ddbffcf210c0 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryz...@telsasoft.com>
Date: Fri, 18 Feb 2022 22:54:18 -0600
Subject: [PATCH v2] add wal_compression=zstd

---
 src/include/access/xlog.h                     |  3 +-
 src/include/access/xlogrecord.h               |  5 +++-
 src/backend/access/transam/xloginsert.c       | 30 ++++++++++++++++++-
 src/backend/access/transam/xlogreader.c       | 20 +++++++++++++
 src/backend/utils/misc/guc.c                  |  3 ++
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/bin/pg_waldump/pg_waldump.c               |  2 ++
 doc/src/sgml/config.sgml                      | 11 ++++---
 doc/src/sgml/installation.sgml                |  8 +++++
 9 files changed, 76 insertions(+), 8 deletions(-)

diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 4b45ac64db..09f6464331 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -75,7 +75,8 @@ typedef enum WalCompression
 {
 	WAL_COMPRESSION_NONE = 0,
 	WAL_COMPRESSION_PGLZ,
-	WAL_COMPRESSION_LZ4
+	WAL_COMPRESSION_LZ4,
+	WAL_COMPRESSION_ZSTD
 } WalCompression;
 
 /* Recovery states */
diff --git a/src/include/access/xlogrecord.h b/src/include/access/xlogrecord.h
index c1b1137aa7..052ac6817a 100644
--- a/src/include/access/xlogrecord.h
+++ b/src/include/access/xlogrecord.h
@@ -149,8 +149,11 @@ typedef struct XLogRecordBlockImageHeader
 /* compression methods supported */
 #define BKPIMAGE_COMPRESS_PGLZ	0x04
 #define BKPIMAGE_COMPRESS_LZ4	0x08
+#define BKPIMAGE_COMPRESS_ZSTD	0x10
+
 #define	BKPIMAGE_COMPRESSED(info) \
-	((info & (BKPIMAGE_COMPRESS_PGLZ | BKPIMAGE_COMPRESS_LZ4)) != 0)
+	((info & (BKPIMAGE_COMPRESS_PGLZ | BKPIMAGE_COMPRESS_LZ4 | \
+			  BKPIMAGE_COMPRESS_ZSTD)) != 0)
 
 /*
  * Extra header information used when page image has "hole" and
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index c260310c4c..b61c08e586 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -44,9 +44,17 @@
 #define LZ4_MAX_BLCKSZ		0
 #endif
 
+#ifdef USE_ZSTD
+#include <zstd.h>
+#define ZSTD_MAX_BLCKSZ		ZSTD_COMPRESSBOUND(BLCKSZ)
+#else
+#define ZSTD_MAX_BLCKSZ		0
+#endif
+
+/* Buffer size required to store a compressed version of backup block image */
 #define PGLZ_MAX_BLCKSZ		PGLZ_MAX_OUTPUT(BLCKSZ)
 
-#define COMPRESS_BUFSIZE	Max(PGLZ_MAX_BLCKSZ, LZ4_MAX_BLCKSZ)
+#define COMPRESS_BUFSIZE	Max(Max(PGLZ_MAX_BLCKSZ, LZ4_MAX_BLCKSZ), ZSTD_MAX_BLCKSZ)
 
 /*
  * For each block reference registered with XLogRegisterBuffer, we fill in
@@ -695,6 +703,14 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
 #endif
 						break;
 
+					case WAL_COMPRESSION_ZSTD:
+#ifdef USE_ZSTD
+						bimg.bimg_info |= BKPIMAGE_COMPRESS_ZSTD;
+#else
+						elog(ERROR, "ZSTD is not supported by this build");
+#endif
+						break;
+
 					case WAL_COMPRESSION_NONE:
 						Assert(false);	/* cannot happen */
 						break;
@@ -903,6 +919,18 @@ XLogCompressBackupBlock(char *page, uint16 hole_offset, uint16 hole_length,
 #endif
 			break;
 
+		case WAL_COMPRESSION_ZSTD:
+#ifdef USE_ZSTD
+			/* Uses level=1, not ZSTD_CLEVEL_DEFAULT */
+			len = ZSTD_compress(dest, COMPRESS_BUFSIZE, source, orig_len,
+								ZSTD_CLEVEL_DEFAULT);
+			if (ZSTD_isError(len))
+				len = -1;		/* failure */
+#else
+			elog(ERROR, "ZSTD is not supported by this build");
+#endif
+			break;
+
 		case WAL_COMPRESSION_NONE:
 			Assert(false);		/* cannot happen */
 			break;
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 35029cf97d..d60e4cbaf6 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -21,6 +21,9 @@
 #ifdef USE_LZ4
 #include <lz4.h>
 #endif
+#ifdef USE_ZSTD
+#include <zstd.h>
+#endif
 
 #include "access/transam.h"
 #include "access/xlog_internal.h"
@@ -1618,6 +1621,23 @@ RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page)
 								  "LZ4",
 								  block_id);
 			return false;
+#endif
+		}
+		else if ((bkpb->bimg_info & BKPIMAGE_COMPRESS_ZSTD) != 0)
+		{
+#ifdef USE_ZSTD
+			size_t decomp_result = ZSTD_decompress(tmp.data,
+												   BLCKSZ-bkpb->hole_length,
+												   ptr, bkpb->bimg_len);
+
+			if (ZSTD_isError(decomp_result))
+				decomp_success = false;
+#else
+			report_invalid_record(record, "image at %X/%X compressed with %s not supported by build, block %d",
+								  LSN_FORMAT_ARGS(record->ReadRecPtr),
+								  "zstd",
+								  block_id);
+			return false;
 #endif
 		}
 		else
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6d11f9c71b..e7f0a380e6 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -550,6 +550,9 @@ static const struct config_enum_entry wal_compression_options[] = {
 	{"pglz", WAL_COMPRESSION_PGLZ, false},
 #ifdef USE_LZ4
 	{"lz4", WAL_COMPRESSION_LZ4, false},
+#endif
+#ifdef USE_ZSTD
+	{"zstd", WAL_COMPRESSION_ZSTD, false},
 #endif
 	{"on", WAL_COMPRESSION_PGLZ, false},
 	{"off", WAL_COMPRESSION_NONE, false},
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 4a094bb38b..4cf5b26a36 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -220,7 +220,7 @@
 #wal_log_hints = off			# also do full page writes of non-critical updates
 					# (change requires restart)
 #wal_compression = off			# enables compression of full-page writes;
-					# off, pglz, lz4, or on
+					# off, pglz, lz4, zstd, or on
 #wal_init_zero = on			# zero-fill new WAL files
 #wal_recycle = on			# recycle WAL files
 #wal_buffers = -1			# min 32kB, -1 sets based on shared_buffers
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 2340dc247b..f128050b4e 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -562,6 +562,8 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
 						method = "pglz";
 					else if ((bimg_info & BKPIMAGE_COMPRESS_LZ4) != 0)
 						method = "lz4";
+					else if ((bimg_info & BKPIMAGE_COMPRESS_ZSTD) != 0)
+						method = "zstd";
 					else
 						method = "unknown";
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7ed8c82a9d..5612e80453 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3154,10 +3154,13 @@ include_dir 'conf.d'
         server compresses full page images written to WAL when
         <xref linkend="guc-full-page-writes"/> is on or during a base backup.
         A compressed page image will be decompressed during WAL replay.
-        The supported methods are <literal>pglz</literal> and
-        <literal>lz4</literal> (if <productname>PostgreSQL</productname> was
-        compiled with <option>--with-lz4</option>). The default value is
-        <literal>off</literal>. Only superusers can change this setting.
+        The supported methods are <literal>pglz</literal>,
+        <literal>lz4</literal> (if <productname>PostgreSQL</productname>
+        was compiled with <option>--with-lz4</option>) and
+        <literal>zstd</literal> (if <productname>PostgreSQL</productname>
+        was compiled with <option>--with-zstd</option>) and
+        The default value is <literal>off</literal>.
+        Only superusers can change this setting.
        </para>
 
        <para>
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 0f74252590..bd09003d59 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -271,6 +271,14 @@ su - postgres
      </para>
     </listitem>
 
+    <listitem>
+     <para>
+      You need <productname>ZSTD</productname>, if you want to support
+      compression of data with this method; see
+      <xref linkend="guc-wal-compression"/>.
+     </para>
+    </listitem>
+
     <listitem>
      <para>
       To build the <productname>PostgreSQL</productname> documentation,
-- 
2.35.1

Attachment: signature.asc
Description: PGP signature

Reply via email to