On Thu, Dec 17, 2020 at 05:54:28PM +0300, Konstantin Knizhnik wrote:
> I am maintaining this code in
> g...@github.com:postgrespro/libpq_compression.git repository.
> I will be pleased if anybody, who wants to suggest any bug
> fixes/improvements of libpq compression, create pull requests: it will be
> much easier for me to merge them.

Thanks for working on this.
I have a patch for zstd compression in pg_dump so I looked at your patch.
I'm attaching some language fixes.

> +zstd_create(int level, zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg, 
> char* rx_data, size_t rx_data_size)
> +zlib_create(int level, zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg, 
> char* rx_data, size_t rx_data_size)
> +build_compressors_list(PGconn *conn, char** client_compressors, bool 
> build_descriptors)

Are you able to run pg_indent to fix all the places where "*" is before the
space ?  (And similar style issues).

There are several compression patches in the commitfest, I'm not sure how much
they need to be coordinated, but for sure we should coordinate the list of
compressions available at compile time.

Maybe there should be a central structure for this, or maybe just the ID
numbers of compressors should be a common enum/define.  In my patch, I have:

+struct compressLibs {
+       const CompressionAlgorithm alg;
+       const char      *name;                  /* Name in -Z alg= */
+       const char      *suffix;                /* file extension */
+       const int       defaultlevel;   /* Default compression level */
+};

Maybe we'd also want to store the "magic number" of each compression library.
Maybe there'd also be a common parsing of compression options.

You're supporting a syntax like zlib,zstd:5, but zstd also supports long-range,
checksum, and rsyncable modes (rsyncable is relevant to pg_dump, but not to
libpq).

I think your patch has an issue here.  You have this:

src/interfaces/libpq/fe-connect.c

+            pqGetc(&resp, conn);
+            index = resp;
+            if (index == (char)-1)
+            {
+                    appendPQExpBuffer(&conn->errorMessage,
+                                                      libpq_gettext(
+                                                              "server is not 
supported requested compression algorithms %s\n"),
+                                                      conn->compression);
+                    goto error_return;
+            }
+            Assert(!conn->zstream);
+            conn->zstream = zpq_create(conn->compressors[index].impl,
+                                                               
conn->compressors[index].level,
+                                                               
(zpq_tx_func)pqsecure_write, (zpq_rx_func)pqsecure_read, conn,
+                                                               
&conn->inBuffer[conn->inCursor], conn->inEnd-conn->inCursor);

This takes the "index" returned by the server and then accesses
conn->compressors[index] without first checking if the index is out of range,
so a malicious server could (at least) crash the client by returning index=666.

I suggest that there should be an enum of algorithms, which is constant across
all servers.  They would be unconditionally included and not #ifdef depending
on compilation options.  

That would affect the ZpqAlgorithm data structure, which would include an ID
number similar to
src/bin/pg_dump/compress_io.h:typedef enum...CompressionAlgorithm;

The CompressionAck would send the ID rather than the "index".  
A protocol analyzer like wireshark could show "Compression: Zstd".
You'd have to verify that the ID is supported (and not bogus).  

Right now, when I try to connect to an unpatched server, I get:
psql: error: expected authentication request from server, but received v

+/*
+ * Array with all supported compression algorithms.
+ */
+static ZpqAlgorithm const zpq_algorithms[] =
+{
+#if HAVE_LIBZSTD
+       {zstd_name, zstd_create, zstd_read, zstd_write, zstd_free, zstd_error, 
zstd_buffered_tx, zstd_buffered_rx},
+#endif
+#if HAVE_LIBZ
+       {zlib_name, zlib_create, zlib_read, zlib_write, zlib_free, zlib_error, 
zlib_buffered_tx, zlib_buffered_rx},
+#endif
+       {no_compression_name}
+};

In config.sgml, it says that libpq_compression defaults to on (on the server
side), but in libpq.sgml it says that it defaults to off (on the client side).
Is that what's intended ?  I would've thought the defaults would match, or that
the server would enforce a default more conservative than the client's (the DBA
should probably have to explicitly enable compression, and need to "opt-in"
rather than "opt-out").

Maybe instead of a boolean, this should be a list of permitted compression
algorithms.  This allows the admin to set a "policy" rather than using the
server's hard-coded preferences.  This could be important to disable an
algorithm at run-time if there's a vulnerability found, or performance problem,
or buggy client, or for diagnostics, or performance testing.  Actually, I think
it may be important to allow the admin to disable not just an algorithm, but
also its options.  It should be possible for the server to allow "zstd"
compression but not "zstd --long", or zstd:99 or arbitrarily large window
sizes.  This seems similar to SSL cipher strings.  I think we'd want to be able
to allow (or prohibit) at least alg:* (with options) or alg (with default
options).

Your patch documents "libpq_compression=auto" but that doesn't work:
WARNING: none of specified algirthms auto is supported by client
I guess you mean "any" which is what's implemented.
I suggest to just remove that.

I think maybe your patch should include a way to trivially change the client's
compile-time default:
"if (!conn->compression) conn->compression = DefaultCompression"

Your patch warns if *none* of the specified algorithms are supported, but I
wonder if it should warn if *any* of them are unsupported (like if someone
writes libz instead of zlib, or zst/libzstd instead of zstd, which I guess
about half of us will do).

$ LD_LIBRARY_PATH=./src/interfaces/libpq PGCOMPRESSION=abc,def 
src/bin/psql/psql 'host=localhost port=1111'
WARNING: none of the specified algorithms are supported by client: abc,def
$ LD_LIBRARY_PATH=./src/interfaces/libpq PGCOMPRESSION=abc,zlib 
src/bin/psql/psql 'host=localhost port=1111'
(no warning)

The libpq_compression GUC can be set for a user, like ALTER ROLE .. SET ...
Is there any utility in making it configurable by client address?  Or by
encryption?  I'm thinking of a policy like "do not allow compression from LOCAL
connection" or "do not allow compression on encrypted connection".  Maybe this
would be somehow integrated into pg_hba.  But maybe it's not needed (a separate
user could be used to allow/disallow compression).

-- 
Justin
>From 5680eae5638e56ab5d11dc2050dd7cc995bc9f73 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryz...@telsasoft.com>
Date: Sat, 9 Jan 2021 09:49:23 -0600
Subject: [PATCH] fixes to docs and comments

---
 doc/src/sgml/config.sgml            | 10 +++---
 doc/src/sgml/libpq.sgml             | 23 ++++++++----
 doc/src/sgml/protocol.sgml          | 54 +++++++++++++----------------
 src/backend/libpq/pqcomm.c          |  2 +-
 src/common/zpq_stream.c             | 16 ++++-----
 src/include/common/zpq_stream.h     |  2 +-
 src/interfaces/libpq/fe-connect.c   |  2 +-
 src/interfaces/libpq/fe-protocol3.c | 26 +++++++-------
 8 files changed, 69 insertions(+), 66 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3b132dcf27..8b5a6525ac 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1004,12 +1004,10 @@ include_dir 'conf.d'
       </term>
       <listitem>
        <para>
-        When this parameter is <literal>on</literal> (default), the <productname>PostgreSQL</productname>
-        server can switch on compression of traffic between server and client if it is requested by client.
-        Client sends to the server list of compression algorithms supported by frontend library,
-        server chooses one which is supported by backend library and sends it in compression acknowledgement message
-        to the client. This option allows to reject compression request even if it is supported by server
-        (due to security, CPU consumption or whatever else reasons...).
+        This parameter enables compression of libpq traffic between client and server.
+        The default is <literal>on</literal>.
+        This option allows rejecting compression requests even if it is supported by server
+        (for example, due to security, or CPU consumption).
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 2934908520..d4da390ca9 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1244,13 +1244,22 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
       <term><literal>compression</literal></term>
       <listitem>
       <para>
-        Request compression of libpq traffic. Client sends to the server list of compression algorithms, supported by client library.
-        If server supports one of this algorithms, then it acknowledges use of this algorithm and then all libpq messages send both from client to server and
-        visa versa will be compressed. If server is not supporting any of the suggested algorithms, then it rejects client request to use compression
-        and it is up to the client whether to continue work without compression or report error.
-        Supported compression algorithms are chosen at configure time. Right now two libraries are supported: zlib (default) and zstd (if Postgres was
-        configured with --with-zstd option). In both cases streaming mode is used.
-        By default compression is disabled. Please notice that using compression together with SSL may add extra vulnerabilities:
+        Request compression of libpq traffic. The client sends a request with a list of compression algorithms.
+        Compression can be requested by a client by including the "compression" option in its connection string.
+        This can either be a boolean value to enable or disable compression
+        ("true"/"false", "on"/"off", "yes"/"no", "1"/"0"), "auto", or an explicit list of comma-separated compression algorithms
+        which can optionally include compression level ("zlib,zstd:5").
+        If compression is enabled but an algorithm is not explicitly specified, the client library sends its full list of
+        supported algorithms and the server chooses a preferred algorithm.
+
+        If the server accepts one of the algorithms, it replies with an acknowledgment and all future libpq messages between client and server
+        will be compressed.
+        If the server rejects the compression request, it is up to the client whether to continue without compression or to report an error.
+        Support for compression algorithms must be enabled when the server is compiled.
+        Currently, two libraries are supported: zlib (default) and zstd (if Postgres was
+        configured with --with-zstd option). In both cases, streaming mode is used.
+        By default, compression is not requested by the client.
+        Please note that using compression together with SSL may expose extra vulnerabilities:
         <ulink url="https://en.wikipedia.org/wiki/CRIME";>CRIME</ulink>
       </para>
       </listitem>
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 9f30aad8bd..a69f9bfc26 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -94,10 +94,10 @@
 
   <para>
     It is possible to compress protocol data to reduce traffic and speed-up client-server interaction.
-    Compression is especially useful for importing/exporting data to/from database using COPY command
-    and for replication (both physical and logical). Also compression can reduce server response time
-    in case of queries returning large amount of data (for example returning JSON, BLOBs, text,...)
-    Right now two libraries are supported: zlib (default) and zstd (if Postgres was
+    Compression is especially useful for importing/exporting data to/from the database using the <literal>COPY</literal> command
+    and for replication (both physical and logical). Compression can also reduce the server's response time
+    for queries returning a large amount of data (for example, JSON, BLOBs, text, ...).
+    Currently, two libraries are supported: zlib (default) and zstd (if Postgres was
     configured with --with-zstd option).
   </para>
 
@@ -275,19 +275,14 @@
       <term>CompressionAck</term>
       <listitem>
        <para>
-         Server acknowledges using compression for client-server communication protocol.
-         Compression can be requested by client by including "compression" option in connection string.
-         It can be just boolean values enabling or disabling compression
-         ("true"/"false", "on"/"off", "yes"/"no", "1"/"0"), "auto" or explicit list of compression algorithms
-         separated by comma with optional specification of compression level: "zlib,zstd:5".
-         If compression algorithm is not explicitly specified the most efficient one supported both by
-         client and server is chosen. Client sends to the server list of compression algorithms,
-         supported by client library.
-         If server supports one of this algorithms, then it acknowledges use of this algorithm and
-         all subsequent libpq messages send both from client to server and
-         visa versa will be compressed. Server selects most efficient algorithm among list specified by client and returns to the client
-         index of chosen algorithm in this list. If server is not supporting any of the suggested algorithms, then it replies with -1
-         and it is up to the client whether to continue work without compression or report error.
+         The server accepts the client's compression request.
+         Compression is requested when a client connection includes the "compression" option, which includes
+         a list of requested compression algorithms.
+         If the server accepts one of these algorithms, it acknowledges use of compression and
+         all subsequent libpq messages between the client and server will be compressed.
+         The server chooses an algorithm from the list specified by client and responds with the index of the chosen algorithm from the client-supplied list.
+         If the server does not accept any of the requested algorithms, then it replies with an index of -1
+         and it is up to the client whether to continue without compression or to report an error.
        </para>
       </listitem>
      </varlistentry>
@@ -3460,12 +3455,14 @@ CompressionAck (B)
 </term>
 <listitem>
 <para>
-  Acknowledge use of compression for protocol data. Client sends to the server list of compression algorithms, supported by client library.
-  If server supports one of this algorithms, then it acknowledges use of this algorithm and all subsequent libpq messages send both from client to server and
-  visa versa will be compressed. Server selects most efficient algorithm among list specified by client and returns to the client
-  index of chosen algorithm in this list. If server is not supporting any of the suggested algorithms, then it replies with -1
-  and it is up to the client whether to continue work without compression or report error.
-  After receiving this message with algorithm index other than -1, both server and client are switched to compression mode
+  Acknowledge use of compression for protocol data. The client sends to the server a list of requested compression algorithms.
+  If the server supports any of these algorithms, it acknowledges use of this algorithm and all subsequent libpq messages between client and server
+  will be compressed.
+  The server selects the preferred algorithm from the list specified by client and responds with the
+  index of the chosen algorithm in this list.
+  If the server does not support any of the requested algorithms, it replies with -1
+  and it is up to the client whether to continue without compression or to report an error.
+  After receiving this message with algorithm index other than -1, both server and client switch to compressed mode
   and exchange compressed messages.
 </para>
 </listitem>
@@ -3486,7 +3483,7 @@ CompressionAck (B)
 </term>
 <listitem>
 <para>
-        Index of algorithm in the list of supported algotihms specified by client or -1 if none of them is supported.
+        Index of algorithm in the list of supported algorithms specified by client or -1 if none of them are supported.
 </para>
 </listitem>
 </varlistentry>
@@ -6069,12 +6066,11 @@ StartupMessage (F)
 </term>
 <listitem>
 <para>
-                        Request compression of libpq traffic. Value is list of compression algorithms supported by client with optional
+                        Request compression of libpq traffic. The value is a list of compression algorithms requested by the client with an optional
                         specification of compression level: <literal>"zlib,zstd:5"</literal>.
-                        When connecting to an older backend, which does not support compression, or in case when the backend support compression
-                        but for some reason wants to disable it, the backend will just ignore the _pq_.compression parameter and won’t send
-                        the compressionAck message to the frontend.
-                        By default compression is disabled. Please notice that using compression together with SSL may add extra vulnerabilities:
+                        If the server does not accept compression, the backend will ignore the _pq_.compression
+                        parameter and will not send the CompressionAck message to the frontend.
+                        By default, compression is disabled. Please note that using compression together with SSL may expose extra vulnerabilities:
                         <ulink url="https://en.wikipedia.org/wiki/CRIME";>CRIME</ulink>.
 </para>
 </listitem>
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 8be1ce18de..f3293faf01 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -1073,7 +1073,7 @@ pq_recvbuf(bool nowait)
 	/* Can fill buffer from PqRecvLength and upwards */
 	for (;;)
 	{
-		/* If streaming compression is enabled then use correspondent compression read function. */
+		/* If streaming compression is enabled then use corresponding compression read function. */
 		r = PqStream
 			? zpq_read(PqStream, PqRecvBuffer + PqRecvLength,
 					   PQ_RECV_BUFFER_SIZE - PqRecvLength)
diff --git a/src/common/zpq_stream.c b/src/common/zpq_stream.c
index 459566228a..e044e83e7c 100644
--- a/src/common/zpq_stream.c
+++ b/src/common/zpq_stream.c
@@ -9,15 +9,15 @@
 typedef struct
 {
 	/*
-	 * Returns name of compression algorithm.
+	 * Name of compression algorithm.
 	 */
 	char const* (*name)(void);
 
 	/*
-	 * Create compression stream with using rx/tx function for fetching/sending compressed data.
+	 * Create compression stream with rx/tx function for reading/sending compressed data.
 	 * level: compression level
 	 * tx_func: function for writing compressed data in underlying stream
-	 * rx_func: function for receiving compressed data from underlying stream
+	 * rx_func: function for reading compressed data from underlying stream
 	 * arg: context passed to the function
      * rx_data: received data (compressed data already fetched from input stream)
 	 * rx_data_size: size of data fetched from input stream
@@ -27,14 +27,14 @@ typedef struct
 	/*
 	 * Read up to "size" raw (decompressed) bytes.
 	 * Returns number of decompressed bytes or error code.
-	 * Error code is either ZPQ_DECOMPRESS_ERROR either error code returned by the rx function.
+	 * Error code is either ZPQ_DECOMPRESS_ERROR or error code returned by the rx function.
 	 */
 	ssize_t (*read)(ZpqStream *zs, void *buf, size_t size);
 
 	/*
 	 * Write up to "size" raw (decompressed) bytes.
 	 * Returns number of written raw bytes or error code returned by tx function.
-	 * In the last case amount of written raw bytes is stored in *processed.
+	 * In the last case number of bytes written is stored in *processed.
 	 */
 	ssize_t (*write)(ZpqStream *zs, void const *buf, size_t size, size_t *processed);
 
@@ -49,12 +49,12 @@ typedef struct
 	char const* (*error)(ZpqStream *zs);
 
 	/*
-	 * Returns amount of data in internal tx decompression buffer.
+	 * Return amount of data in internal tx decompression buffer.
 	 */
 	size_t  (*buffered_tx)(ZpqStream *zs);
 
 	/*
-	 * Returns amount of data in internal rx compression buffer.
+	 * Return amount of data in internal rx compression buffer.
 	 */
 	size_t  (*buffered_rx)(ZpqStream *zs);
 } ZpqAlgorithm;
@@ -265,7 +265,7 @@ zstd_name(void)
 
 #define ZLIB_BUFFER_SIZE       8192 /* We have to flush stream after each protocol command
 									 * and command is mostly limited by record length,
-									 * which in turn usually less than page size (except TOAST)
+									 * which in turn is usually less than page size (except TOAST)
 									 */
 
 typedef struct ZlibStream
diff --git a/src/include/common/zpq_stream.h b/src/include/common/zpq_stream.h
index 27aef0aab9..991e2c0165 100644
--- a/src/include/common/zpq_stream.h
+++ b/src/include/common/zpq_stream.h
@@ -1,6 +1,6 @@
 /*
  * zpq_stream.h
- *     Streaiming compression for libpq
+ *     Streaming compression for libpq
  */
 
 #ifndef ZPQ_STREAM_H
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index eac6ef66cc..679f8cde9f 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -3269,7 +3269,7 @@ keep_going:						/* We will come back to here until there is
 						{
 							appendPQExpBuffer(&conn->errorMessage,
 											  libpq_gettext(
-												  "server is not supported requested compression algorithms %s\n"),
+												  "server does not support requested compression algorithms %s\n"),
 											  conn->compression);
 							goto error_return;
 						}
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 095866bba0..5b879569b4 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -2135,15 +2135,15 @@ pqBuildStartupPacket3(PGconn *conn, int *packetlen,
 }
 
 /*
- * Build comma-separated list of compression algorithms suggested by client to the server.
- * It can be either explicitly specified by user in connection string, either
- * include all algorithms supported by clit library.
- * This functions returns true if compression string is successfully parsed and
- * stores comma-separated list of algorithms in *client_compressors.
- * If compression is disabled, then NULL is assigned to  *client_compressors.
- * Also it creates array of compressor descriptors, each element of which corresponds
- * the correspondent algorithm name in *client_compressors list. This array is stored in PGconn
- * and is used during handshake when compassion acknowledgment response is received from the server.
+ * Build comma-separated list of compression algorithms requested by client.
+ * It can be either explicitly specified by user in connection string, or
+ * include all algorithms supported by client library.
+ * This function returns true if the compression string is successfully parsed and
+ * stores a comma-separated list of algorithms in *client_compressors.
+ * If compression is disabled, then NULL is assigned to *client_compressors.
+ * Also it creates an array of compressor descriptors, each element of which corresponds to
+ * the corresponding algorithm name in *client_compressors list. This array is stored in PGconn
+ * and is used during handshake when a compression acknowledgment response is received from the server.
  */
 static bool
 build_compressors_list(PGconn *conn, char** client_compressors, bool build_descriptors)
@@ -2170,7 +2170,7 @@ build_compressors_list(PGconn *conn, char** client_compressors, bool build_descr
 
 		if (n_supported_algorithms == 0)
 		{
-			*client_compressors = NULL; /* no compressors are avaialable */
+			*client_compressors = NULL; /* no compressors are available */
 			conn->compressors = NULL;
 			return true;
 		}
@@ -2204,7 +2204,7 @@ build_compressors_list(PGconn *conn, char** client_compressors, bool build_descr
 	}
 	else
 	{
-		/* List of compresison algorithms separated by commas */
+		/* List of compression algorithms separated by commas */
 		char *src, *dst;
 		int n_suggested_algorithms = 0;
 		char* suggested_algorithms = strdup(value);
@@ -2232,7 +2232,7 @@ build_compressors_list(PGconn *conn, char** client_compressors, bool build_descr
 				if (sscanf(col+1, "%d", &compression_level) != 1 && !build_descriptors)
 				{
 					fprintf(stderr,
-							libpq_gettext("WARNING: invlaid compression level %s in compression option '%s'\n"),
+							libpq_gettext("WARNING: invalid compression level %s in compression option '%s'\n"),
 							col+1, value);
 					return false;
 				}
@@ -2262,7 +2262,7 @@ build_compressors_list(PGconn *conn, char** client_compressors, bool build_descr
 		{
 			if (!build_descriptors)
 				fprintf(stderr,
-						libpq_gettext("WARNING: none of specified algirthms %s is supported by client\n"),
+						libpq_gettext("WARNING: none of the specified algorithms are supported by client: %s\n"),
 						value);
 			else
 			{
-- 
2.17.0

Reply via email to