On Tue Jun 24, 2025 at 5:07 PM CEST, Jacob Champion wrote:
So that's
1) return an (empty) cancellation object even if the server has not
sent a key, and
2) error out when trying to cancel with an empty object?
That sounds reasonable to me.
Attached is an attempt at implementing the above. I did not test it
against these systems though.
I also added small commit that checks for the 256 byte key length limit
described in the protocol documentation. This thread made me remember
that we talked about that during PGConf.dev.
From c1f92f01898544a21186ae13d57ca7c811359fda Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio <github-t...@jeltef.nl>
Date: Wed, 25 Jun 2025 08:36:51 +0200
Subject: [PATCH v1 1/2] libpq: Complain about missing BackendKeyData later
It turns out that some third party backend implementations^1 don't send
BackendKeyData as a way to indicate that they don't. While the protocol
docs left it up for interpretation if that is valid behavior, libpq has
been accepting it up until the libpq shipped with PG17. It does not seem
like that the libpq behavior was intentional though, since it did so by
sending CancelRequest messages with all zeros to such servers (instead
of returning an error or making the cancel a no-op).
For PG18 this behaviour was changed to return an error when trying to
create the cancel object (either a PGcancel or PGcancelConn). This
wasn't done with any discussion, but was done as part of supporting
different lengths of cancel packets for the new 3.2 version of the
protocol.
This commit changes the behavior once more to only return an error when
the cancel object is actually used to send a cancellation, instead of
when merely creating the object. The reason to do so is that some
clients create such cancel objects as part of their connection creation
logic (thus having the cancel object ready for later when they need it).
So by returning an error when creating that object, the connection
attempt would fail. So by changing when we return the error such clients
will still be able to connect to the third party backend implementations
in question, but when actually trying to cancel a query on one of these
backends the user will be notified that that is not possible for the
server that they are connected to.
^1: AWS RDS Proxy is definitely one of them, and CockroachDB might be
another.
---
doc/src/sgml/protocol.sgml | 6 +++++-
src/interfaces/libpq/fe-cancel.c | 37 ++++++++++++++++++++++++--------
2 files changed, 33 insertions(+), 10 deletions(-)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 137ffc8d0b7..d96b6851976 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -535,7 +535,11 @@
This message provides secret-key data that the frontend must
save if it wants to be able to issue cancel requests later.
The frontend should not respond to this message, but should
- continue listening for a ReadyForQuery message.
+ continue listening for a ReadyForQuery message. The PostgreSQL backend
+ will always send this message before sending the inital ReadyForQuery
+ message but other backend implementations of the protocol may not. If no
+ BackendKeyData is sent by the server, then that means that the backend
+ does not support canceling queries using the CancelRequest messages.
</para>
</listitem>
</varlistentry>
diff --git a/src/interfaces/libpq/fe-cancel.c b/src/interfaces/libpq/fe-cancel.c
index cd3102346bf..3c9f29f6ac7 100644
--- a/src/interfaces/libpq/fe-cancel.c
+++ b/src/interfaces/libpq/fe-cancel.c
@@ -86,13 +86,6 @@ PQcancelCreate(PGconn *conn)
return (PGcancelConn *) cancelConn;
}
- /* Check that we have received a cancellation key */
- if (conn->be_cancel_key_len == 0)
- {
- libpq_append_conn_error(cancelConn, "no cancellation key received");
- return (PGcancelConn *) cancelConn;
- }
-
/*
* Indicate that this connection is used to send a cancellation
*/
@@ -111,7 +104,7 @@ PQcancelCreate(PGconn *conn)
* Copy cancellation token data from the original connection
*/
cancelConn->be_pid = conn->be_pid;
- if (conn->be_cancel_key != NULL)
+ if (conn->be_cancel_key_len > 0)
{
cancelConn->be_cancel_key = malloc(conn->be_cancel_key_len);
if (cancelConn->be_cancel_key == NULL)
@@ -205,6 +198,17 @@ PQcancelStart(PGcancelConn *cancelConn)
if (!cancelConn || cancelConn->conn.status == CONNECTION_BAD)
return 0;
+ /*
+ * Check that we actually have a concel key. We check this here as apposed
+ * to in PQcancelCreate because users of libpq might call PQcancelCreate
+ * even when they don't need to cancel a connection.
+ */
+ if (cancelConn->conn.be_cancel_key_len == 0)
+ {
+ libpq_append_conn_error(&cancelConn->conn, "no cancellation key received");
+ return 0;
+ }
+
if (cancelConn->conn.status != CONNECTION_ALLOCATED)
{
libpq_append_conn_error(&cancelConn->conn,
@@ -378,7 +382,14 @@ PQgetCancel(PGconn *conn)
/* Check that we have received a cancellation key */
if (conn->be_cancel_key_len == 0)
- return NULL;
+ {
+ /*
+ * In case there is no cancel key, we return an all-zero PGCancel
+ * object. Actually calling PQcancel on this will fail, but we allow
+ * creating the PGCancel object
+ */
+ return calloc(1, sizeof(PGcancel));
+ }
cancel_req_len = offsetof(CancelRequestPacket, cancelAuthCode) + conn->be_cancel_key_len;
cancel = malloc(offsetof(PGcancel, cancel_req) + cancel_req_len);
@@ -543,6 +554,14 @@ PQcancel(PGcancel *cancel, char *errbuf, int errbufsize)
return false;
}
+ if (cancel->cancel_pkt_len == 0)
+ {
+ strlcpy(errbuf, "PQcancel() -- no cancellation key received", errbufsize);
+ /* strlcpy probably doesn't change errno, but be paranoid */
+ SOCK_ERRNO_SET(save_errno);
+ return false;
+ }
+
/*
* We need to open a temporary connection to the postmaster. Do this with
* only kernel calls.
base-commit: fd519419c9484a47f068cc04e2db81a4ec661669
--
2.43.0
From 591ed4beea42e0982d5fbb0aaae1abf27ad8b099 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio <github-t...@jeltef.nl>
Date: Wed, 25 Jun 2025 08:54:15 +0200
Subject: [PATCH v1 2/2] libpq: Disallow cancel keys longer than 256 bytes
The protocol documentation states that the maximum length of a cancel key
is 256 bytes. This starts checking for that limit in libpq. Otherwise
third party backend implementations will probably start using more bytes
anyway.
---
src/interfaces/libpq/fe-protocol3.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index beb1c889aad..fc34fb52766 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -1546,6 +1546,12 @@ getBackendKeyData(PGconn *conn, int msgLength)
cancel_key_len = 5 + msgLength - (conn->inCursor - conn->inStart);
+ if (cancel_key_len > 256)
+ {
+ libpq_append_conn_error(conn, "received invalid BackendKeyData message: cancel key length %d exceeds maximum of 256 bytes", cancel_key_len);
+ return EOF;
+ }
+
conn->be_cancel_key = malloc(cancel_key_len);
if (conn->be_cancel_key == NULL)
{
--
2.43.0