Re: Add non-blocking version of PQcancel

Robert Haas Fri, 25 Mar 2022 11:34:59 -0700

On Thu, Mar 24, 2022 at 6:49 PM Andres Freund <and...@anarazel.de> wrote:
> That's not a whole lot of fun if you think of cases like postgres_fdw (or
> citus as in Jelte's case), which run inside the backend. Even with just a
> single postgres_fdw, we don't really want to end up in an uninterruptible
> PQcancel() that doesn't even react to pg_terminate_backend().
>
> Even if using threads weren't an issue, I don't really buy the premise - most
> networking code has moved *away* from using dedicated threads for each
> connection. It just doesn't scale.
>
> Leaving PQcancel aside, we use the non-blocking libpq stuff widely
> ourselves. I think walreceiver, isolationtester, pgbench etc would be *much*
> harder to get working equally well if there was just blocking calls. If
> anything, we're getting to the point where purely blocking functionality
> shouldn't be added anymore.


+1. I think having a non-blocking version of PQcancel() available is a
great idea, and I've wanted it myself. See commit
ae9bfc5d65123aaa0d1cca9988037489760bdeae.

That said, I don't think that this particular patch is going in the
right direction. I think Jacob's comment upthread is right on point:
"This seems like a big change compared to PQcancel(); one that's not
really hinted at elsewhere. Having the async version of an API open up
a completely different code path with new features is pretty
surprising to me." It seems to me that we want to end up with similar
code paths for PQcancel() and the non-blocking version of cancel. We
could get there in two ways. One way would be to implement the
non-blocking functionality in a manner that matches exactly what
PQcancel() does now. I imagine that the existing code from PQcancel()
would move, with some amount of change, into a new set of non-blocking
APIs. Perhaps PQcancel() would then be rewritten to use those new APIs
instead of hand-rolling the same logic. The other possible approach
would be to first change the blocking version of PQcancel() to use the
regular connection code instead of its own idiosyncratic logic, and
then as a second step, extend it with non-blocking interfaces that use
the regular non-blocking connection code. With either of these
approaches, we end up with the functionality working similarly in the
blocking and non-blocking code paths.

Leaving the question of approach aside, I think it's fairly clear that
this patch cannot be seriously considered for v15. One problem is the
lack of user-facing documentation, but there's a other stuff that just
doesn't look sufficiently well-considered. For example, it updates the
comment for pqsecure_read() to say "Returns -1 in case of failures,
except in the case of clean connection closure then it returns -2."
But that function calls any of three different implementation
functions depending on the situation and the patch only updates one of
them. And it updates that function to return -2 when the is
ECONNRESET, which seems to fly in the face of the comment's idea that
this is the "clean connection closure" case. I think it's probably a
bad sign that this function is tinkering with logic in this sort of
low-level function anyway. pqReadData() is a really general function
that manages to work with non-blocking I/O already, so why does
non-blocking query cancellation need to change its return values, or
whether or not it drops data in certain cases?

I'm also skeptical about the fact that we end up with a whole bunch of
new functions that are just wrappers around existing functions. That's
not a scalable approach. Every function that we have for a PGconn will
eventually need a variant that deals with a PGcancelConn. That seems
kind of pointless, especially considering that a PGcancelConn is
*exactly* a PGconn in disguise. If we decide to pursue the approach of
using the existing infrastructure for PGconn objects to handle query
cancellation, we ought to manipulate them using the same functions we
currently do, with some kind of mode or flag or switch or something
that you can use to turn a regular PGconn into something that cancels
a query. Maybe you create the PGconn and call
PQsprinkleMagicCancelDust() on it, and then you just proceed using the
existing functions, or something like that. Then, not only do the
existing functions not need query-cancel analogues, but any new
functions we add in the future don't either.

I'll set the target version for this patch to 16. I hope work continues.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Add non-blocking version of PQcancel

Reply via email to