On Thu, Mar 24, 2022 at 6:49 PM Andres Freund <and...@anarazel.de> wrote: > That's not a whole lot of fun if you think of cases like postgres_fdw (or > citus as in Jelte's case), which run inside the backend. Even with just a > single postgres_fdw, we don't really want to end up in an uninterruptible > PQcancel() that doesn't even react to pg_terminate_backend(). > > Even if using threads weren't an issue, I don't really buy the premise - most > networking code has moved *away* from using dedicated threads for each > connection. It just doesn't scale. > > Leaving PQcancel aside, we use the non-blocking libpq stuff widely > ourselves. I think walreceiver, isolationtester, pgbench etc would be *much* > harder to get working equally well if there was just blocking calls. If > anything, we're getting to the point where purely blocking functionality > shouldn't be added anymore.
+1. I think having a non-blocking version of PQcancel() available is a great idea, and I've wanted it myself. See commit ae9bfc5d65123aaa0d1cca9988037489760bdeae. That said, I don't think that this particular patch is going in the right direction. I think Jacob's comment upthread is right on point: "This seems like a big change compared to PQcancel(); one that's not really hinted at elsewhere. Having the async version of an API open up a completely different code path with new features is pretty surprising to me." It seems to me that we want to end up with similar code paths for PQcancel() and the non-blocking version of cancel. We could get there in two ways. One way would be to implement the non-blocking functionality in a manner that matches exactly what PQcancel() does now. I imagine that the existing code from PQcancel() would move, with some amount of change, into a new set of non-blocking APIs. Perhaps PQcancel() would then be rewritten to use those new APIs instead of hand-rolling the same logic. The other possible approach would be to first change the blocking version of PQcancel() to use the regular connection code instead of its own idiosyncratic logic, and then as a second step, extend it with non-blocking interfaces that use the regular non-blocking connection code. With either of these approaches, we end up with the functionality working similarly in the blocking and non-blocking code paths. Leaving the question of approach aside, I think it's fairly clear that this patch cannot be seriously considered for v15. One problem is the lack of user-facing documentation, but there's a other stuff that just doesn't look sufficiently well-considered. For example, it updates the comment for pqsecure_read() to say "Returns -1 in case of failures, except in the case of clean connection closure then it returns -2." But that function calls any of three different implementation functions depending on the situation and the patch only updates one of them. And it updates that function to return -2 when the is ECONNRESET, which seems to fly in the face of the comment's idea that this is the "clean connection closure" case. I think it's probably a bad sign that this function is tinkering with logic in this sort of low-level function anyway. pqReadData() is a really general function that manages to work with non-blocking I/O already, so why does non-blocking query cancellation need to change its return values, or whether or not it drops data in certain cases? I'm also skeptical about the fact that we end up with a whole bunch of new functions that are just wrappers around existing functions. That's not a scalable approach. Every function that we have for a PGconn will eventually need a variant that deals with a PGcancelConn. That seems kind of pointless, especially considering that a PGcancelConn is *exactly* a PGconn in disguise. If we decide to pursue the approach of using the existing infrastructure for PGconn objects to handle query cancellation, we ought to manipulate them using the same functions we currently do, with some kind of mode or flag or switch or something that you can use to turn a regular PGconn into something that cancels a query. Maybe you create the PGconn and call PQsprinkleMagicCancelDust() on it, and then you just proceed using the existing functions, or something like that. Then, not only do the existing functions not need query-cancel analogues, but any new functions we add in the future don't either. I'll set the target version for this patch to 16. I hope work continues. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com