On 06/03/2026 04:12, Jelte Fennema-Nio wrote:
On Thu Mar 5, 2026 at 7:30 PM CET, Heikki Linnakangas wrote:
It took me a while to get the big picture of how this works. cancel.c could use some high-level comments explaining how to use the facility; it's a real mixed bag right now.

Attached is a version with a bunch more comments. I agree this cancel
logic is hard to understand without them. It took me quite a while to
understand it myself. (I don't think the code got any harder to
understand with these changes though, the exact same complexity was
already there for Windows. But I agree more comments are good.)

Thanks. I agree it was complicated before these patches.

This is racy if the cancellation thread doesn't process the wakeup immediately, for example because it's still busy processing a previous wakeup or because of a network hiccup. By the time the cancellation thread runs, the main thread might already be running a different query than it was when the user hit CTRL-C.

I now noted this in one of the new comments. I don't think there's a way
around this race condition entirely. It's simply a limitation of our
cancel protocol (because it's impossible to specify which query on a
connection should be cancelled).

That's true, but I still wonder if this could make it much worse.

In theory we could reduce the window for the race, by having all
frontend tools use async connections and have the main thread wait for
either the self-pipe or a cancel. That way it would be more similar to
the previous signal code in behaviour. That's a much bigger lift though,
i.e. all PQexec and PQgetResult calls would need to be modified. My
proposed change doesn't require changing the callsites at all.

Yeah, it does have that advantage.
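
For comparison, the async alternative would boil down to a wait loop roughly like the one below. This is only an untested sketch assuming POSIX poll(); the function and enum names are made up for illustration, and in a libpq client conn_fd would presumably come from PQsocket().

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical outcome type, for illustration only. */
typedef enum
{
	WAIT_SOCKET_READY,			/* server socket has activity */
	WAIT_CANCEL_REQUESTED,		/* self-pipe became readable */
	WAIT_ERROR
} WaitOutcome;

/*
 * Block until either the connection socket or the cancel self-pipe has
 * something for us.  A pending cancel request takes priority.
 */
static WaitOutcome
wait_for_socket_or_cancel(int conn_fd, int cancel_fd)
{
	struct pollfd fds[2];

	fds[0].fd = conn_fd;
	fds[0].events = POLLIN;
	fds[1].fd = cancel_fd;
	fds[1].events = POLLIN;

	for (;;)
	{
		int			rc;

		fds[0].revents = 0;
		fds[1].revents = 0;

		rc = poll(fds, 2, -1);
		if (rc < 0)
		{
			if (errno == EINTR)
				continue;		/* interrupted by a signal, just retry */
			return WAIT_ERROR;
		}

		if (fds[1].revents & POLLIN)
			return WAIT_CANCEL_REQUESTED;
		if (fds[0].revents != 0)
			return WAIT_SOCKET_READY;	/* caller sorts out data vs. error */
		if (fds[1].revents != 0)
			return WAIT_ERROR;	/* self-pipe closed or invalid */
	}
}
```

With that, the main thread itself notices the cancel request while it is still waiting on the same query, which is what would narrow the window.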

One simple thing we could do is to remember the "generation" in the signal handler, and store it in another global variable ("cancelledGeneration" or such). In the cancel thread, check that the generation matches; otherwise the thread is about to send a cancellation to a query that already finished, and should not send it.
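
Roughly like this, as an untested sketch. All the names are invented for illustration and nothing here is taken from the patch. I'm using volatile sig_atomic_t to keep the handler side simple; a real version shared between threads would need proper atomics or a lock.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical globals, not from the patch. */
static volatile sig_atomic_t queryGeneration = 0;		/* bumped per query */
static volatile sig_atomic_t cancelledGeneration = -1;	/* set at Ctrl-C */

/* Main thread: call whenever a new query is about to be sent. */
static void
begin_query(void)
{
	queryGeneration++;
}

/* Signal handler: remember which query was running when the user hit Ctrl-C. */
static void
handle_sigint(int signo)
{
	(void) signo;
	cancelledGeneration = queryGeneration;
	/* ... then wake up the cancel thread, e.g. by writing to the self-pipe */
}

/*
 * Cancel thread: after waking up, only send the cancel request if the
 * query that was interrupted is still the current one.
 */
static bool
should_send_cancel(void)
{
	return cancelledGeneration == queryGeneration;
}
```

The check and the actual send are still not atomic, so this only narrows the window; it doesn't close it.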

I worry how this behaves if establishing the cancel connection gets stuck for a long time. Because of a network hiccup, for example. That's also not a new problem though; it's perhaps even worse today, if the signal handler gets stuck for a long time, trying to establish the connection. Still, would be good to do some testing with a bad network.

- Heikki


