On 11/08/2025 09:58, Jelte Fennema-Nio wrote:
On Fri, 8 Aug 2025 at 17:50, Heikki Linnakangas <hlinn...@iki.fi> wrote:
I think any of the other options would be
better. There's no guarantee that more data will ever arrive, the
connection might be used just to wait for the notification.
Yeah, I think that's the important thing to realize here. The "try
again later" only makes sense if we need more data to try again. If we
don't then we now start waiting on data that might never come.
Here's a new set of patches, to disconnect on OOM instead of hanging or
silently discarding messages:
v4-0001-libpq-Don-t-hang-on-out-of-memory.patch
This is a very minimal fix for the "hang on OOM" issue. It changes the
behavior to silently skip over the message, i.e. discard the async
notification, or continue without cancellation key.
I think we should backpatch this.
v4-0002-libpq-Handle-OOMs-by-disconnecting-instead-of-sil.patch
This changes the behavior of all of the problematic places where we
silently discarded things on OOM to disconnect instead. That is, when
processing a Notify, BackendKeyData, or ParameterStatus message.
Now that I look at this again, we should probably forget about patch #1
and commit and backpatch this straight away. It's a little more changes
than patch #1, but not that much.
Note that there are still places where we discard things on OOM, like
when parsing an error 'E' message. Those are a little iffy too, but
they're less problematic because you still get an error, and the error
is clearly associated with a query, unlike these Notify, BackendData and
ParameterStatus messages.
v4-0003-libpq-Be-strict-about-cancel-key-lengths.patch
Your patch to add more checks, rebased over the first two.
- Heikki