On Wed, 12 Nov 2025 16:40:34 +0800
Chao Li <[email protected]> wrote:

> 
> 
> > On Nov 12, 2025, at 15:20, Yugo Nagata <[email protected]> wrote:
> > 
> > On Tue, 11 Nov 2025 20:16:11 +0900
> > Fujii Masao <[email protected]> wrote:
> > 
> >> On Tue, Nov 11, 2025 at 2:43 PM Tom Lane <[email protected]> wrote:
> >>> 
> >>> Fujii Masao <[email protected]> writes:
> >>>> To address this, callers need a way to distinguish between 
> >>>> PGRES_FATAL_ERROR
> >>>> and OOM. Functions that loop until PQgetResult() returns NULL should 
> >>>> continue
> >>>> if the result is PGRES_FATAL_ERROR, but should break if it's an OOM.
> >>> 
> >>> Not sure about that.  We might or might not be able to make progress
> >>> if the caller keeps calling PQgetResult.  But if it stops after an
> >>> OOM report then we might as well just close the connection, because
> >>> there is no hope of getting back in sync.
> >>> 
> >>> I'm inclined to think that it's correct that we should return
> >>> OOM_result not NULL if we couldn't make a PGresult, but further
> >>> analysis is needed to make sure that libpq can make progress
> >>> afterwards.  I don't think we should expect applications to
> >>> involve themselves in that recovery logic, because they won't,
> >>> and most certainly won't test it carefully.
> >>> 
> >>> The whole project might be doomed to failure really.  I think
> >>> that one very likely failure if we are approaching OOM is that
> >>> we can't enlarge libpq's input buffer enough to accept all of
> >>> a (long) incoming server message.  What hope have we got of
> >>> getting out of that situation?
> >> 
> >> When pqCheckInBufferSpace() fails to enlarge the input buffer and 
> >> PQgetResult()
> >> returns PGRES_FATAL_ERROR, I noticed that PQgetResult() sets asyncStatus to
> >> PGASYNC_IDLE. This means subsequent calls to PQgetResult() will return NULL
> >> even in OOM case, so functions looping until NULL won't end up in an
> >> infinite loop.
> >> Of course, issuing another query on the same connection afterward could be
> >> problematic, though.
> >> 
> >> I'm still not sure this behavior is correct, but I'm just wondering if
> >> asyncStatus should be reset to PGASYNC_IDLE also when OOM_result is 
> >> returned.
> > 
> > I agree that PQgetResult() should return NULL after returning OOM_result, 
> > and
> > resetting asyncStatus to PGASYNC_IDLE would work in the normal mode.
> > 
> > I also agree that continuing to use the same connection seems problematic,
> > especially in pipeline mode; the pipeline status remains PQ_PIPELINE_OK, 
> > PQgetResult()
> > never returns PGRES_PIPELINE_SYNC, and the sent commands are left in the 
> > command queue.
> > 
> > Therefore, I wonder about closing the connection and resetting the status
> > when OOM_result is retunred, by callling pqDropConnection() as 
> > handleFatalError() does.
> > In this case, the connection status should also be set to CONNECTINO_BAD.
> > 
> > This would prevent applications from continueing to use potentially 
> > problamatic
> > connection without providing the way to distinguish OOM from other errors.
> > 
> > What do you think?
> > 
> > I've attached an updated patch in this direction.
> > 
> > Regards,
> > Yugo Nagata
> > 
> > -- 
> > Yugo Nagata <[email protected]>
> > <v2-0001-Make-PQgetResult-not-return-NULL-on-out-of-memory.patch>
> 
> Here, the OOM condition is a client-side problem that’s generally beyond 
> libpq’s control. When an out-of-memory error occurs, it usually isn’t caused 
> by a memory leak inside libpq itself. After OOM, any operation that requires 
> additional allocation may fail — even something as simple as writing a log 
> message. Closing the current connection might release some memory, but that 
> memory could be immediately consumed by other processes, so further actions 
> might still fail. In such cases, simply aborting the process can be a 
> reasonable and straightforward strategy.
> 
> However, in many cases the libpq client is an application server, where 
> forcibly aborting the entire process is not acceptable. From that 
> perspective, closing the affected connection is a more appropriate response. 
> The application should detect the broken database connection and attempt to 
> reconnect. If the OOM condition persists and reconnection continues to fail, 
> the application’s own recovery and failover logic should take over.

What I meant is that closing the connection on OOM errors is intended to prevent
applications from continuing to use a potentially problematic connection, rather
than to release memory.

Regards,
Yugo Nagata


-- 
Yugo Nagata <[email protected]>


Reply via email to