On Wed, 12 Nov 2025 16:40:34 +0800 Chao Li <[email protected]> wrote:
> > > > On Nov 12, 2025, at 15:20, Yugo Nagata <[email protected]> wrote: > > > > On Tue, 11 Nov 2025 20:16:11 +0900 > > Fujii Masao <[email protected]> wrote: > > > >> On Tue, Nov 11, 2025 at 2:43 PM Tom Lane <[email protected]> wrote: > >>> > >>> Fujii Masao <[email protected]> writes: > >>>> To address this, callers need a way to distinguish between > >>>> PGRES_FATAL_ERROR > >>>> and OOM. Functions that loop until PQgetResult() returns NULL should > >>>> continue > >>>> if the result is PGRES_FATAL_ERROR, but should break if it's an OOM. > >>> > >>> Not sure about that. We might or might not be able to make progress > >>> if the caller keeps calling PQgetResult. But if it stops after an > >>> OOM report then we might as well just close the connection, because > >>> there is no hope of getting back in sync. > >>> > >>> I'm inclined to think that it's correct that we should return > >>> OOM_result not NULL if we couldn't make a PGresult, but further > >>> analysis is needed to make sure that libpq can make progress > >>> afterwards. I don't think we should expect applications to > >>> involve themselves in that recovery logic, because they won't, > >>> and most certainly won't test it carefully. > >>> > >>> The whole project might be doomed to failure really. I think > >>> that one very likely failure if we are approaching OOM is that > >>> we can't enlarge libpq's input buffer enough to accept all of > >>> a (long) incoming server message. What hope have we got of > >>> getting out of that situation? > >> > >> When pqCheckInBufferSpace() fails to enlarge the input buffer and > >> PQgetResult() > >> returns PGRES_FATAL_ERROR, I noticed that PQgetResult() sets asyncStatus to > >> PGASYNC_IDLE. This means subsequent calls to PQgetResult() will return NULL > >> even in OOM case, so functions looping until NULL won't end up in an > >> infinite loop. > >> Of course, issuing another query on the same connection afterward could be > >> problematic, though. > >> > >> I'm still not sure this behavior is correct, but I'm just wondering if > >> asyncStatus should be reset to PGASYNC_IDLE also when OOM_result is > >> returned. > > > > I agree that PQgetResult() should return NULL after returning OOM_result, > > and > > resetting asyncStatus to PGASYNC_IDLE would work in the normal mode. > > > > I also agree that continuing to use the same connection seems problematic, > > especially in pipeline mode; the pipeline status remains PQ_PIPELINE_OK, > > PQgetResult() > > never returns PGRES_PIPELINE_SYNC, and the sent commands are left in the > > command queue. > > > > Therefore, I wonder about closing the connection and resetting the status > > when OOM_result is retunred, by callling pqDropConnection() as > > handleFatalError() does. > > In this case, the connection status should also be set to CONNECTINO_BAD. > > > > This would prevent applications from continueing to use potentially > > problamatic > > connection without providing the way to distinguish OOM from other errors. > > > > What do you think? > > > > I've attached an updated patch in this direction. > > > > Regards, > > Yugo Nagata > > > > -- > > Yugo Nagata <[email protected]> > > <v2-0001-Make-PQgetResult-not-return-NULL-on-out-of-memory.patch> > > Here, the OOM condition is a client-side problem that’s generally beyond > libpq’s control. When an out-of-memory error occurs, it usually isn’t caused > by a memory leak inside libpq itself. After OOM, any operation that requires > additional allocation may fail — even something as simple as writing a log > message. Closing the current connection might release some memory, but that > memory could be immediately consumed by other processes, so further actions > might still fail. In such cases, simply aborting the process can be a > reasonable and straightforward strategy. > > However, in many cases the libpq client is an application server, where > forcibly aborting the entire process is not acceptable. From that > perspective, closing the affected connection is a more appropriate response. > The application should detect the broken database connection and attempt to > reconnect. If the OOM condition persists and reconnection continues to fail, > the application’s own recovery and failover logic should take over. What I meant is that closing the connection on OOM errors is intended to prevent applications from continuing to use a potentially problematic connection, rather than to release memory. Regards, Yugo Nagata -- Yugo Nagata <[email protected]>
