On Wed, 17 Jun 2026 at 08:00, Alexander Lakhin <[email protected]> wrote:
>
> Hello hackers,
>
> I'd like to share my findings related to OOM error handling. I'm not sure
> how large the class of such anomalies is (and if all of these can be
> detected and fixed), but please look at a few issues I have discovered so
> far:
>
> 1) An issue in lookup_type_cache()
>
> The following modification:
<snip>
> makes this script:
<snip>
> trigger an assertion failure:
<snip>

Without asserts enabled, the server might crash. I believe this is caused
by partial subsystem initialization. Attached patch 0001 should address
this failure without causing the server to restart on OOM.

> 2) An issue in GetSnapshotData()
>
> The following modification:
<snip>
> makes this script (max_prepared_transactions = 2 in postgresql.conf):
<snip>
> trigger a segmentation fault:
<snip>

Again, this is caused by partial initialization, though in this case it is
of a SnapshotData * which is later checked again. Attached patch 0002
should address this failure.

> 3) An issue in StandbyAcquireAccessExclusiveLock()
<snip>

I'm not sure how to solve this correctly. I think ideally the
StandbyAcquireAccessExclusiveLock() hash code would be wrapped in a
critical section, but I'm not 100% sure that will be a sufficient
approach, and it would definitely need some code to allow the memory
contexts of the various hash tables to allocate during critical sections.

Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)
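
For illustration, the partial-initialization failure mode described for
issues 1) and 2) can be sketched in standalone C. This is a minimal sketch
under stated assumptions, not the code from the attached patches: DemoCache,
demo_alloc(), fail_at, demo_init_fragile() and demo_init_safe() are all
hypothetical names, and a NULL return stands in for an OOM error that
unwinds out of the function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical structure with two arrays that are allocated lazily. */
typedef struct DemoCache
{
	int		   *xip;
	int		   *subxip;
	bool		initialized;
} DemoCache;

/* Hypothetical fault injection: fail the Nth allocation (0 = never). */
int			fail_at = 0;
int			alloc_count = 0;

static void *
demo_alloc(size_t size)
{
	if (fail_at != 0 && ++alloc_count == fail_at)
		return NULL;			/* simulated out-of-memory */
	return malloc(size);
}

/*
 * Fragile variant: the first pointer doubles as the "already initialized"
 * marker. If the second allocation fails, xip stays set, so a retry skips
 * the whole block and leaves subxip NULL for later code to dereference.
 */
bool
demo_init_fragile(DemoCache *c, size_t n)
{
	if (c->xip == NULL)
	{
		c->xip = demo_alloc(n * sizeof(int));
		if (c->xip == NULL)
			return false;
		c->subxip = demo_alloc(n * sizeof(int));
		if (c->subxip == NULL)
			return false;		/* half-initialized state escapes */
	}
	return true;
}

/*
 * Safer variant: allocate into locals and publish the pointers (and the
 * flag) only once every allocation has succeeded. A failure leaves the
 * structure exactly as it was, so a later retry starts from scratch.
 */
bool
demo_init_safe(DemoCache *c, size_t n)
{
	int		   *xip;
	int		   *subxip;

	if (c->initialized)
		return true;

	xip = demo_alloc(n * sizeof(int));
	if (xip == NULL)
		return false;
	subxip = demo_alloc(n * sizeof(int));
	if (subxip == NULL)
	{
		free(xip);
		return false;
	}

	c->xip = xip;
	c->subxip = subxip;
	c->initialized = true;
	return true;
}
```

With fail_at set to 2, the fragile variant reports failure once and then
reports success on retry while subxip is still NULL; the safe variant
leaves the structure untouched on failure and fully initializes it on
retry.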
v1-0002-Make-GetSnapshotData-more-resilient-against-OOM-e.patch
Description: Binary data
v1-0001-typcache-Use-new-LAZY_INIT-system.patch
Description: Binary data
