On Wed, 17 Jun 2026 at 08:00, Alexander Lakhin wrote:
>
> Hello hackers,
>
> I'd like to share my findings related to OOM error handling. I'm not sure
> how large the class of such anomalies is (and if all of these can be
> detected and fixed), but please look at a few issues I have discovered so
> far:
>
> 1) An issue in lookup_type_cache()
>
> The following modification:
<snip>

> makes this script:
<snip>

> trigger an assertion failure:
<snip>

> Without asserts enabled, the server might crash.

I believe this is caused by partial subsystem initialization. Attached
patch 0001 should address this failure without causing the server to
restart on OOM.
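To illustrate the general failure mode, here is a standalone sketch. It is not the LAZY_INIT mechanism from the patch, whose details aren't shown here. When an out-of-memory ERROR longjmps out of an initialization routine halfway through, any state it already published stays behind, and later callers may trust it. The sketch assumes a hypothetical two-part cache and uses setjmp/longjmp as a stand-in for ereport(ERROR). It builds everything into locals and sets the `initialized` flag only once nothing else can fail:

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for PostgreSQL's error recovery: ERROR longjmps to a handler. */
static jmp_buf *error_handler;

/* Hypothetical OOM injection: 0 = fail the next allocation, -1 = never fail. */
static int fail_after = -1;

static void *
test_alloc(size_t size)
{
    if (fail_after == 0)
        longjmp(*error_handler, 1);     /* simulated "out of memory" ERROR */
    if (fail_after > 0)
        fail_after--;
    void *p = malloc(size);
    assert(p != NULL);
    return p;
}

/* Hypothetical cache whose two parts must be built together. */
typedef struct Cache
{
    bool initialized;
    int *table_a;
    int *table_b;
} Cache;

static Cache cache;

static void
cache_init(void)
{
    if (cache.initialized)
        return;

    /* Build everything into locals first (leaked on error here; in
     * PostgreSQL it would live in a memory context)... */
    int *a = test_alloc(16 * sizeof(int));
    int *b = test_alloc(16 * sizeof(int));

    /* ...and publish only once nothing more can fail. */
    cache.table_a = a;
    cache.table_b = b;
    cache.initialized = true;
}

/* Returns true if cache_init() completed, false if it "threw". */
static bool
try_cache_init(void)
{
    jmp_buf buf;
    jmp_buf *save = error_handler;

    error_handler = &buf;
    if (setjmp(buf) != 0)
    {
        error_handler = save;
        return false;
    }
    cache_init();
    error_handler = save;
    return true;
}
```

With `fail_after = 1`, the second allocation "fails" and try_cache_init() returns false. The cache stays untouched, so a later retry rebuilds it from scratch instead of finding half-built state.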

> 2) An issue in GetSnapshotData()
>
> The following modification:
<snip>

> makes this script (max_prepared_transactions = 2 in postgresql.conf):
<snip>

> trigger a segmentation fault:
<snip>

Again, this is caused by partial initialization, in this case of a
SnapshotData* whose state is later checked again. Attached patch 0002
should address this failure.
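This sketch assumes the failure involves two snapshot arrays (for example xip and subxip) that are allocated one after another. An OOM on the second allocation leaves the first pointer set, and a later call skips allocation because that first pointer is non-NULL. The sketch shows one "all or nothing" way to allocate such a pair. It is not necessarily what patch 0002 does. The struct and the `fail_subxip_alloc` switch are hypothetical. Errors are reported through a return code rather than ereport(ERROR) so the example stays standalone:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/* Simplified, hypothetical stand-in for the two arrays in a snapshot. */
typedef struct SnapshotArrays
{
    TransactionId *xip;
    TransactionId *subxip;
} SnapshotArrays;

/* Hypothetical OOM injection switch for the second allocation. */
static int fail_subxip_alloc;

/* Returns 0 on success, -1 on (simulated) out-of-memory. */
static int
ensure_snapshot_arrays(SnapshotArrays *snap, size_t nxip, size_t nsubxip)
{
    if (snap->xip != NULL)
    {
        /* Invariant: both arrays exist, or neither does. */
        assert(snap->subxip != NULL);
        return 0;
    }

    TransactionId *xip = malloc(nxip * sizeof(TransactionId));
    if (xip == NULL)
        return -1;

    TransactionId *subxip = fail_subxip_alloc
        ? NULL
        : malloc(nsubxip * sizeof(TransactionId));
    if (subxip == NULL)
    {
        free(xip);      /* don't leave a half-initialized snapshot behind */
        return -1;
    }

    /* Publish both pointers only after both allocations succeeded. */
    snap->xip = xip;
    snap->subxip = subxip;
    return 0;
}
```

After a simulated failure both pointers are still NULL, so the next call retries the whole allocation instead of relying on a lone `xip`.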


> 3) An issue in StandbyAcquireAccessExclusiveLock()
<snip>

I'm not sure how to solve this correctly. Ideally, I think, the hash
table code in StandbyAcquireAccessExclusiveLock() would be wrapped in a
critical section, but I'm not 100% sure that would be sufficient. It
would also definitely need some code to allow the various hash tables'
memory contexts to allocate during critical sections.
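The following toy model captures the two critical-section rules that this approach would run into. All names in it are hypothetical. In PostgreSQL, the corresponding pieces are START_CRIT_SECTION()/END_CRIT_SECTION(), the promotion of ERROR to PANIC inside a critical section, and the per-context opt-in via MemoryContextAllowInCriticalSection(). The model shows the trade-off only and does not claim to reflect how the real code is structured:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of critical-section bookkeeping. */
static int crit_section_count;

typedef enum
{
    ELEVEL_ERROR,
    ELEVEL_PANIC
} ErrorLevel;

/* Hypothetical memory context: only the opt-in flag matters here. */
typedef struct Context
{
    bool allow_in_crit_section;
} Context;

static void
start_crit_section(void)
{
    crit_section_count++;
}

static void
end_crit_section(void)
{
    assert(crit_section_count > 0);
    crit_section_count--;
}

/* Inside a critical section an ERROR cannot be recovered from,
 * so it is promoted to PANIC. */
static ErrorLevel
effective_error_level(ErrorLevel requested)
{
    if (requested == ELEVEL_ERROR && crit_section_count > 0)
        return ELEVEL_PANIC;
    return requested;
}

/* Inside a critical section, only contexts that opted in may allocate. */
static bool
alloc_permitted(const Context *cxt)
{
    return crit_section_count == 0 || cxt->allow_in_crit_section;
}
```

In this model, an OOM inside the section can no longer leave half-updated hash tables behind, but it becomes a PANIC instead of an ERROR. Each hash table's context would also have to opt in before it may allocate there.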


Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)

Attachment: v1-0002-Make-GetSnapshotData-more-resilient-against-OOM-e.patch
Description: Binary data

Attachment: v1-0001-typcache-Use-new-LAZY_INIT-system.patch
Description: Binary data
