Hi hackers,
PostgreSQL hit the following assertion during error cleanup, after running
out of memory in dsa_allocate0():

    void dshash_detach(dshash_table *hash_table) {
        ASSERT_NO_PARTITION_LOCKS_HELD_BY_ME(hash_table);

called from pgstat_shutdown_hook(), called from shmem_exit(), called
from proc_exit(), called from the exception handler.
The partition locks were previously acquired via the following call chain:

    AutoVacWorkerMain()
      pgstat_report_autovac()
        pgstat_get_entry_ref_locked()
          pgstat_get_entry_ref()
            dshash_find_or_insert()
              resize()

resize() locks all partitions so that the hash table can safely be
resized. Then it calls dsa_allocate0(). If dsa_allocate0() fails to
allocate, it errors out. The exception handler calls proc_exit(), which
normally calls LWLockReleaseAll() via AbortTransaction(), but only if
there's an active transaction. However, pgstat_report_autovac() runs
before a transaction has been started, and hence LWLockReleaseAll()
doesn't run before pgstat_shutdown_hook() is called.
See attached patch for an attempt to fix this issue.
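
To make the ordering problem easier to follow, here is a minimal toy model
of the error path in plain C. It is not PostgreSQL code: ToyBackend,
NUM_PARTITIONS and all toy_* functions are hypothetical stand-ins for the
real backend state, resize(), AbortTransaction(), LWLockReleaseAll() and the
assertion in dshash_detach(). It only sketches which combinations leave
locks held when the shutdown hook runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical partition count; the real value lives in dshash.c. */
#define NUM_PARTITIONS 4

/* Toy stand-in for the per-backend state relevant to this bug. */
typedef struct ToyBackend
{
    int  held_locks;      /* partition locks currently held */
    bool in_transaction;  /* is a transaction active? */
} ToyBackend;

/* Models resize(): takes every partition lock, then the allocation fails. */
static void
toy_resize_fails(ToyBackend *b)
{
    assert(b != NULL);
    b->held_locks = NUM_PARTITIONS;
}

/* Models abort-driven cleanup: locks are released only inside a transaction. */
static void
toy_abort_transaction(ToyBackend *b)
{
    assert(b != NULL);
    if (b->in_transaction)
        b->held_locks = 0;
}

/* Models an unconditional LWLockReleaseAll() in the error handler. */
static void
toy_release_all(ToyBackend *b)
{
    assert(b != NULL);
    b->held_locks = 0;
}

/* Models the check made by the assertion in dshash_detach(). */
static bool
toy_detach_assertion_holds(const ToyBackend *b)
{
    assert(b != NULL);
    return b->held_locks == 0;
}

/*
 * Runs the toy error path and reports whether the detach assertion would
 * hold once the shutdown hook is reached.
 */
bool
toy_error_exit(bool in_transaction, bool release_explicitly)
{
    ToyBackend b = {0, in_transaction};

    toy_resize_fails(&b);           /* error raised with all locks held */
    if (release_explicitly)
        toy_release_all(&b);        /* the step the patch adds */
    toy_abort_transaction(&b);      /* no-op outside a transaction */
    return toy_detach_assertion_holds(&b);
}
```

In this model, toy_error_exit(false, false) is the reported case (no
transaction, no explicit release) and returns false; passing true for either
argument makes it return true.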
--
David Geier
(ServiceNow)
From 5580e3680b2211235e4bc2b5dcbfe6b4f5b8eee5 Mon Sep 17 00:00:00 2001
From: David Geier <geidav...@gmail.com>
Date: Tue, 28 Nov 2023 18:52:46 +0100
Subject: [PATCH] Fix autovacuum cleanup on error
---
src/backend/postmaster/autovacuum.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index f929b62e8a..5de55649d9 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1584,6 +1584,9 @@ AutoVacWorkerMain(int argc, char *argv[])
/* Report the error to the server log */
EmitErrorReport();
+ /* Make sure all locks are released so assertions don't hit in at-exit callbacks */
+ LWLockReleaseAll();
+
/*
* We can now go away. Note that because we called InitProcess, a
* callback was registered to do ProcKill, which will clean up
--
2.39.2