Hi hackers,

PostgreSQL hit the following assertion during error cleanup, after running out of memory in dsa_allocate0():

    void
    dshash_detach(dshash_table *hash_table)
    {
        ASSERT_NO_PARTITION_LOCKS_HELD_BY_ME(hash_table);

dshash_detach() is called from pgstat_shutdown_hook(), which is called from shmem_exit(), which is called from proc_exit(), which is called from the exception handler.

The partition locks were previously acquired through this call chain:

    AutoVacWorkerMain()
      pgstat_report_autovac()
        pgstat_get_entry_ref_locked()
          pgstat_get_entry_ref()
            dshash_find_or_insert()
              resize()

resize() locks all partitions so the hash table can safely be resized. Then it calls dsa_allocate0(). If dsa_allocate0() fails to allocate, it errors out. The exception handler calls proc_exit(), which normally calls LWLockReleaseAll() via AbortTransaction(), but only if there is an active transaction. However, pgstat_report_autovac() runs before a transaction has been started, so LWLockReleaseAll() does not run before pgstat_shutdown_hook() is called.

See attached patch for an attempt to fix this issue.
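
To make the sequence easier to follow, here is a small toy model of it. This is not PostgreSQL code: setjmp()/longjmp() stand in for the error handling, a bool array stands in for the partition LWLocks, and all names (toy_resize, toy_worker_exit, NUM_PARTITIONS, release_all_locks) are made up for illustration. It only shows that locks taken before a failing allocation are still held when the shutdown hook would run, unless the handler releases them first.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

#define NUM_PARTITIONS 8	/* hypothetical partition count */

static bool partition_locked[NUM_PARTITIONS];
static jmp_buf error_context;	/* stands in for the exception stack */

/* Count the locks currently held (what the assertion would check). */
static int
locks_held(void)
{
	int			n = 0;

	for (int i = 0; i < NUM_PARTITIONS; i++)
		if (partition_locked[i])
			n++;
	return n;
}

/* Stands in for LWLockReleaseAll(). */
static void
release_all_locks(void)
{
	for (int i = 0; i < NUM_PARTITIONS; i++)
		partition_locked[i] = false;
}

/* Stands in for resize(): lock every partition, then allocate. */
static void
toy_resize(bool allocation_fails)
{
	for (int i = 0; i < NUM_PARTITIONS; i++)
		partition_locked[i] = true;

	if (allocation_fails)
		longjmp(error_context, 1);	/* simulated out-of-memory error */

	release_all_locks();		/* normal path: unlock after resizing */
}

/*
 * Returns the number of locks still held at the point where the shutdown
 * hook would run. A non-zero result corresponds to the failed assertion.
 */
int
toy_worker_exit(bool allocation_fails, bool release_in_handler)
{
	release_all_locks();		/* start from a clean state */

	if (setjmp(error_context) != 0)
	{
		/* simulated exception handler, no transaction to abort */
		if (release_in_handler)
			release_all_locks();
		return locks_held();
	}

	toy_resize(allocation_fails);
	return locks_held();
}
```

In this model, toy_worker_exit(true, false) leaves all partitions locked, while toy_worker_exit(true, true) leaves none, which is the effect the patch aims for in the real handler.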

--
David Geier
(ServiceNow)
From 5580e3680b2211235e4bc2b5dcbfe6b4f5b8eee5 Mon Sep 17 00:00:00 2001
From: David Geier <geidav...@gmail.com>
Date: Tue, 28 Nov 2023 18:52:46 +0100
Subject: [PATCH] Fix autovacuum cleanup on error

---
 src/backend/postmaster/autovacuum.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index f929b62e8a..5de55649d9 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1584,6 +1584,9 @@ AutoVacWorkerMain(int argc, char *argv[])
 		/* Report the error to the server log */
 		EmitErrorReport();
 
+		/* Make sure all locks are released so assertions don't hit in at-exit callbacks */
+		LWLockReleaseAll();
+
 		/*
 		 * We can now go away.  Note that because we called InitProcess, a
 		 * callback was registered to do ProcKill, which will clean up
-- 
2.39.2
