On Tue, Dec 2, 2025 at 9:40 AM Amit Langote <[email protected]> wrote:
>
> Hi Rahila,
>
> On Fri, Nov 21, 2025 at 8:45 PM Rahila Syed <[email protected]> wrote:
> >
> > Hi,
> >
> > If a process encounters a FATAL error after acquiring a dshash lock
> > but before releasing it, and it is not within a transaction, this can
> > lead to a segmentation fault.
> >
> > The FATAL error causes the backend to exit, triggering proc_exit()
> > and similar functions. In the absence of a transaction,
> > LWLockReleaseAll() is delayed until ProcKill. ProcKill is an
> > on_shmem_exit callback, and dsm_backend_shutdown() is called before
> > any on_shmem_exit callbacks are invoked. Consequently, if a dshash
> > lock was acquired before the FATAL error occurred, the lock will only
> > be released after dsm_backend_shutdown() has detached the DSM segment
> > containing the lock, resulting in a segmentation fault.
>
> Thanks for the report.
>
> I am not super familiar with this code path, but this raises a broader
> question for me: are there other resources residing in DSM (besides
> LWLocks) that might be accessed during the shutdown sequence?
>
> We know dshash and dsa locks (LWLocks) face this risk because ProcKill
> runs as an on_shmem_exit callback, which happens after
> dsm_backend_shutdown() has already detached the memory.
>
> This patch fixes the specific case for LWLocks, but if there are any
> other on_shmem_exit callbacks that attempt to touch DSM memory, they
> would still trigger a similar segfault. Do we need to worry about
> other cleanup routines, or is ProcKill the only consumer of DSM data
> at this stage?
>
> > Please find a reproducer attached. I have modified the
> > test_dsm_registry module to create a background worker that does
> > nothing but throw a FATAL error after acquiring the dshash lock. This
> > must run in a background worker to ensure it executes without a
> > transaction.
> >
> > To trigger the segmentation fault, apply the 0001-Reproducer* patch,
> > run make install in the test_dsm_registry module, add
> > test_dsm_registry to shared_preload_libraries in postgresql.conf, and
> > start the server.
> >
> > Please find attached a fix that calls LWLockReleaseAll() early in the
> > shmem_exit() routine. This ensures that the dshash lock is released
> > before dsm_backend_shutdown() is called, and also that any callbacks
> > invoked subsequently in shmem_exit() will not fail to acquire a lock.
>
> @@ -229,6 +230,14 @@ shmem_exit(int code)
>  {
>  	shmem_exit_inprogress = true;
>
> +	/*
> +	 * Make sure we release any pending locks so that any callbacks called
> +	 * subsequently do not fail to acquire any locks. This also fixes a seg
> +	 * fault due to releasing a dshash lock after the dsm segment containing
> +	 * the lock has been detached by dsm_backend_shutdown().
> +	 */
> +	LWLockReleaseAll();
> +
>  	/*
>  	 * Call before_shmem_exit callbacks.
>  	 *
>
> Again, not an expert, but I am concerned about placing
> LWLockReleaseAll() at the very top, before the before_shmem_exit()
> callbacks run.
>
> One of those callbacks might rely on locks being held, or assume the
> consistency of shared memory structures protected by those locks. It
> seems safer to sandwich the release between the two callback lists:
> after before_shmem_exit is done, but before dsm_backend_shutdown()
> runs.
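
To recap the sequence described above (a simplified sketch from memory
of shmem_exit() in ipc.c and ProcKill() in proc.c, not the actual code;
the "loop over ..." comments stand in for the real callback-list loops):

void
shmem_exit(int code)
{
	shmem_exit_inprogress = true;

	/* 1. before_shmem_exit callbacks run first. */
	/* ... loop over before_shmem_exit_list ... */

	/* 2. DSM segments are detached here. */
	dsm_backend_shutdown();

	/* 3. on_shmem_exit callbacks run last, including ProcKill(). */
	/* ... loop over on_shmem_exit_list ... */
}

static void
ProcKill(int code, Datum arg)
{
	/*
	 * Without a transaction, this is where the dshash lock would
	 * finally be released. But if the LWLock lives in a DSM segment,
	 * step 2 has already unmapped it, hence the segfault.
	 */
	LWLockReleaseAll();
	/* ... rest of PGPROC cleanup ... */
}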
Given that ordering, I think it makes sense to place LWLockReleaseAll()
right before dsm_backend_shutdown(), which is what detaches the process
from the DSM segments. That puts it between the two callback lists, as
you suggest; see the sketch below.
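Roughly like this (an untested sketch against shmem_exit() in ipc.c;
the first context line is just a placeholder for wherever the
before_shmem_exit loop ends, not the actual surrounding code):

@@ shmem_exit(int code)
 	/* ... end of the before_shmem_exit callback loop ... */

+	/*
+	 * Release any LWLocks we are still holding, now that the
+	 * before_shmem_exit callbacks (which may depend on them) have run,
+	 * but before dsm_backend_shutdown() unmaps the DSM segments that
+	 * may contain the locks (dshash/dsa locks, for instance).
+	 */
+	LWLockReleaseAll();
+
 	dsm_backend_shutdown();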
--
Regards,
Dilip Kumar
Google