On 2025-Jul-17, Andrey Borodin wrote:

> Thinking more about the problem I see 3 ways to deal with this deadlock:
> 1. We check for recovery conflict even in presence of
> InterruptHoldoffCount. That's what patch v4 does.
> 2. Teach page_collect_tuples() to do HeapTupleSatisfiesVisibility()
> without holding buffer lock.
> 3. Why do we even HOLD_INTERRUPTS() when acquiring a shared lock??

Hmm, as you say, doing (3) is a very invasive system-wide change, but
can we do it in a more localized way?  I mean, what if we do
RESUME_INTERRUPTS() just before going to sleep on the CV, and restore
with HOLD_INTERRUPTS() once the sleep is done?  That would only affect
this one place rather than the whole system, and should also (AFAICS)
solve the issue.

> Yet, I see 3 as a correct solution. Can't we just abstain from
> HOLD_INTERRUPTS() if the LWLock taken is not exclusive?

Hmm, the code in LWLockAcquire says

    /*
     * Lock out cancel/die interrupts until we exit the code section protected
     * by the LWLock.  This ensures that interrupts will not interfere with
     * manipulations of data structures in shared memory.
     */
    HOLD_INTERRUPTS();

which means if we want to change this, we would have to inspect every
single use of LWLocks in shared mode in order to be certain that such a
change isn't problematic.  This is a discussion I'm not prepared for.

-- 
Álvaro Herrera        48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"If you want to be creative, learn the art of wasting time"
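
To make the localized variant of (3) concrete, here is a toy model of the
idea, not PostgreSQL code.  It assumes that a pending interrupt is serviced
only while the holdoff counter is zero, and that exactly one holdoff level
is active when the sleep starts.  The helpers check_for_interrupts(),
toy_cv_sleep(), sleep_with_holdoff() and sleep_with_local_resume() are
hypothetical names invented for this sketch, and the macros are simplified
stand-ins for the real ones.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the backend's interrupt bookkeeping (not the real code). */
static int  InterruptHoldoffCount = 0;
static bool InterruptPending = false;
static bool interrupt_serviced = false;

#define HOLD_INTERRUPTS()   (InterruptHoldoffCount++)
#define RESUME_INTERRUPTS() \
    do { assert(InterruptHoldoffCount > 0); InterruptHoldoffCount--; } while (0)

/* Assumption: a pending interrupt is serviced only when no holdoff is active. */
static void
check_for_interrupts(void)
{
    if (InterruptPending && InterruptHoldoffCount == 0)
    {
        InterruptPending = false;
        interrupt_serviced = true;
    }
}

/* Hypothetical CV sleep: an interrupt (say, a recovery conflict) arrives. */
static void
toy_cv_sleep(void)
{
    InterruptPending = true;
    check_for_interrupts();
}

static void
reset_state(void)
{
    InterruptHoldoffCount = 0;
    InterruptPending = false;
    interrupt_serviced = false;
}

/* Sleep while the holdoff taken at lock acquisition is still in force. */
static bool
sleep_with_holdoff(void)
{
    reset_state();
    HOLD_INTERRUPTS();      /* taken when the lock is acquired */
    toy_cv_sleep();         /* interrupt arrives but stays pending */
    RESUME_INTERRUPTS();    /* dropped when the lock is released */
    return interrupt_serviced;
}

/* Proposed variant: lift the holdoff only around the sleep itself. */
static bool
sleep_with_local_resume(void)
{
    reset_state();
    HOLD_INTERRUPTS();      /* taken when the lock is acquired */
    RESUME_INTERRUPTS();    /* just before going to sleep on the CV */
    toy_cv_sleep();         /* interrupt can be serviced here */
    HOLD_INTERRUPTS();      /* restore once the sleep is done */
    RESUME_INTERRUPTS();    /* dropped when the lock is released */
    return interrupt_serviced;
}
```

In this model sleep_with_holdoff() leaves the interrupt pending, while
sleep_with_local_resume() services it during the sleep.  If holdoffs were
nested more than one level deep at that point, a single RESUME_INTERRUPTS()
would not bring the counter to zero, so the sketch says nothing about that
case.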