On Wed, Jan 8, 2025 at 4:31 PM Michael Paquier <mich...@paquier.xyz> wrote: > On Thu, Dec 26, 2024 at 01:10:59PM +0300, Yura Sokolov wrote: > > Commit "Fix corruption when relation truncation fails." [0] makes > > smgrtruncate be called in a critical section. Unfortunately in version 13 it > > leads to occasional call to palloc0 inside of critical section, which > > triggers Assert in debug builds: > > > > - smgrtruncate calls mdtruncate > > - mdtruncate may call register_dirty_segment > > - register_dirty_segment calls RegisterSyncRequest > > - RegisterSyncRequest calls ForwardSyncRequest > > - ForwardSyncRequest may call CompactCheckpointerRequestQueue > > - CompactCheckpointerRequestQueue may call palloc0 ... > > > > In versions 14 and above CompactCheckpointerRequestQueue does check for > > critical section due to commit "Fix bugs in MultiXact truncation" [1], > > which were backported down to 14 version, but not 13. > > > > We caught it in our private tests, so it is real. > > Cherry-pick of [1] from 14 version to 13 solves the issue. > > > > [0] https://git.postgresql.org/gitweb/?p=postgresql.git;h=2280912165d > > [1] https://git.postgresql.org/gitweb/?p=postgresql.git;h=4c8e00ae9ae > > Ah, you're right. This chain of events is possible in REL_13_STABLE, > and only there. Thomas?
Yeah. 4c8e00ae9ae addressed two separate issues: (1) another critical section thing just like this one, and this one in passing, and (2) a deadlock risk involving checkpoint delays. I had actually worried about that exact deadlock scenario a bit earlier[1] and figured out that it couldn't happen in this code path, so I guess we technically only need to back-patch the couple of lines with the fast return, but it seems harmless, simple and good to back-patch Heikki's whole patch from 14->13. I will study this a bit more and then go ahead and do that. Thanks for the report, Yura, and the reminder, Michael. [1] https://www.postgresql.org/message-id/CA%2BhUKGJC%2BEDcjXFoeOopcE7zxDwASUyWXh8RkRkgbq%3DvYTQ-qg%40mail.gmail.com