On Thu, Aug 6, 2020 at 6:08 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> +1 for making this more like what happens in original execution ("on the
> primary", to use your wording).  Perhaps what you suggest here is still
> not enough like the original execution, but it sounds closer.

It won't be the same as the original execution, exactly -- I am only
thinking of holding on to same-level page locks (the original page,
its new right sibling, and the original right sibling). I suppose that
it's possible to go further than this in one rarer case (when clearing
the incomplete split flag one level down), but for the most part it
isn't even possible to follow the original execution's approach to
locking in every detail.

Clearly it's not okay for the startup process to hold buffer locks
across replay of the first and second phases of a split, but that's
what it would take to follow the original execution 100% faithfully --
there are two WAL records involved. I am quite confident that there
won't be any remaining problems provided we follow the original
execution's approach to locking within each level of the tree --
that's enough. Anything that runs during recovery won't care about
cross-level differences, aside from the obvious (scans may have to
move right to recover from concurrent splits).

> As the commit message for 3bbf668d explains, the initial situation for
> all the replay code was that it executed by itself in crash recovery and
> didn't need to bother with locks at all.  I think that it did take some
> locks even then, but that was because of code sharing with the primary
> execution path rather than being something we wanted.

Makes sense.

--
Peter Geoghegan
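
For illustration only, the same-level rule discussed above can be modeled
with a toy sketch. This is not PostgreSQL code: `Page`, `ToyReplay`, and
`replay_split` are hypothetical names, and taking all three locks up
front is a simplifying assumption. The only point it tries to show is
that no same-level lock is released until every sibling link on that
level has been updated.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    """Toy stand-in for a B-tree page on a single level (hypothetical)."""
    name: str
    left: Optional[str] = None
    right: Optional[str] = None
    locked: bool = False


@dataclass
class ToyReplay:
    """Records the order of lock, update, and unlock events."""
    events: List[str] = field(default_factory=list)

    def lock(self, page: Page) -> None:
        assert not page.locked, f"{page.name} is already locked"
        page.locked = True
        self.events.append(f"lock {page.name}")

    def unlock(self, page: Page) -> None:
        assert page.locked, f"{page.name} is not locked"
        page.locked = False
        self.events.append(f"unlock {page.name}")

    def replay_split(self, orig: Page, new_right: Page,
                     old_right: Optional[Page]) -> None:
        # Assumption: take every same-level lock first, left to right.
        held = [orig, new_right] + ([old_right] if old_right else [])
        for page in held:
            self.lock(page)

        # Update all sibling links while every lock is still held.
        new_right.left, new_right.right = orig.name, orig.right
        orig.right = new_right.name
        if old_right is not None:
            old_right.left = new_right.name
        self.events.append("links updated")

        # Release only after the level is fully linked again.
        for page in held:
            self.unlock(page)


# Split page A (whose right sibling is C) by adding new page B between them.
a, b, c = Page("A", right="C"), Page("B"), Page("C", left="A")
replay = ToyReplay()
replay.replay_split(a, b, c)
```

In this model a reader that locks pages one at a time while moving right
can never observe A pointing at B while C still points back at A, which
is the within-level property the message argues is sufficient.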