Noah Misch <n...@leadboat.com> writes:
> On Sun, Feb 17, 2019 at 11:31:03PM -0800, Noah Misch wrote:
>>> b. Arrange so only one backend runs vac_truncate_clog() at a time. Other than
>>> AsyncCtl, every SLRU truncation appears in vac_truncate_clog(), in a
>>> checkpoint, or in the startup process. Hence, also arrange for only one
>>> backend to call SimpleLruTruncate(AsyncCtl) at a time.
>> More specifically, restrict vac_update_datfrozenxid() to one backend per
>> database, and restrict vac_truncate_clog() and asyncQueueAdvanceTail() to one
>> backend per cluster. This, attached, was rather straightforward.

> Rebased.  The conflicts were limited to comments and documentation.

I tried to review this, along with your adjacent patch to adjust the
segment-roundoff logic, but I didn't get far.  I see the point that the
roundoff might create problems when we are close to hitting wraparound.
I do not, however, see why serializing vac_truncate_clog helps.  I'm
pretty sure it was intentional that multiple backends could run
truncation directory scans concurrently, and I don't really want to
give that up if possible.

Also, if I understand the data-loss hazard properly, it's what you said
in the other thread: the latest_page_number could advance after we make
our decision about what to truncate, and then maybe we could truncate
newly-added data.  We surely don't want to lock out the operations that
can advance latest_page_number, so how does serializing
vac_truncate_clog help?

I wonder whether what we need to do is add some state in shared memory
saying "it is only safe to create pages before here", and make
SimpleLruZeroPage or its callers check that.  The truncation logic
would advance that value, but only after completing a scan.  (Oh ...,
hmm, maybe the point of serializing truncations is to ensure that we
know that we can safely advance that?)

Can you post whatever script you used to detect/reproduce the problem
in the first place?

			regards, tom lane

PS: also, re-reading this code, it's worrisome that we are not checking
for failure of the unlink calls.  I think the idea was that it didn't
really matter because if we did re-use an existing file we'd just
re-zero it; hence failing the truncate is an overreaction.  But
probably some comments about that are in order.