Noah Misch <n...@leadboat.com> writes:
> On Sun, Jan 05, 2020 at 01:33:55AM +0100, Tomas Vondra wrote:
>> It's a bit unfortunate that a patch for a data corruption / loss issue
>> (even if a low-probability one) fell through multiple commitfests.
> Thanks for investing in steps to fix that.

Yeah, this patch has been waiting in the queue for way too long :-(.

I spent some time studying this, and I have a few comments/questions:

1. It seems like this discussion is conflating two different issues.
The original complaint was about a seemingly-bogus log message "could
not truncate directory "pg_xact": apparent wraparound".  Your theory
about that, IIUC, is that SimpleLruTruncate's initial round-back of
cutoffPage to a segment boundary moved us from a state where
shared->latest_page_number doesn't logically precede the cutoffPage to
one where it does, triggering the message.  So removing the initial
round-back, and then doing whatever's needful to compensate for that in
the later processing, is a reasonable fix to prevent the bogus warning.

However, you're also discussing whether or not an SLRU segment file
that is close to the wraparound boundary should get removed or not.
As far as I can see, that's 100% independent of issuance of the log
message, no?  This might not affect the code substance of the patch at
all; but it seems like we need to be clear about it in our discussion,
and maybe the comments need to change too.

2. I wonder whether we have an issue even with rounding back to the
SLRU page boundary, as is done by each caller before we ever get to
SimpleLruTruncate.  I'm pretty sure that the actual anti-wraparound
protections are all precise to the XID level, so that there's some
daylight between what SimpleLruTruncate is told and the range of data
that the higher-level code thinks is of interest.  Maybe we should
restructure things so that we keep the original cutoff XID (or
equivalent) all the way through, and compare the start/end positions
of a segment file directly to that.

3. It feels like the proposed test of cutoff position against both
ends of a segment is a roundabout way of fixing the problem.
I wonder whether we ought not pass *both* the cutoff and the current
endpoint (latest_page_number) down to the truncation logic, and have
it compare against both of those values.

To try to clarify this in my head, I thought about an image of the
modular XID space as an octagon, where each side would correspond to a
segment file if we chose numbers such that there were only 8 possible
segment files.  Let's say that nextXID is somewhere in the bottommost
octagon segment.  The oldest possible value for the truncation cutoff
XID is a bit less than "halfway around" from nextXID; so it could be
in the topmost octagon segment, if the minimum permitted
daylight-till-wraparound is less than the SLRU segment size (which it
is).  Then, if we round the cutoff XID "back" to a segment boundary,
most of the current (bottommost) segment is now less than halfway
around from that cutoff point, and in particular the current segment's
starting page is exactly halfway around.  Because of the way that
TransactionIdPrecedes works, the current segment will be considered to
precede that cutoff point (the int32 difference comes out as exactly
2^31, which is negative).  Boom, data loss, because we'll decide the
current segment is removable.

I think that your proposed patch does fix this, but I'm not quite sold
that the corner cases (where the cutoff XID is itself exactly at a
page boundary) are right.  In any case, I think it'd be more robust to
be comparing explicitly against a notion of the latest in-use page
number, instead of backing into it from an assumption that the cutoff
XID itself is less than halfway around.

I wonder if we ought to dodge the problem by having a higher minimum
value of the required daylight-before-wraparound, so that the cutoff
point couldn't land in the segment diametrically opposite the current
one, but would have to be at least one segment before that.
In the end, I believe that all of this logic was written under an
assumption that we should never get into a situation where we are so
close to the wraparound threshold that considerations like these would
manifest.  Maybe we can get it right, but I don't have huge faith in
it.

It also bothers me that some of the callers of SimpleLruTruncate have
explicit one-count backoffs of the cutoff point and others do not.
There's no obvious reason for the difference, so I wonder if that
isn't something we should have across-the-board, or else adjust
SimpleLruTruncate to do the equivalent thing internally.

I haven't thought much yet about your second point about race
conditions arising from nextXID possibly moving before we finish the
deletion scan.  Maybe we could integrate a fix for that issue, along
the lines of: (1) see an SLRU segment file; (2) determine that it
appears to precede the cutoff XID; if so, (3) acquire the control lock
and fetch latest_page_number, and compare against that to verify that
the segment file is old and not new; then (4) unlink if that still
holds.

			regards, tom lane