Noah Misch <n...@leadboat.com> writes: > On Thu, Mar 19, 2020 at 06:04:52PM -0400, Tom Lane wrote: >> 1. It seems like this discussion is conflating two different issues.
> When the newest XID and the oldest XID fall in "opposite" segments in the XID > space, we must not unlink the segment containing the newest XID. That is the > chief goal at present. Got it. Thanks for clarifying the scope of the patch. >> 3. It feels like the proposed test of cutoff position against both >> ends of a segment is a roundabout way of fixing the problem. I >> wonder whether we ought not pass *both* the cutoff and the current >> endpoint (latest_page_number) down to the truncation logic, and >> have it compare against both of those values. > Since latest_page_number can keep changing throughout SlruScanDirectory() > execution, that would give a false impression of control. Better to > demonstrate that the xidWrapLimit machinery keeps latest_page_number within > acceptable constraints than to ascribe significance to a comparison with a > stale latest_page_number. Perhaps. I'm prepared to accept that line of argument so far as the clog SLRU goes, but I'm not convinced that the other SLRUs have equally robust defenses against advancing too far. So on the whole I'd rather that the SLRU logic handled this issue strictly on the basis of what it knows, without assumptions about what calling code may be doing. Still, maybe we only really care about the risk for the clog SLRU? >> To try to clarify this in my head, I thought about an image of >> the modular XID space as an octagon, where each side would correspond to >> a segment file if we chose numbers such that there were only 8 possible >> segment files. > Exactly. > https://docs.google.com/drawings/d/1xRTbQ4DVyP5wI1Ujm_gmmY-cC8KKCjahEtsU_o0fC7I > uses your octagon to show the behaviors before and after this patch. Cool, thanks for drafting that up. (My original sketch was not of publishable quality ;-).) To clarify, the upper annotations probably ought to read "nextXid <= xidWrapLimit"? And "cutoffPage" ought to be affixed to the orange dot at lower right of the center image? I agree that this diagram depicts why we have a problem right now, and the right-hand image shows what we want to have happen. What's a little less clear is whether the proposed patch achieves that effect. In particular, after studying this awhile, it seems like removal of the initial "cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT" adjustment isn't really affecting anything. It's already the case that just by allowing oldestXact to get rounded back to an SLRU page boundary, we've created some daylight between oldestXact and the cutoff point. Rounding back further within the same SLRU segment changes nothing. (For example, suppose that oldestXact is already within the oldest page of its SLRU segment. Then either rounding rule has the same effect. But there's still a little bit of room for xidWrapLimit to be in the opposite SLRU segment, allowing trouble.) So I think what we're actually trying to accomplish here is to ensure that instead of deleting up to half of the SLRU space before the cutoff, we delete up to half-less-one-segment. Maybe it should be half-less-two-segments, just to provide some cushion against edge cases. Reading the first comment in SetTransactionIdLimit makes one not want to trust too much in arguments based on the exact value of xidWrapLimit, while for the other SLRUs it was already unclear whether the edge cases were exactly right. In any case, it feels like the specific solution you have here, of testing both ends of the segment, is a roundabout way of providing that one-segment slop; and it doesn't help if we decide we need two-segment slop. Can we write the test in a way that explicitly provides N segments of slop? regards, tom lane