On 2015-09-22 14:52:49 -0400, Robert Haas wrote: > 1. It would be possible to write a patch that included ONLY the > changes needed to make that happen, and did nothing else. It would be > largely a subset of this. If we want to change 9.3 and 9.4, I > recommend we do that first, and then come back to the rest of this.
I think that patch would be pretty much what I wrote. To be correct you basically have to: 1) Never skip a truncation on the standby. Otherwise there might have already have been wraparound and you read the completely wrong offset. 2) Always perform truncations on the standby exactly the same moment (in the replay stream) as on the primary. Otherwise there also can be a wraparound. 3) Never read anything from an SLRU from the data directory while inconsistent. In an inconsistent state we can read completely wrong data. A standby can be inconsistent in many situations, including crashes, restarts and fresh base backups. To me these three together leave only the option to never read an SLRUs contents on a standby. That only leaves minor changes in the patch that could be removed afaics. I mean we could leave in DetermineSafeOldestOffset() but it'd be doing pretty much the same as SetOffsetVacuumLimit(). I think we put at least three layers on bandaid on this issue since 9.3.2, and each layer made things more complicated. We primarily did so because of the compatibility and complexity concerns. I think that was a bad mistake. We should have done it mostly right back then, and we'd be better of now. If we continue with bandaids on the back branches while having a fixed 9.5+ with significantly different behaviour we'll have a hellish time fixing things in the back branches. And introduce more bugs than this might introduce. > 2. I agree that what we're doing right now is wrong. And I agree that > this fixes a real problem. But it seems to me to be quite possible, > even likely, that it will create other problems. Possible. But I think those bugs will be just bugs and not more fundamental architectural problems. To be very clear. I'm scared of the idea of backpatching this. I'm more scared of doing that myself. But even more I am scared of the current state. > For example, suppose that there are files in the data directory that > precede oldestMultiXact. In the current approach, we'll remove those > because they're not in the range we expect to be used. Hm. For offsets/ we continue to use SimpleLruTruncate() for truncation, which scans the directory, so I don't see a problem. For members/ we won't - but neither do we really today, see SlruScanDirCbRemoveMembers(). So I don't think there'll be a significant difference? > a leftover old file that doesn't get removed the first time through - > for whatever reason - becomes a time bomb that will explode on the > next wraparound. I don't know that that will happen. We should be able to deal with that, otherwise recovery is pretty borked. It can be a problem for the 'recovery from wrong oldest multi' case, but that's the same today. > I will bet you a beer that there are other possible hazards neither of > us is foreseeing right now. Right. I'm not dismissing that. I just think it's much more likely to be handleable problems than the set we have today. It's incredibly hard to get an accurate mental model of the combined behaviour & state of primary and standby today. Even if we three have that today, I'm pretty sure we won't in half a year. And sure as hell nearly nobody else will have one. > >> - If SlruDeleteSegment fails in unlink(), shouldn't we at the very > >> least log a message? If that file is still there when we loop back > >> around, it's going to cause a failure, I think. > > > > The existing unlink() call doesn't, that's the only reason I didn't add > > a message there. I'm fine with adding a (LOG or WARNING?) message. > > Cool. Hm. When redoing a truncation during [crash] recovery that can cause a host of spurious warnings if already done before. DEBUG1 to avoid scaring users? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers