On Tue, Apr 9, 2024 at 7:24 AM Tomas Vondra <tomas.von...@enterprisedb.com> wrote:
> I think it's a bit more nuanced, because it's about backups/restore. The
> bug might be subtle, and you won't learn about it until the moment when
> you need to restore (or perhaps even long after that). At which point
> "You might have taken the backup in some other way." is not really a
> viable exit route.
>
> Anyway, I'm still not worried about this particular feature, and I'll
> keep doing the stress testing.

In all sincerity, I appreciate the endorsement. Basically what's been
scaring me about this feature is the possibility that there's some
incurable design flaw that I've managed to completely miss. If it has
some more garden-variety bugs, that's still pretty bad: people will
potentially lose data and be unable to get it back. But, as long as
we're able to find the bugs and fix them, the situation should improve
over time until, hopefully, everybody trusts it roughly as much as we
trust, say, crash recovery. Perhaps even a bit more: I think this code
is much better-written than our crash recovery code, which has grown
into a giant snarl that nobody seems able to untangle, despite multiple
refactoring attempts. However, if there's some reason why the approach
is fundamentally unsound which I and others have failed to detect, then
we're at risk of shipping a feature that is irretrievably broken. That
would really suck.

I'm fairly hopeful that there is no such design defect: I certainly
can't think of one. But, it's much easier to imagine an incurable
problem here than with, say, the recent pruning+freezing changes. Those
changes might have bugs, and those bugs might be hard to find, but if
they do exist and are found, they can be fixed. Here, it's a little
less obvious that that's true. We're relying on our ability, at
incremental backup time, to sort out from the manifest and the WAL
summaries, what needs to be included in the backup in order for a
subsequent pg_combinebackup operation to produce correct results. The
basic idea is simple enough, but the details are complicated, and it
feels like a subtle defect in the algorithm could potentially scuttle
the whole thing. I'd certainly appreciate having more smart people try
to think of things that I might have overlooked.

--
Robert Haas
EDB: http://www.enterprisedb.com