On Fri, Feb 28, 2025 at 02:51:27PM -0600, Nathan Bossart wrote:
> Cool. I appreciate the design feedback.

One other design point I wanted to bring up is whether we should bother
generating a rollback script for the new "swap" mode. In short, I'm
wondering if it would be unreasonable to say that, just for this mode, once
pg_upgrade enters the file transfer step, reverting to the old cluster
requires restoring a backup. I believe that's worth considering for the
following reasons:

* Anecdotally, I'm not sure I've ever actually seen pg_upgrade fail during
  or after file transfer, and I'm hoping to get some real data about that
  in the near future. Has anyone else dealt with such a failure? I
  suspect that failures during file transfer are typically due to OS
  crashes, power losses, etc., and hopefully those are rare.

* I've spent quite some time trying to generate a portable script, but it's
  quite complicated and difficult to reason about its correctness. And I
  haven't even started on the Windows version. Leaving this part out would
  simplify the patch set quite a bit.

* If we give up the idea of reverting to the old cluster, we also can avoid
  a bunch of intermediate fsync() calls which I only included to help
  reason about the state of the files in case you failed halfway through.
  This might not add up to much, but it's at least another area of
  simplification.

Of course, rollback would still be possible, but you'd really need to
understand what "swap" mode does behind the scenes to do so safely. In any
case, I'm growing skeptical that a probably-not-super-well-tested script
that extremely few people will need and fewer will use is worth the
complexity.

-- 
nathan