On Wed, Jun 16, 2021 at 6:13 PM Andres Freund <and...@anarazel.de> wrote:
> I don't think the main issue is the speed of checkpointing itself? The reason
> to maintain the old paths is that the "new approach" is bloating WAL volume,
> no? Right now cloning a 1TB database costs a few hundred bytes of WAL and
> about 1TB of write IO. With the proposed approach, the write volume approximately
> doubles, because there'll also be about 1TB in WAL.
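To make the write-volume arithmetic in the quoted paragraph concrete, here is a back-of-the-envelope sketch. It is a toy model, not a measurement. The strategy names "file_copy" and "wal_logged" are hypothetical labels for the old and proposed approaches. The 300-byte WAL record stands in for "a few hundred bytes". Full-page images, WAL compression, and the cost of checkpoints or fsyncs are all ignored.

```python
# Toy write-volume model for cloning a database (illustrative assumptions only).
# "file_copy" (hypothetical label): data files are written once, plus a small
#   fixed-size WAL record.
# "wal_logged" (hypothetical label): data files are written once, and the same
#   amount of data is written again into WAL.
# Ignored: full-page images, WAL compression, checkpoint/fsync costs.

TB = 1024 ** 4

# Hypothetical size of the WAL emitted by the file-copy strategy
# (the quoted text only says "a few hundred bytes").
FILE_COPY_WAL_BYTES = 300


def write_volume(db_bytes: int, strategy: str) -> int:
    """Approximate total bytes written (data files + WAL) to clone db_bytes."""
    if strategy == "file_copy":
        return db_bytes + FILE_COPY_WAL_BYTES
    if strategy == "wal_logged":
        return db_bytes + db_bytes  # data files plus an equal volume of WAL
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    for strategy in ("file_copy", "wal_logged"):
        total = write_volume(1 * TB, strategy)
        print(f"{strategy}: {total / TB:.2f} TB written")
```

Under these assumptions, a 1TB clone writes about 1TB with file copy and about 2TB when WAL-logged. This matches the "approximately doubles" estimate. The model deliberately leaves out the fsync and buffer-flush costs that the reply below weighs against the extra WAL.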
This is a good point, but on the other hand, I think this smells a lot like the wal_level=minimal optimization, where we don't need to WAL-log data being bulk-loaded into a table created in the same transaction. In theory, that optimization has a lot of value, but in practice it gets a lot of bad press on this list, because (1) sometimes doing the fsync is more expensive than writing the extra WAL would have been, and (2) most people want to run with wal_level=replica/logical, so it ends up being a code path that isn't used much and is therefore more likely than average to have bugs nobody's terribly interested in fixing (except Noah ... thanks Noah!).

If we add features in the future, like TDE or perhaps incremental backup, that rely on new pages getting new LSNs instead of recycled ones, this may turn into the same kind of wart. And as with that optimization, you're probably not even better off unless the database is pretty big, and you might be worse off if you have to do fsyncs or flush buffers synchronously.

I'm not severely opposed to keeping both methods around, so if that's really what people want to do, OK, but I guess I wonder whether we're really going to be happy with that decision down the road.

-- 
Robert Haas
EDB: http://www.enterprisedb.com