On Thu, Jun 17, 2021 at 3:43 AM Andres Freund <and...@anarazel.de> wrote:
> > Yeah, that would also work, but I thought since we are already > > avoiding the checkpoint so let's avoid FlushDatabaseBuffers() also and > > directly use the lower level buffer manager API which doesn't need > > recache. And I am using pg_class to identify the useful relfilenode > > so that we can avoid processing some unwanted relfilenode but yeah I > > agree that this is orthogonal to whether we use checkpoint or not. > > It's not entirely obvious to me that it's important to avoid > FlushDatabaseBuffers() on its own. Forcing a checkpoint is problematic because > it unnecessarily writes out dirty buffers in other databases, triggers FPWs > etc. Normally a database used as a template won't have a meaningful amount of > dirty buffers itself, so the FlushDatabaseBuffers() shouldn't trigger a lot of > writes. Of course, there is the matter of FlushDatabaseBuffers() not being > cheap with a large shared_buffers - but I suspect that's not a huge factor > compared to the rest of the database creation cost. Okay so if I think from that POW, then maybe we can just FlushDatabaseBuffers() and then directly use smgrread() calls. > I think the better argument for going through shared buffers is that it might > be worth doing so for the *target* database. A common use of frequently > creating databases, in particular with a non-default template database, is to > run regression tests with pre-created schema / data - writing out all that > data > just to have it then dropped a few seconds later after the regression test > completed is wasteful. Okay, I am not sure how common this use case is but for this use case it makes sense to use bufmgr for the target database. > > > In principle, we could have both mechanisms, and use the new WAL-logged > > > system if the database is small, and the old system with checkpoints if > > > it's large. But I don't like idea of having to maintain both. > > > > Yeah, I agree in some cases, where we don't have many dirty buffers, > > checkpointing can be faster. > > I don't think the main issue is the speed of checkpointing itself? The reaoson > to maintain the old paths is that the "new approach" is bloating WAL volume, > no? Right now cloning a 1TB database costs a few hundred bytes of WAL and > about > 1TB of write IO. With the proposed approach, the write volume approximately > doubles, because there'll also be about 1TB in WAL. Make sense. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com