On Wed, 5 Jun 2024 at 18:47, Ranier Vilela <ranier...@gmail.com> wrote: > > Em ter., 4 de jun. de 2024 às 16:39, Nathan Bossart > <nathandboss...@gmail.com> escreveu: >> >> I noticed that the "Restoring database schemas in the new cluster" part of >> pg_upgrade can take a while if you have many databases, so I experimented >> with a couple different settings to see if there are any easy ways to speed >> it up. The FILE_COPY strategy for CREATE DATABASE helped quite >> significantly on my laptop. For ~3k empty databases, this step went from >> ~100 seconds to ~30 seconds with the attached patch. I see commit ad43a41 >> made a similar change for initdb, so there might even be an argument for >> back-patching this to v15 (where STRATEGY was introduced). One thing I >> still need to verify is that this doesn't harm anything when there are lots >> of objects in the databases, i.e., more WAL generated during many >> concurrent CREATE-DATABASE-induced checkpoints. >> >> Thoughts? > > Why not use it too, if not binary_upgrade?
Because in the normal case (not during binary_upgrade) you don't want to have to generate 2 checkpoints for every created database, especially not when your shared buffers are large. Checkpoints' costs scale approximately linearly with the size of shared buffers, so being able to skip those checkpoints (with strategy=WAL_LOG) will save a lot of performance in the systems where this performance impact matters most. >> I noticed that the "Restoring database schemas in the new cluster" part of >> pg_upgrade can take a while if you have many databases, so I experimented >> with a couple different settings to see if there are any easy ways to speed >> it up. The FILE_COPY strategy for CREATE DATABASE helped quite >> significantly on my laptop. For ~3k empty databases, this step went from >> ~100 seconds to ~30 seconds with the attached patch. As for "on my laptop", that sounds very reasonable, but could you check the performance on systems with larger shared buffer configurations? I'd imagine (but haven't checked) that binary upgrade target systems may already be using the shared_buffers from their source system, which would cause a severe regression when the to-be-upgraded system has large shared buffers. For initdb the database size is known in advance and shared_buffers is approximately empty, but the same is not (always) true when we're doing binary upgrades. Kind regards, Matthias van de Meent