Hi,

Currently the pg_upgrade docs say:

"""
<para>
 The <option>--jobs</option> option allows multiple CPU cores to be used
 for copying/linking of files and to dump and reload database schemas
 in parallel; a good place to start is the maximum of the number of
 CPU cores and tablespaces. This option can dramatically reduce the
 time to upgrade a multi-database server running on a multiprocessor
 machine.
</para>
"""

This makes users think that the --jobs option can use all CPU cores, which is not true, or that it has something to do with multiple databases, which is true only to some extent.

What the option really improves is upgrading servers with multiple tablespaces. Of course, if --link or --clone is used, pg_upgrade is still very fast, but with the --copy option the performance is not what one would expect.

As an example, a customer with a 25 TB database, 40 cores and lots of RAM used --jobs=35 and got only 7 processes (they have 6 tablespaces), and the disks were not used at maximum speed either. They expected 35 processes copying lots of files at the same time.

So, first I would like to improve the documentation. What about something like the attached?

Now, a couple of questions:

- In src/bin/pg_upgrade/file.c, copyFile() defines a buffer that determines how many bytes each read()/write() call uses to copy the relfilenode segments. It is defined as (50 * BLCKSZ), which is 400 kB. Isn't this too small?
- Why do we use read()/write() at all? Isn't there a faster way of copying the file? I'm asking because I actually don't know.

I'm trying to add more parallelism by copying individual segments of a relfilenode in different processes. Does anyone see a big problem in trying to do that? I'm asking because no one has done it before, which might not be a good sign.

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL
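P.S. To make the buffer-size question concrete, here is a minimal sketch of the kind of read()/write() copy loop I am asking about. It is not the actual copyFile() code: the names sketch_copy_file, SKETCH_BLCKSZ and SKETCH_COPY_BUF_SIZE are made up for illustration, and it assumes the default BLCKSZ of 8192 bytes. It does not retry on EINTR, which a real implementation probably should.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: the default PostgreSQL block size of 8192 bytes. */
#define SKETCH_BLCKSZ 8192
/* The buffer size discussed above: 50 * 8192 = 409600 bytes (400 kB). */
#define SKETCH_COPY_BUF_SIZE (50 * SKETCH_BLCKSZ)

/*
 * Hypothetical helper: copy src to dst with a read()/write() loop using
 * a heap buffer of bufsize bytes.  Returns the number of bytes copied,
 * or -1 on any error.
 */
long long
sketch_copy_file(const char *src, const char *dst, size_t bufsize)
{
    int         src_fd;
    int         dst_fd;
    char       *buf;
    ssize_t     nread;
    long long   total = 0;

    assert(bufsize > 0);

    src_fd = open(src, O_RDONLY);
    if (src_fd < 0)
        return -1;

    dst_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (dst_fd < 0)
    {
        close(src_fd);
        return -1;
    }

    buf = malloc(bufsize);
    if (buf == NULL)
    {
        close(src_fd);
        close(dst_fd);
        return -1;
    }

    /* Each iteration moves at most bufsize bytes through user space. */
    while ((nread = read(src_fd, buf, bufsize)) > 0)
    {
        ssize_t     off = 0;

        /* write() may be partial, so loop until the chunk is flushed. */
        while (off < nread)
        {
            ssize_t     nwritten = write(dst_fd, buf + off,
                                         (size_t) (nread - off));

            if (nwritten <= 0)
            {
                total = -1;
                goto done;
            }
            off += nwritten;
        }
        total += nread;
    }

    if (nread < 0)
        total = -1;

done:
    free(buf);
    close(src_fd);
    if (close(dst_fd) != 0)
        total = -1;
    return total;
}
```

Calling sketch_copy_file(src, dst, SKETCH_COPY_BUF_SIZE) and then repeating the call with a larger bufsize on the same large file would be one simple way to check whether the buffer size matters on a given system.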
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 20efdd7..74eaaee 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -406,10 +406,10 @@ NET STOP postgresql-&majorversion;
 <para>
 The <option>--jobs</option> option allows multiple CPU cores to be used
 for copying/linking of files and to dump and reload database schemas
-in parallel; a good place to start is the maximum of the number of
-CPU cores and tablespaces. This option can dramatically reduce the
-time to upgrade a multi-database server running on a multiprocessor
-machine.
+in parallel; a good place to start is the maximum of: the number of
+CPU cores or tablespaces. This option can dramatically reduce the
+time to upgrade a server with multiple tablespaces running on a
+multiprocessor machine.
 </para>
 
 <para>