Thanks Tom. I suppose "pg_dump can only parallelize data dumping" answers my 
original question as "expected behavior," but I would like to understand the 
reason better.
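
For my own clarity, here is how I currently picture the distinction, using a 
hypothetical database name "mydb" and made-up output paths (please correct me 
if this is off):

    # Directory-format dump with workers: each worker COPYs a table's
    # data, so the data of many tables is written in parallel.
    pg_dump --format=directory --jobs=8 --file=/tmp/mydb_dump mydb

    # Schema-only dump: as I understand it, there is no table data for
    # workers to copy, so the catalog reads and DDL output end up being
    # an effectively serial job no matter what --jobs is set to.
    pg_dump --schema-only --format=directory --jobs=8 --file=/tmp/mydb_schema mydb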

My knowledge of Postgres and other DBMSs is at the casual-admin level, with the 
occasional deep dive into a specific error or analysis. I'm not averse to 
getting into the code. Before my OP I searched for reasons that the schema-only 
option would prevent pg_dump from running multiple jobs, and I didn't find 
anything I understood well enough to confirm it either way.

Is the limitation simply the current state of development, or is there 
something about dumping schemas that conflicts with parallelism? I'm willing to 
do some studying if someone can point me to relevant articles.

The --link option to pg_upgrade would be so much more useful if it weren't 
still bound to serially dumping the schemas of half a million tables. As 
already mentioned, if there is an alternate process that mimics pg_upgrade but 
allows for parallelism, I'm open to that; a rough idea of what I mean is 
sketched below.
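
For example, something along these lines is what I had in mind -- a rough, 
untested sketch with the same hypothetical "mydb" database, and definitely not 
anything pg_upgrade itself does. Since separate pg_dump runs don't share a 
snapshot, I assume this is only safe while the cluster is quiesced, as it 
would be during an upgrade window:

    # Hypothetical sketch: run a schema-only pg_dump per schema, up to
    # 10 processes at a time. Each run takes its own snapshot, so the
    # cluster must be idle for the pieces to be mutually consistent,
    # and globals/extensions would still need separate handling. This
    # only illustrates the parallelism; it is not a drop-in replacement
    # for the binary-upgrade dump pg_upgrade performs.
    psql -At -d mydb -c "SELECT nspname FROM pg_namespace WHERE nspname !~ '^pg_' AND nspname <> 'information_schema'" |
      xargs -P10 -I{} pg_dump --schema-only --schema={} --file=schema_{}.sql mydb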

Thanks all

________________________________________
From: Tom Lane <t...@sss.pgh.pa.us>
Sent: Saturday, April 6, 2019 3:02 PM
To: senor
Cc: pgsql-general@lists.postgresql.org
Subject: Re: pg_upgrade --jobs

senor <frio_cerv...@hotmail.com> writes:
> Since pg_upgrade is in control of how it is calling pg_dump, is there a 
> reason pg_upgrade cannot use the directory output format when calling 
> pg_dump? Is the schema-only operation incompatible?

Well, there's no point in it.  pg_dump can only parallelize data dumping,
and there's none to be done in the --schema-only case that pg_upgrade
uses.

Also, since pg_upgrade *does* use parallelism across multiple pg_dump
calls (if you've got multiple databases in the cluster), it'd be a bit
problematic to have another layer of parallelism below that, if it did
indeed do anything.  You don't want "--jobs=10" to suddenly turn into
100 sessions.

                        regards, tom lane

