> On 18 Apr 2024, at 06:17, Nathan Bossart <nathandboss...@gmail.com> wrote:
> The attached work-in-progress patch speeds up 'pg_dump --binary-upgrade'
> for this case.  Instead of executing the query in every call to the
> function, we can execute it once during the first call and store all the
> required information in a sorted array that we can bsearch() in future
> calls.

That does indeed seem like a saner approach.  Since we look up the relkind,
we can also remove the is_index parameter from
binary_upgrade_set_pg_class_oids, as we already know that without the caller
telling us?

> One downside of this approach is the memory usage.

I'm not too worried about the worst-case memory usage of this.

> This was more-or-less the first approach that crossed my mind, so I
> wouldn't be surprised if there's a better way.  I tried to keep the
> pg_dump output the same, but if that isn't important, maybe we could dump
> all the pg_class OIDs at once instead of calling
> binary_upgrade_set_pg_class_oids() for each one.

Without changing the backend handling of the Oids we can't really do that
AFAICT; the backend stores the Oid for the next call, so it needs to stay
per relation like it is now?

For Greenplum we moved this to the backend by first dumping all Oids, which
were read into a backend cache, and during relation creation the Oid to use
was looked up in the backend.  (This wasn't a performance change; it was to
allow multiple shared-nothing clusters to have a unified view of Oids, so I
never benchmarked it all that well.)  The upside of that approach is that
the magic Oid variables in the backend can be removed, but it obviously adds
slight overhead elsewhere.

--
Daniel Gustafsson
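
To make the shape of that concrete, here is a minimal sketch of the
cache-once-and-bsearch() idea, with hypothetical names and simplified types;
this is not the actual patch:

#include <stdlib.h>

typedef unsigned int Oid;           /* stand-in for the real typedef */

typedef struct RelOidInfo
{
    Oid         oid;                /* pg_class.oid, the sort key */
    char        relkind;            /* pg_class.relkind */
    Oid         relfilenode;
    Oid         toast_oid;          /* 0 if the relation has no TOAST table */
} RelOidInfo;

static RelOidInfo *reloid_cache = NULL;
static size_t reloid_cache_size = 0;

static int
reloid_cmp(const void *a, const void *b)
{
    Oid oa = ((const RelOidInfo *) a)->oid;
    Oid ob = ((const RelOidInfo *) b)->oid;

    return (oa > ob) - (oa < ob);
}

/* Called once, with the rows from a single query over pg_class. */
static void
fill_reloid_cache(RelOidInfo *rows, size_t nrows)
{
    reloid_cache = rows;
    reloid_cache_size = nrows;
    qsort(reloid_cache, nrows, sizeof(RelOidInfo), reloid_cmp);
}

/* Every subsequent call is an O(log n) bsearch() instead of a query. */
static const RelOidInfo *
lookup_reloid(Oid reloid)
{
    RelOidInfo key = { .oid = reloid };

    return bsearch(&key, reloid_cache, reloid_cache_size,
                   sizeof(RelOidInfo), reloid_cmp);
}

Since the cached row carries the relkind, the heap/index distinction falls
out of the lookup rather than the caller, which is what would let the
is_index parameter go away.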
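
On the "why it has to stay per relation" point, here is simplified stand-in
code for the backend's single "next Oid" slot (the real setter functions
live in pg_upgrade_support.c; the names below are illustrative):

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* One slot, not a queue: each setter call is consumed by the next CREATE. */
static Oid next_heap_pg_class_oid = InvalidOid;

/* What the setter emitted by pg_dump before each CREATE effectively does. */
static void
set_next_heap_pg_class_oid(Oid oid)
{
    next_heap_pg_class_oid = oid;
}

/* What relation creation effectively does: consume the slot and clear it. */
static Oid
consume_next_heap_pg_class_oid(void)
{
    Oid oid = next_heap_pg_class_oid;

    next_heap_pg_class_oid = InvalidOid;
    return oid;                 /* InvalidOid means "assign one normally" */
}

Emitting all the setter calls up front would just leave the last value in
the slot, so every relation but the last would be created with the wrong
Oid.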
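
For comparison, a very rough sketch of the Greenplum-style variant described
above, with entirely hypothetical names (the actual Greenplum code differs):
all preassigned Oids are loaded into a backend-side cache up front, and
relation creation looks its Oid up by key instead of reading a "next Oid"
variable.

#include <string.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

typedef struct PreassignedOid
{
    char        relname[64];    /* keyed by name rather than "next call";
                                 * a real key would include the namespace */
    Oid         oid;
} PreassignedOid;

/* Filled once, from a dump of all Oids, before any relation is created. */
static PreassignedOid *preassigned = NULL;
static size_t npreassigned = 0;

/* At relation creation: find the Oid reserved for this relation. */
static Oid
lookup_preassigned_oid(const char *relname)
{
    for (size_t i = 0; i < npreassigned; i++)
    {
        if (strcmp(preassigned[i].relname, relname) == 0)
            return preassigned[i].oid;
    }
    return InvalidOid;          /* not preassigned; assign normally */
}

This removes the need for the magic "next Oid" globals, but every relation
creation now pays for a cache lookup, which is the slight overhead mentioned
above.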