Re: parallel pg_restore blocks on heavy random read I/O on all children processes

Dimitrios Apostolou Sun, 23 Mar 2025 05:53:49 -0700

On Thu, 20 Mar 2025, Tom Lane wrote:

> I am betting that the problem is that the dump's TOC (table of
> contents) lacks offsets to the actual data of the database objects,
> and thus the readers have to reconstruct that information by scanning
> the dump file.  Normally, pg_dump will back-fill offset data in the
> TOC at completion of the dump, but if it's told to write to an
> un-seekable output file then it cannot do that.


Thanks Tom, this makes sense! As you noticed, I'm piping the output, and
this was a conscious choice.
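
To make the seekability point concrete, here is a minimal Python sketch. It is
not pg_dump's actual code, and the helper name can_seek is made up for
illustration. It only shows that a regular file accepts a seek while a pipe
rejects it (on POSIX systems), which is why a writer whose output is a pipe
cannot go back and patch offsets into a TOC it has already written.

```python
import os
import tempfile


def can_seek(fd: int) -> bool:
    """Return True if the file descriptor supports lseek().

    Hypothetical helper for illustration only; pg_dump has its own check.
    """
    try:
        # Asking for the current position is a harmless seek.
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError:
        # Pipes fail here (ESPIPE on POSIX), so back-filling is impossible.
        return False


# Case 1: output goes to a regular file, as with "pg_dump -f file".
with tempfile.TemporaryFile() as f:
    file_seekable = can_seek(f.fileno())

# Case 2: output goes to a pipe, as with "pg_dump | some-backup-tool".
read_end, write_end = os.pipe()
try:
    pipe_seekable = can_seek(write_end)
finally:
    os.close(read_end)
    os.close(write_end)

print("regular file seekable:", file_seekable)
print("pipe seekable:", pipe_seekable)
```

In the piped case the writer can only append, so any offsets that are known
only after the data has been written never make it into the TOC.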

> I don't see an easy way, and certainly no way that wouldn't involve
> redefining the archive format.  Can you write the dump to a local
> file rather than piping it immediately?


Unfortunately I don't have enough space for that. I'm still testing, but
the intended setup is to take an uncompressed pg_dump (unlike the one
above, which was compressed for testing purposes) and stream it to a
backup server that does its own deduplication and compression.

Further questions:

* Does the same happen with an uncompressed dump? Or are the offsets
  perhaps pre-filled because they are predictable without compression?

* Should pg_dump print a warning when it generates a lower-quality archive
  (one without data offsets in the TOC)?

* The seeking pattern in pg_restore seems nonsensical to me: read 4K,
  skip 8-12K, repeat for the whole file, consuming 15K IOPS for an
  hour. /Maybe/ there is something to improve there... Where can I read
  more about the format?

* Why doesn't it happen in single-process pg_restore?


Thank you!
Dimitris
