Hello again,

I traced the seek-then-read behaviour of parallel pg_restore to
_skipData() when called from _PrintTocData(). Since most of today's I/O
devices (both rotating and solid state) can read 1MB sequentially faster
than they can seek and read 4KB, I tried the following change:

diff --git a/src/bin/pg_dump/pg_backup_custom.c b/src/bin/pg_dump/pg_backup_custom.c
index 55107b20058..262ba509829 100644
--- a/src/bin/pg_dump/pg_backup_custom.c
+++ b/src/bin/pg_dump/pg_backup_custom.c
@@ -618,31 +618,31 @@ _skipLOs(ArchiveHandle *AH)
  * Skip data from current file position.
  * Data blocks are formatted as an integer length, followed by data.
  * A zero length indicates the end of the block.
 */
 static void
 _skipData(ArchiveHandle *AH)
 {
        lclContext *ctx = (lclContext *) AH->formatData;
        size_t          blkLen;
        char       *buf = NULL;
        int                     buflen = 0;

        blkLen = ReadInt(AH);
        while (blkLen != 0)
        {
-               if (ctx->hasSeek)
+               if (ctx->hasSeek && blkLen > 1024 * 1024)
                {
                        if (fseeko(AH->FH, blkLen, SEEK_CUR) != 0)
                                pg_fatal("error during file seek: %m");
                }
                else
                {
                        if (blkLen > buflen)
                        {
                                free(buf);
                                buf = (char *) pg_malloc(blkLen);
                                buflen = blkLen;
                        }
                        if (fread(buf, 1, blkLen, AH->FH) != blkLen)
                        {
                                if (feof(AH->FH))


This simple change speeds up the offset-table building phase of the
parallel restore immensely (maybe 10x, depending on the number of workers).
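
For anyone who wants to try the idea outside of pg_dump, here is a minimal
standalone sketch of the same "read short blocks, seek over long ones"
logic. It is not the patch itself: skip_bytes() and SEEK_THRESHOLD are
hypothetical names, and unlike _skipData() it discards data through a small
fixed buffer instead of allocating one as large as the block. The 1MB cutoff
is simply the value used in the diff above, not a tuned number.

```c
#define _POSIX_C_SOURCE 200809L	/* for fseeko() and off_t */

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical cutoff, same value as in the diff above. */
#define SEEK_THRESHOLD ((size_t) 1024 * 1024)

/*
 * Skip "len" bytes from the current position of "fh".
 *
 * Blocks longer than SEEK_THRESHOLD are skipped with fseeko() when the
 * stream is seekable; everything else is read sequentially and discarded.
 * Returns 0 on success, -1 on a seek error or a short read.
 */
static int
skip_bytes(FILE *fh, size_t len, int has_seek)
{
	char		buf[8192];

	assert(fh != NULL);

	if (has_seek && len > SEEK_THRESHOLD)
		return (fseeko(fh, (off_t) len, SEEK_CUR) == 0) ? 0 : -1;

	while (len > 0)
	{
		size_t		chunk = (len < sizeof(buf)) ? len : sizeof(buf);

		/* A short read means EOF or an I/O error. */
		if (fread(buf, 1, chunk, fh) != chunk)
			return -1;
		len -= chunk;
	}
	return 0;
}
```

Calling skip_bytes(fh, blkLen, hasSeek) once per data block mirrors the loop
body in _skipData(); whether the threshold should be a constant at all is
an open question.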

A remaining problem is that this offset-table building phase runs in
every worker process, so all workers scan the whole archive almost in
parallel. A more intrusive improvement would be to move this phase to
the parent process, before it spawns the children.

What do you think?

Regards,
Dimitris


P.S. I also have a simple patch that makes the -j1 switch mean "parallel
but with one worker process", which I wrote for debugging purposes. Not
sure if it is of interest here.

