On Tue, Jul 4, 2017 at 4:41 PM, Kuntal Ghosh <kuntalghosh.2...@gmail.com> wrote:
> I've not yet started the patch and it may take some time for me to
> understand and write the patch in a correct way. Since you've almost
> written the patch, IMHO, please go ahead and submit the patch. I'll
> happily review and test it. :-)
>
> Thanks for the notes.
OK, thanks. Here you go.

Upon further testing, I have discovered as well that libpqProcessFileList suffers from an overflow of the file size when it is larger than 2GB, which caused only a portion of the relation blocks to be copied when doing a transfer with libpq in my test case. That could be the origin of all kinds of corruption. With my test case, a sequential scan fetched inconsistent data: only a portion of the tuples were scannable on the rewound standby, because the original blocks remained in place, and those carried an LSN *newer* than what the master was generating. When doing an offline rewind, things are a bit smarter and no such corruption is possible.

But that's just the tip of the iceberg for such issues. Big files that were copied completely could actually overflow twice, which made the transfer work correctly, but that was plain luck.

The patch attached passes my test case where blocks from a large relation file are copied, as well as the case where a large raw file is copied. This needs to be back-patched down to 9.5, where pg_rewind was introduced.
--
Michael
rewind-large-files.patch
Description: Binary data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers