On 12/21/2017 06:15 PM, Craig Ringer wrote:
On 22 December 2017 at 05:24, Joshua D. Drake <j...@commandprompt.com> wrote:

    -Hackers,


    Lastly, I noted that a full sync of a replication set is performed
    by a COPY. This is fine for small sets, but for a large data set
    that may take some time to copy, it may be a problem for overall
    performance and maintenance. We may want to see if we can do an
    initial sync incrementally (optional) via a cursor (?) and queue
    all changed rows until the sync completes?


I'm not sure I understand this.

The COPY is streamed from source to destination; IIRC it's not buffering to a tempfile or anything. So I fail to see what using a cursor would gain you. No matter whether you're using a cursor, a COPY, or something else, you have to hold down a specific xmin and work with the same snapshot for the whole sync operation. If you instead did something like incremental SELECTs across ranges of a PK, each with a new xmin+snapshot, your copy would see changes from different points in time depending on how far along it was, and you'd get an inconsistent view. It could possibly be worked around with some tricky key-range-based filtering of the applied change-stream if you were willing to require that no PK updates may occur, but it'd probably be bug city. It's hard enough to get sync correct at all.

I am not sure that this is entirely true. Granted, it is easiest just to do everything within a single snapshot, but we shouldn't have to. It would be possible to perform incremental (even parallel) syncs, whether via COPY or some other mechanism. We would have to track changes to the table as we sync, but that isn't impossible either (especially if we have a PK). I would think that this would only be valid for async replication, but it is possible. We just queue/audit the changes as they happen and apply them after the initial sync completes. Multi-phase sync baby :D
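
To make the idea concrete, here is a rough in-memory sketch in Python. It is a toy model, not how any existing replication code works. The names (Source, multi_phase_sync, between_chunks) are made up for illustration, and it assumes that change capture starts before the first chunk is read, that the PK ranges cover every row present at that point, that PKs are never updated, and that queued changes can be replayed as idempotent upserts/deletes keyed by PK.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (op, pk, value); op is "upsert" or "delete". Hypothetical change format.
Change = Tuple[str, int, Optional[str]]


class Source:
    """Toy source table that records every change in an ordered queue."""

    def __init__(self, rows: Dict[int, str]) -> None:
        self.rows = dict(rows)
        self.queue: List[Change] = []

    def upsert(self, pk: int, value: str) -> None:
        self.rows[pk] = value
        self.queue.append(("upsert", pk, value))

    def delete(self, pk: int) -> None:
        self.rows.pop(pk, None)
        self.queue.append(("delete", pk, None))

    def read_range(self, lo: int, hi: int) -> Dict[int, str]:
        # Each call sees the table as it is right now, i.e. a fresh
        # snapshot per chunk rather than one snapshot for the whole sync.
        return {pk: v for pk, v in self.rows.items() if lo <= pk < hi}


def multi_phase_sync(
    source: Source,
    ranges: List[Tuple[int, int]],
    between_chunks: Optional[Callable[[int], None]] = None,
) -> Dict[int, str]:
    target: Dict[int, str] = {}

    # Phase 1: copy the table chunk by chunk over half-open PK ranges.
    # The target may be inconsistent at the end of this phase.
    for i, (lo, hi) in enumerate(ranges):
        target.update(source.read_range(lo, hi))
        if between_chunks is not None:
            between_chunks(i)  # stand-in for concurrent write activity

    # Phase 2: replay the queued changes in the order they happened.
    # The last change per PK wins, so the target converges on the source.
    for op, pk, value in source.queue:
        if op == "upsert":
            target[pk] = value
        else:
            target.pop(pk, None)
    return target


src = Source({1: "a", 2: "b", 3: "c", 4: "d"})


def concurrent_writes(chunk_index: int) -> None:
    if chunk_index == 0:
        src.upsert(1, "a2")  # row that was already copied changes
        src.delete(4)        # row that was not yet copied disappears
        src.upsert(5, "e")   # brand new row appears

result = multi_phase_sync(src, [(1, 3), (3, 6)], concurrent_writes)
assert result == src.rows
```

After phase 1 the target still holds the stale value for PK 1; only the replay in phase 2 brings it in line with the source. The sketch ignores PK updates, which is exactly the case Craig warns about.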

Thanks,

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
*****     Unless otherwise stated, opinions are my own.   *****
