On 05/01/18 05:35, Joshua D. Drake wrote:
> On 01/04/2018 01:26 PM, Alvaro Herrera wrote:
>> Joshua D. Drake wrote:
>>
>>> We just queue/audit the changes as they happen and sync up the changes
>>> after the initial sync completes.
>>
>> This already happens.  There is an initial sync, and there's logical
>> decoding that queues any changes that exist "after" the sync's snapshot.
>>
>> What you seem to want is to have multiple processes doing the initial
>> COPY in parallel -- each doing one fraction of the table.  Of course,
>> they would have to use the same snapshot.  That would make sense only
>> if the COPY itself is the bottleneck and not the network, or the I/O
>> speed of the origin server.  This doesn't sound like a common scenario to me.
>
> Not quite but close. My thought process is that we don't want to sync a 100-500 million row table (or worse) within a single snapshot. Unless I am missing something there, that has the potential to be a very long-running transaction, especially if we are syncing more than one relation.
>
> JD


    That's indeed the way it works: you need to hold the snapshot, possibly for a long time. Not doing so seems to get very complex, even though it's not impossible. Changes made after the initial sync are definitely recorded (via logical decoding), so that's not an issue. But if you don't keep a snapshot of the database, the table copy will also see some or all of these changes applied to the tables mid-way. Reconciling a table copy that may already contain mid-way changes with the changes recorded by logical decoding is difficult and bug-prone.
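
    To make the problem concrete, here is a toy sketch in Python, not PostgreSQL code. The dict-based table, the `run_sync` helper, and the "add a delta" change format are all assumptions made for illustration. It copies a table row by row while one write lands mid-copy, then replays the queued change, once with a frozen snapshot and once without.

```python
import copy

def run_sync(use_snapshot):
    """Copy a toy table while a write arrives mid-copy, then replay the queue."""
    origin = {1: 10, 2: 20, 3: 30}   # toy origin table: row id -> counter
    change_log = []                  # stands in for the logical decoding queue

    # With a snapshot the copy reads a frozen view; without one it reads
    # the live table and can pick up writes made while the copy is running.
    source = copy.deepcopy(origin) if use_snapshot else origin

    replica = {}
    for step, row_id in enumerate(sorted(origin)):
        if step == 1:
            # A concurrent write hits row 3 before the copy has reached it.
            origin[3] += 5
            change_log.append((3, 5))
        replica[row_id] = source[row_id]

    # Replay the queued changes on the replica after the copy finishes.
    for row_id, delta in change_log:
        replica[row_id] += delta
    return origin, replica

origin_a, replica_a = run_sync(use_snapshot=True)
origin_b, replica_b = run_sync(use_snapshot=False)
print(origin_a == replica_a)   # snapshot copy plus replay matches the origin
print(origin_b == replica_b)   # live copy plus replay applied the write twice
```

    In the second run the copy has already picked up the write to row 3, so replaying the queue applies it again. Avoiding that without a snapshot would need extra bookkeeping to know which queued changes each copied row already reflects.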

    Surprisingly, this is how MongoDB replication works, as they don't have the equivalent of a snapshot facility. But they actually need to do really weird stuff, like re-applying changes up to 3 (why?) times, and comments in the source code point to strange hacks to make everything consistent. I (want to) believe they got it right, but it is hacky and complicated, and MongoDB doesn't support FKs and other features that I'm sure would complicate matters even more.

    I'm not a PG hacker, but all this sounds too complicated to me. I'd keep the snapshot open, which makes things very easy. If you want to do a parallel COPY inside it, that's fine (if, as the other Álvaro said, COPY is the limiting factor).
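
    As a rough illustration of parallel COPY under one snapshot, here is another toy Python sketch. The `parallel_copy` helper and the dict-based table are assumptions, not PostgreSQL code. In PostgreSQL itself, sharing one snapshot across sessions is what `pg_export_snapshot()` and `SET TRANSACTION SNAPSHOT` are for. Each worker copies its own slice of keys from the same frozen view, so the combined result is consistent however the work is split.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_copy(snapshot, workers=3):
    """Copy a frozen snapshot using several workers, one slice of keys each."""
    keys = sorted(snapshot)
    slices = [keys[i::workers] for i in range(workers)]

    def copy_slice(slice_keys):
        # Every worker reads from the same frozen view of the table.
        return {k: snapshot[k] for k in slice_keys}

    replica = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(copy_slice, slices):
            replica.update(part)
    return replica

origin = {i: i * 10 for i in range(1, 10)}
snapshot = dict(origin)   # frozen view shared by every worker
origin[5] += 1            # later write on the origin, invisible to the copy
replica = parallel_copy(snapshot)
print(replica == snapshot)
```

    The write to row 5 would reach the replica later through the change queue, not through the copy.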


    Cheers,

    Álvaro

--

Alvaro Hernandez


-----------
OnGres

