Commit to primary with unavailable sync standby

Andrey Borodin Thu, 19 Dec 2019 03:05:19 -0800

Hi!

I cannot figure out the proper way to implement a safe HA upsert. I would be
very grateful if someone could help me.


Imagine we have a primary server after a failover, and it is
network-partitioned. We run an INSERT ... ON CONFLICT DO NOTHING that
eventually times out.

az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
    INSERT INTO t(
        pk,
        v,
        dt
    )
    VALUES
    (
        5,
        'text',
        now()
    )
    ON CONFLICT (pk) DO NOTHING
    RETURNING pk,
              v,
              dt)
   SELECT new_doc.pk from new_doc;
^CCancel request sent
WARNING:  01000: canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been 
replicated to the standby.
LOCATION:  SyncRepWaitForLSN, syncrep.c:264
Time: 2173.770 ms (00:02.174)

At this point our driver decides that something went wrong, and we retry the
query.

az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
    INSERT INTO t(
        pk,
        v,
        dt
    )
    VALUES
    (
        5,
        'text',
        now()
    )
    ON CONFLICT (pk) DO NOTHING
    RETURNING pk,
              v,
              dt)
   SELECT new_doc.pk from new_doc;
 pk
----
(0 rows)

Time: 4.785 ms

Now we have split-brain: the retry returned without an error, so we
acknowledged to the client a row that exists only on the partitioned primary.
How can I fix this?
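
To make the failure mode concrete, here is a minimal Python sketch of the
sequence above. It is a toy in-memory model, not PostgreSQL and not any real
driver: ToyPrimary, SyncWaitCanceled and naive_upsert are hypothetical names
made up for illustration, and the retry policy is only my assumption about
how a driver might behave.

```python
class SyncWaitCanceled(Exception):
    """Raised when the wait for the synchronous standby is canceled."""


class ToyPrimary:
    """In-memory stand-in for a partitioned primary (not real PostgreSQL)."""

    def __init__(self, standby_reachable):
        self.standby_reachable = standby_reachable
        self.local_rows = {}      # rows committed locally on the primary
        self.standby_rows = {}    # rows confirmed by the sync standby

    def insert_do_nothing(self, pk, v):
        """Mimic INSERT ... ON CONFLICT (pk) DO NOTHING RETURNING pk."""
        if pk in self.local_rows:
            # Conflict: nothing is written, so zero rows come back at once,
            # as in the second query of the session above.
            return []
        self.local_rows[pk] = v   # the local commit happens first
        if not self.standby_reachable:
            # The row stays committed locally even though the wait is canceled.
            raise SyncWaitCanceled("committed locally, not replicated")
        self.standby_rows[pk] = v
        return [pk]


def naive_upsert(db, pk, v, attempts=2):
    """Hypothetical driver logic: retry on error, treat any result as success."""
    for _ in range(attempts):
        try:
            db.insert_do_nothing(pk, v)
            return True           # acknowledged to the client
        except SyncWaitCanceled:
            continue              # assume the statement failed and retry
    return False


primary = ToyPrimary(standby_reachable=False)
acknowledged = naive_upsert(primary, 5, "text")
print(acknowledged, 5 in primary.local_rows, 5 in primary.standby_rows)
```

In this model the client gets an acknowledgement while the row is present
only in local_rows, which is the situation I am asking about.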

There must be some obvious trick, but I cannot see it... Or maybe canceling
the wait for synchronous replication should be disallowed, and termination
should be treated as a system failure?

Best regards, Andrey Borodin.
