Re: Synchronous commit behavior during network outage

Aleksander Alekseev Tue, 20 Apr 2021 09:24:05 -0700

Hi Ondřej,

Thanks for the report. It seems to be a clear violation of what is
promised in the docs. Although it's unlikely that someone implemented
an application which deals with important data and "pressed Ctr+C" as
it's done in psql. So this might be not such a critical issue after
all. BTW what version of PostgreSQL are you using?



On Mon, Apr 19, 2021 at 10:13 PM Ondřej Žižka <ondrej.zi...@stratox.cz> wrote:
>
> Hello all,
> I would like to know your opinion on the following behaviour I see for 
> PostgreSQL setup with synchronous replication.
>
> This behaviour happens in a special use case. In this use case, there are 2 
> synchronous replicas with the following config (truncated):
>
> - 2 nodes
> - synchronous_standby_names='*'
> - synchronous_commit=remote_apply
>
>
> With this setup run the following steps (LAN down - LAN between master and 
> replica):
> -----------------
> postgres=# truncate table a;
> TRUNCATE TABLE
> postgres=# insert into a values (1); -- LAN up, insert has been applied to 
> replica.
> INSERT 0 1
> Vypnu LAN na serveru se standby:
> postgres=# insert into a values (2); --LAN down, waiting for a confirmation 
> from sync replica. In this situation cancel it (press CTRL+C)
> ^CCancel request sent
> WARNING:  canceling wait for synchronous replication due to user request
> DETAIL:  The transaction has already committed locally, but might not have 
> been replicated to the standby.
> INSERT 0 1
> There will be warning that commit was performed only locally:
> 2021-04-12 19:55:53.063 CEST [26104] WARNING:  canceling wait for synchronous 
> replication due to user request
> 2021-04-12 19:55:53.063 CEST [26104] DETAIL:  The transaction has already 
> committed locally, but might not have been replicated to the standby.
>
> postgres=# insert into a values (2); --LAN down, waiting for a confirmation 
> from sync replica. In this situation cancel it (press CTRL+C)
> ^CCancel request sent
> WARNING:  canceling wait for synchronous replication due to user request
> DETAIL:  The transaction has already committed locally, but might not have 
> been replicated to the standby.
> INSERT 0 1
> postgres=# insert into a values (2); --LAN down, waiting for sync replica, 
> second attempt, cancel it as well (CTRL+C)
> ^CCancel request sent
> WARNING:  canceling wait for synchronous replication due to user request
> DETAIL:  The transaction has already committed locally, but might not have 
> been replicated to the standby.
> INSERT 0 1
> postgres=# update a set n=3 where n=2; --LAN down, waiting for sync replica, 
> cancel it (CTRL+C)
> ^CCancel request sent
> WARNING:  canceling wait for synchronous replication due to user request
> DETAIL:  The transaction has already committed locally, but might not have 
> been replicated to the standby.
> UPDATE 2
> postgres=# update a set n=3 where n=2; -- run the same update,because data 
> from the previous attempt was commited on master, it is sucessfull, but no 
> changes
> UPDATE 0
> postgres=# select * from a;
>  n
> ---
>  1
>  3
>  3
> (3 rows)
> postgres=#
> ------------------------
>
> Now, there is only value 1 in the sync replica table (no other values), data 
> is not in sync. This is expected, after the LAN restore, data will come sync 
> again, but if the main/primary node will fail and we failover to replica 
> before the LAN is back up or the storage for this node would be destroyed and 
> data would not sync to replica before it, we will lose data even if the 
> client received successful commit (with a warning).
> From the synchronous_commit=remote_write level and "higher", I would expect, 
> that when the remote application (doesn't matter if flush, write or apply) 
> would not be applied I would not receive a confirmation about the commit 
> (even with a warning). Something like, if there is no commit from sync 
> replica, there is no commit on primary and if someone performs the steps 
> above, the whole transaction will not send a confirmation.
>
> This can cause issues if the application receives a confirmation about the 
> success and performs some follow-up steps e.g. create a user account and 
> sends a request to the mail system to create an account or create a VPN 
> account. If the scenario above happens, there can exist a VPN account that 
> does not have any presence in the central database and can be a security 
> issue.
>
> I hope I explained it sufficiently. :-)
>
> Do you think, that would be possible to implement a process that would solve 
> this use case?
>
> Thank you
> Ondrej



-- 
Best regards,
Aleksander Alekseev

Re: Synchronous commit behavior during network outage

Reply via email to