On 06/05/2021 06:09, Andrey Borodin wrote:
> I could not understand your reasoning about 2 and 4 nodes. Can you please
> clarify a bit how a 4-node setup can help prevent visibility of
> committed-locally-but-canceled transactions?
Hello Andrey,

The initial request (for us) was to have a geo cluster with 2 locations where it would be possible to have 2 sync replicas even in case of failure of one location. This means having 2 nodes in each location (4 in total). If one location fails completely (broken network connection), Patroni will choose the working location (a 5-node etcd spread over 3 locations ensures this).

In the initial state, there is one sync replica in each location and one async replica in each location, each async replica using the sync node in its own location as its source.
Let's have the following initial situation:
1) Nodes pg11 and pg12 are in one location; nodes pg21 and pg22 are in the other location.
2) Nodes pg11 and pg21 are in synchronous replication (pg21 is the sync standby of pg11).
3) Node pg12 is an async replica of pg11.
4) Node pg22 is an async replica of pg21.
5) Master is pg11 (a rough sketch of this setup follows below).
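To make the topology above concrete, here is a minimal sketch of how it could look from the primary's side. It assumes each standby connects with application_name set to its node name (pg21, pg12); in a real Patroni cluster these settings are managed by Patroni itself, so this is only illustrative:

    -- On pg11 (the primary): pg21 is the only synchronous standby,
    -- pg12 streams asynchronously from pg11, pg22 cascades from pg21.
    ALTER SYSTEM SET synchronous_standby_names = 'pg21';
    SELECT pg_reload_conf();

    -- Verify the replication roles as seen from pg11:
    SELECT application_name, sync_state, sent_lsn, replay_lsn
      FROM pg_stat_replication;
    --  application_name | sync_state
    --  pg21             | sync
    --  pg12             | async
    -- pg22 does not appear here because it replicates from pg21, not pg11.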

When the committed-locally-but-canceled situation happens and the problem is only with node pg21 (not with the network between the nodes), the async replica pg12 will receive the locally committed WAL from pg11 right after the local commit on pg11, even if the waiting client is canceled. So the commit will be present on both pg11 and pg12. If pg11 then fails, the transaction already exists on pg12 and that node will be selected as the new leader (it has the latest LSN).
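Just to illustrate the leader-selection argument (this is not Patroni's actual internal code; Patroni compares WAL positions through its REST API), each surviving node can be asked for the last WAL position it has received, and the node with the highest LSN, pg12 in the scenario above, wins:

    -- Run on each candidate standby after pg11 fails (illustrative only):
    SELECT pg_last_wal_receive_lsn() AS received_lsn,
           pg_last_wal_replay_lsn()  AS replayed_lsn;
    -- The candidate with the highest received LSN (here pg12, which already
    -- holds the locally committed transaction) gets promoted.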

There is a window between the moment the transaction is committed locally and the moment it has been sent to the async replica during which we can lose data, but I expect that window to be in milliseconds (maybe less).
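That window can be observed directly on the primary; something like the following (pg_stat_replication lag columns, available since PostgreSQL 10) shows how far the async standby lags, again assuming pg12 connects with application_name = 'pg12':

    -- On pg11: how far behind is the async standby pg12?
    SELECT application_name,
           write_lag, flush_lag, replay_lag,
           pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
      FROM pg_stat_replication
     WHERE application_name = 'pg12';
    -- On a healthy local network this is typically milliseconds / a few kB.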

It will not prevent visibility, but it will ensure that the data are not lost: in that case the data can be visible on the leader even though they are not yet present on the sync replica, because the async replica provides continuity of data persistence.

I hope I explained it understandably.

Regards
Ondrej
