Thanks for reviewing, Ondřej!

> On 26 Apr 2021, at 22:01, Ondřej Žižka <ondrej.zi...@stratox.cz> wrote:
>
> Hello Andrey,
>
> I went through the thread for your patch and it seems to me an acceptable
> solution...
>
>> The only case the patch does not handle is a sudden backend crash - Postgres
>> will recover without a restart.
>
> We also use an HA tool (Patroni). If the whole machine fails, it will find a
> new master and it should be OK. We use a 4-node setup (2 sync replicas and 1
> async from every replica). If there is an issue just with the sync replica
> (the async one operating normally) and the master fails completely in this
> situation, it will be handled by Patroni (the async replica becomes another
> sync); but if it is just the backend process, the master will not fail over
> and the changes will still be visible...
>
> If the sync replica outage is temporary, it will resolve itself when the node
> establishes a replication slot again... If the outage is "long", Patroni will
> remove the "old" sync replica from the cluster and the async replica reading
> from the master will become the new sync. So yes... In a 2-node setup this
> can be an issue, but in a 4-node setup this seems to me like a solution.
>
> The only situation I can imagine is one where the client connections use a
> different network than the replication network, and the replication network
> goes down completely while the client network stays up. In that case, the
> master can be an "isolated island", and if it fails, we can lose the changed
> data.

It is, in fact, a very common type of network partition.
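For concreteness, a quorum-based synchronous replication setting for a cluster like the one you describe might look like the sketch below. This is only illustrative (the standby names node2 and node3 are made up, and your actual values are generated by Patroni, not set by hand):

```ini
# postgresql.conf on the primary (illustrative values)
# Commit waits for acknowledgement from ANY 1 of the two listed standbys,
# so losing a single sync standby does not block commits.
synchronous_standby_names = 'ANY 1 (node2, node3)'
# Wait for the quorum standby to flush WAL before reporting commit success.
synchronous_commit = on
```

Note that even with such a quorum, a canceled synchronous commit still becomes locally visible today, which is the behavior the patch addresses.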
> Is this situation also covered in your model: "transaction effects should not
> be observable on the primary until the requirements of synchronous_commit are
> satisfied."

Yes, provided that synchronous_commit_cancelation = off, no backend crash occurs, and the HA tool does not start the PostgreSQL service while in doubt whether another primary may exist.

> Do you agree with my thoughts?

I could not follow your reasoning about 2 and 4 nodes. Can you please clarify a bit how a 4-node setup helps prevent visibility of committed-locally-but-canceled transactions?

I do not think we can classify network partitions as "temporary" or "long". Due to the distributed nature of the system, network partitions are eternal and momentary - simultaneously. And if node A can access node B and node C, this neither implies that B can access C, nor that B can access A.

> Maybe it would be possible to implement this in PostgreSQL with a note in the
> documentation that a multinode (>=3 nodes) cluster is necessary.

PostgreSQL does not provide any fault detection or automatic failover. Documenting anything wrt failover is the responsibility of the HA tool.

Thanks!

Best regards, Andrey Borodin.