Re: Avoiding data loss with synchronous replication

Andrey Borodin Sat, 24 Jul 2021 03:53:47 -0700

> On 23 July 2021, at 22:54, Bossart, Nathan <bossa...@amazon.com> wrote:
> 
> On 7/23/21, 4:33 AM, "Andrey Borodin" <x4...@yandex-team.ru> wrote:
>> Thanks for your interest in the topic. I think in the thread [0] we almost
>> agreed on the general design.
>> The only remaining question is whether we want to treat pg_ctl stop and kill
>> SIGTERM differently from pg_terminate_backend().
> 
> I didn't get the idea that there was a tremendous amount of support
> for the approach to block canceling waits for synchronous replication.
> FWIW this was my initial approach as well, but I've been trying to
> think of alternatives.
> 
> If we can gather support for some variation of the block-cancels
> approach, I think that would be preferred over my proposal from a
> complexity standpoint.  

Let's clearly enumerate the problems with blocking.
It has been mentioned that the backend is unresponsive when cancelation is blocked.
On the contrary, it is very responsive:

postgres=# alter system set synchronous_standby_names to 'bogus';
ALTER SYSTEM
postgres=# alter system set synchronous_commit_cancelation TO off ;
ALTER SYSTEM
postgres=# select pg_reload_conf();
2021-07-24 15:35:03.054 +05 [10452] LOG:  received SIGHUP, reloading configuration files
 pg_reload_conf
----------------
 t
(1 row)
postgres=# begin;
BEGIN
postgres=*# insert into t1 values(0);
INSERT 0 1
postgres=*# commit ;
^CCancel request sent
WARNING:  canceling wait for synchronous replication requested, but cancelation is not allowed
DETAIL:  The COMMIT record has already flushed to WAL locally and might not have been replicated to the standby. We must wait here.
^CCancel request sent
WARNING:  canceling wait for synchronous replication requested, but cancelation is not allowed
DETAIL:  The COMMIT record has already flushed to WAL locally and might not have been replicated to the standby. We must wait here.

The message says clearly what is wrong. If that is still not enough, let's add
a HINT about synchronous_standby_names.
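A toy model of the behaviour in the session above may make the semantics easier to discuss. It is only a sketch, not PostgreSQL code: the class, its methods and the integer LSNs are hypothetical, and synchronous_commit_cancelation is the setting proposed in this patch, not a released GUC. The "allowed" branch mimics the existing "due to user request" warning; the "not allowed" branch reports the request and keeps waiting until the standby has flushed past the commit.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SyncRepWaiter:
    """Hypothetical model of a backend whose COMMIT record is already
    flushed locally and which now waits for a synchronous standby."""
    commit_lsn: int                    # LSN of the local COMMIT record (toy integer)
    cancelation_allowed: bool = False  # stands in for synchronous_commit_cancelation
    messages: List[str] = field(default_factory=list)
    done: bool = False

    def on_cancel_request(self) -> None:
        """Handle a query-cancel request (e.g. Ctrl+C in psql)."""
        if self.done:
            return
        if self.cancelation_allowed:
            # Existing behaviour: stop waiting, even though the standby
            # may not have the commit yet.
            self.messages.append("WARNING:  canceling wait for synchronous "
                                 "replication due to user request")
            self.done = True
        else:
            # Proposed behaviour: report the request, keep waiting.
            self.messages.append("WARNING:  canceling wait for synchronous "
                                 "replication requested, but cancelation is not allowed")

    def on_standby_flush(self, standby_flush_lsn: int) -> None:
        """Handle a standby reply reporting how far it has flushed WAL."""
        if not self.done and standby_flush_lsn >= self.commit_lsn:
            self.messages.append("COMMIT")
            self.done = True


if __name__ == "__main__":
    w = SyncRepWaiter(commit_lsn=100)
    w.on_cancel_request()      # first Ctrl+C: warning, still waiting
    w.on_cancel_request()      # second Ctrl+C: same
    w.on_standby_flush(100)    # standby catches up: the COMMIT completes
    print("\n".join(w.messages))
```

The point of the model is that a cancel request never ends the wait by itself when cancelation is disallowed; only the standby's acknowledgement does.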

Are there any other problems with blocking cancels?


> Robert's idea to provide a way to understand
> the intent of the cancellation/termination request [0] could improve
> matters.  Perhaps adding an argument to pg_cancel/terminate_backend()
> and using different signals to indicate that we want to cancel the
> wait would be something that folks could get on board with.

The semantics of cancelation assume that the query can be correctly interrupted.
That is no longer possible once we have committed locally, so at that point there
cannot be any correct cancelation. And I don't think it is worth adding an
incorrect one.
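The argument can be illustrated with a deliberately tiny Python model (hypothetical names, not PostgreSQL code; each node is just the set of transaction ids it has committed). Once the COMMIT record is in the primary's WAL nothing can be rolled back, so a "cancel" only lets everyone proceed with data that a subsequent failover to the standby may lose.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Cluster:
    """Toy primary/standby pair; each node is the set of xids it has committed."""
    primary: Set[int] = field(default_factory=set)
    standby: Set[int] = field(default_factory=set)

    def commit_locally(self, xid: int) -> None:
        # COMMIT record flushed to the primary's WAL: from here on the
        # transaction is durable and visible there; it cannot be rolled back.
        self.primary.add(xid)

    def replicate(self) -> None:
        # The synchronous standby receives and flushes everything the primary has.
        self.standby |= self.primary

    def failover(self) -> None:
        # The standby is promoted; anything it never received is lost.
        self.primary = set(self.standby)


def run(cancel_wait: bool, xid: int = 42) -> dict:
    c = Cluster()
    c.commit_locally(xid)
    if not cancel_wait:
        c.replicate()  # the backend waits until the standby confirms
    # With cancel_wait=True the wait is abandoned here, but xid stays committed.
    seen_before = xid in c.primary
    c.failover()
    return {"seen_before_failover": seen_before,
            "survives_failover": xid in c.primary}


if __name__ == "__main__":
    print("cancel:", run(cancel_wait=True))
    print("wait:  ", run(cancel_wait=False))
```

In this model, cancelling leaves the transaction visible before the failover but gone afterwards, which is the data loss this thread is about; waiting keeps it on both nodes.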


Interestingly, converting the transaction to 2PC is a neat idea when the backend
is terminated. It gives stronger guarantees that the transaction will commit
correctly, even after a restart. But we may run short of max_prepared_xacts slots...
Anyway, backend termination bothers me a lot less than cancelation: drivers do
not terminate queries on their own, but they do cancel queries by default.


Thanks!

Best regards, Andrey Borodin.