> On 29 June 2021, at 23:35, Jeff Davis <pg...@j-davis.com> wrote:
> 
> On Tue, 2021-06-29 at 11:48 +0500, Andrey Borodin wrote:
>>> On 29 June 2021, at 03:56, Jeff Davis <pg...@j-davis.com>
>>> wrote:
>>> 
>>> The patch may be somewhat controversial, so I'll wait for feedback
>>> before documenting it properly.
>> 
>> The patch seems similar to [0]. But I like your wording :)
>> I'd be happy if we go with any version of this idea.
> 
> Thank you, somehow I missed that one, we should combine the CF entries.
> 
> My patch also covers the backend termination case. Is there a reason
> you left that case out?
Yes, backend termination is used by the HA tool before rewinding the node.
Initially I treated termination as PANIC and got a ton of core dumps
during failover drills.

There is one more caveat we need to fix: we should prevent instant recovery
from happening. The HA tool must know that our process was restarted.
Consider the following scenario:
1. Node A is the primary with sync rep.
2. A goes through a network partition; somewhere node B is promoted.
3. All backends of A are stuck in sync rep until the HA tool discovers that
A is a failed node.
4. One backend crashes with a segfault in some buggy extension, or OOM, or
whatever.
5. The Postgres server does restartless crash recovery, making
local-but-not-replicated data visible.

We should prevent 5 as well, just as we prevent cancels. The HA tool will
discover the postmaster failure and will recheck in the coordination system
whether it can bring Postgres up locally.
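
To make the scenario concrete, here is a toy model of it in Python. This is
only a sketch under my own assumptions, not Postgres code: the Primary class,
its fields, and ha_tool_may_start() are hypothetical names, and the model
simply assumes that a commit waiting for sync rep is already in local WAL but
not yet visible to other sessions.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Toy model of node A; every name here is hypothetical."""
    acked: set = field(default_factory=set)    # commits confirmed by the sync standby
    waiting: set = field(default_factory=set)  # flushed locally, backend stuck in sync rep
    leaked: set = field(default_factory=set)   # local-only commits that became visible
    postmaster_up: bool = True

    def commit(self, xid: int, standby_reachable: bool) -> None:
        # The commit record always reaches local WAL first; only the
        # acknowledgement from the standby differs.
        (self.acked if standby_reachable else self.waiting).add(xid)

    def visible(self) -> set:
        # A stopped node serves nothing. A running node hides commits
        # that are still waiting for the standby.
        if not self.postmaster_up:
            return set()
        return self.acked | self.leaked

    def backend_crash(self, restart_in_place: bool) -> None:
        if restart_in_place:
            # Step 5: crash recovery replays local WAL and nobody waits
            # for the standby any more, so local-only commits show up.
            self.leaked |= self.waiting
            self.waiting = set()
        else:
            # Behaviour argued for above: stay down until the HA tool decides.
            self.postmaster_up = False


def ha_tool_may_start(node: Primary, is_leader_in_coordination_system: bool) -> bool:
    """The HA tool starts Postgres only if the node is down and still the leader."""
    return (not node.postmaster_up) and is_leader_in_coordination_system
```

With restart_in_place=True the local-only commit becomes visible on A even
though B was promoted. With restart_in_place=False the node stays down, and
ha_tool_may_start() returns False because A is no longer the leader in the
coordination system.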

Thanks!

Best regards, Andrey Borodin.
