> On 2 July 2021, at 10:59, Jeff Davis <pg...@j-davis.com> wrote:
>
> On Wed, 2021-06-30 at 17:28 +0500, Andrey Borodin wrote:
>>> My patch also covers the backend termination case. Is there a
>>> reason
>>> you left that case out?
>>
>> Yes, backend termination is used by HA tool before rewinding the
>> node.
>
> Can't you just disable sync rep first (using ALTER SYSTEM SET
> synchronous_standby_names=''), which will unstick the backend, and then
> terminate it?
If the failover is caused by an unresponsive node, we cannot simply turn off
sync rep. We need spare connections for that (the number of stuck backends
will skyrocket during a network partition). We need available descriptors and
some memory to fork a new backend. We need to re-read the config. And, after
all, we need time to try.
In some failures we may lack some of these.
Partial degradation is already a hard task. Without the ability to easily
terminate a running Postgres, the HA tool will often resort to SIGKILL.
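
To illustrate, the suggested sequence would look roughly like the sketch
below. The pg_stat_activity filter is only my assumption about which backends
a tool would target. Each step needs something that may be missing during a
failure: a free connection slot, a working config reload, and time.

```sql
-- Needs a free connection slot and a superuser session.
ALTER SYSTEM SET synchronous_standby_names = '';

-- The new value takes effect only after the configuration is re-read.
SELECT pg_reload_conf();

-- Only now can the backends that were waiting for sync rep be terminated.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
  AND pid <> pg_backend_pid();
```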
>
> If you don't handle the termination case, then there's still a chance
> for the transaction to become visible to other clients before it's
> replicated.
Termination is an admin command; admins know what they are doing.
Cancellation is part of the user protocol.
BTW, can we have two GUCs, so that HA tool developers can decide for
themselves which guarantees they provide?
>
>> There is one more caveat we need to fix: we should prevent instant
>> recovery from happening.
>
> That can already be done with the restart_after_crash GUC.
Oh, I didn't know about it; we will use it. Thanks!
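
For the archives, a minimal sketch of how restart_after_crash could be turned
off, assuming a superuser session is available (the same setting can also go
into postgresql.conf directly):

```sql
-- Keep the postmaster from reinitializing automatically after a backend
-- crash, so the HA tool decides when the node comes back.
ALTER SYSTEM SET restart_after_crash = off;

-- Re-read the configuration so the change takes effect.
SELECT pg_reload_conf();

-- Check the value now in effect.
SHOW restart_after_crash;
```
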
Best regards, Andrey Borodin.