On Fri, 2021-07-02 at 11:39 +0500, Andrey Borodin wrote:
> If the failover happens due to unresponsive node we cannot just turn
> off sync rep. We need to have some spare connections for that (number
> of stuck backends will skyrocket during network partitioning). We
> need available descriptors and some memory to fork new backend. We
> will need to re-read config. We need time to try after all.
> At some failures we may lack some of these.

I think it's a good point that, when things start to go wrong, they can
go very wrong very quickly.

But until you've disabled sync rep, the primary will essentially be
down for writes whether using this new feature or not. Even if you can
terminate some backends to try to free space, the application will just
make new connections that will get stuck the same way.

You can avoid the "fork backend" problem by keeping a connection always
open from the HA tool, or by editing the conf to disable sync rep and
issuing SIGHUP instead. Granted, that still takes some memory.

> Partial degradation is already hard task. Without ability to easily
> terminate running Postgres HA tool will often resort to SIGKILL.

When the system is really wedged as you describe (waiting on sync rep,
tons of connections, and low memory), what information do you expect
the HA tool to be able to collect, and what actions do you expect it to
take?

Presumably, you'd want it to disable sync rep at some point to get back
online. Where does SIGTERM fit into the picture?

> > If you don't handle the termination case, then there's still a
> > chance for the transaction to become visible to other clients
> > before its replicated.
>
> Termination is admin command, they know what they are doing.
> Cancelation is part of user protocol.

From the pg_terminate_backend() docs: "This is also allowed if the
calling role is a member of the role whose backend is being terminated
or the calling role has been granted pg_signal_backend", so it's not
really an admin command. Even for an admin, it might be hard to
understand why terminating a backend could result in losing a visible
transaction.

I'm not really seeing two use cases here for two GUCs. Are you sure you
want to disable only cancels but allow termination to proceed?

Regards,
	Jeff Davis