Hi, I was wondering if there is a proper way, or any best practice, to make a broker fail fast when it becomes unresponsive (for example, when the server it's running on has issues). Let me describe the scenario I'm currently facing.
This is a 4-broker cluster using Kafka 1.1 with 5 ZK nodes, everything running in containers (though I believe the same would apply to VMs or even bare metal). The issue appears when one of the brokers is hit by environmental problems on the server it's running on (for whatever reason). The broker becomes almost unresponsive, but stays "alive enough" to remain in the cluster and still be considered by the other brokers.
You cannot kill the broker (or the container), because commands on that server basically all time out. Your only choice is to restart or even shut down the whole server, but due to operational procedures that may take some time. The final consequence is that my cluster is effectively down until I manage to get the server restarted.
Therefore, is there any configuration that could be applied so that such a broker gets "kicked out" of the cluster, even while the broker itself is still trying to stay "alive"? Thanks for the support.