What do you mean by "The issue appears when one of the brokers starts being impacted by environmental issues on the server it's running on (for whatever reason)"?
You should consider Kafka a first-tier service; it shouldn't be deployed on shared resources. There are a lot of opinions about containers, VMs, and bare metal, but regardless, your Kafka brokers should be isolated so they don't become resource-starved.

On Fri, Jun 8, 2018 at 7:52 AM Enrique Medina Montenegro <e.medin...@gmail.com> wrote:

> Hi,
>
> I was wondering if there is a proper way, or best practice, to fail fast a
> broker when it's unresponsive (think of the server it's running on having
> issues). Let me describe the scenario I'm currently facing.
>
> This is a 4-broker cluster using Kafka 1.1 with 5 ZK nodes, everything
> running on containers (but I believe the same could apply to VMs or even
> bare metal). The issue appears when one of the brokers starts being
> impacted by environmental issues on the server it's running on (for
> whatever reason), which makes it almost unresponsive, but still "alive
> enough" to stay in the cluster and be considered by the other brokers.
>
> So you cannot kill the broker (or the container), because the server it
> runs on basically times out all commands, and your only choice is to
> restart or even stop the whole server, but due to operational procedures,
> that may take some time.
>
> Therefore, is there any configuration that could be applied for such a
> broker to be "kicked out" of the cluster, even when the broker itself
> still tries to appear "alive"?
>
> The final consequence is that my cluster is literally down until I manage
> to have the server restarted.
>
> Thanks for the support.
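For reference, the broker-side settings that control how quickly a sick broker is fenced off are the ZooKeeper session timeout (the controller declares a broker dead once its ZK session expires) and the replica lag timeout (followers that stop fetching are dropped from the ISR). A minimal server.properties sketch, with illustrative values only, not a tuning recommendation:

    # server.properties (illustrative values)

    # How long before the controller treats a broker as dead after its
    # ZooKeeper session stops being heartbeated. Lowering this fences a
    # hung broker sooner, at the cost of spurious "broker down" events
    # during long GC pauses or network blips.
    zookeeper.session.timeout.ms=6000

    # A follower that hasn't fetched (or caught up) within this window
    # is removed from the in-sync replica set, so a degraded broker
    # stops blocking acks=all producers on partitions it follows.
    replica.lag.time.max.ms=10000

Note the caveat the original poster is hitting: if the degraded broker is still healthy enough to heartbeat ZooKeeper, neither setting will evict it, which is exactly why the advice above is to isolate brokers rather than rely on timeouts alone.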