Hi,

I was wondering whether there is a proper way, or a best practice, to fail
a broker fast when it becomes unresponsive (for example, because the server
it runs on has issues). Let me describe the scenario I'm currently facing.

This is a 4-broker cluster using Kafka 1.1 with 5 ZK nodes, everything
running in containers (though I believe the same applies to VMs or even
bare metal). The issue appears when one of the brokers is impacted by
environmental issues on the server it's running on (for whatever reason),
which make it almost unresponsive but still "alive enough" to stay in the
cluster and be considered by the other brokers.

So you cannot kill the broker (or the container), because basically every
command times out on the server it runs on, and your only choice is to
restart or even stop the whole server, but due to operational procedures,
that may take some time.


Therefore, is there any configuration that could be applied so that such a
broker is "kicked out" of the cluster even while the broker itself still
tries to stay "alive"?

The final consequence is that my cluster is literally down until I manage
to have the server restarted.

Thanks for the support.
