There's no easy way to kick a running broker out of the cluster. If you block that broker's ability to connect to ZooKeeper, though, its session should expire after the configured timeout (zookeeper.session.timeout.ms, 6 seconds by default I think), and you effectively get the same result. You could do that with iptables rules on the ZK hosts or on the brokers, or with whatever hook you have for that.
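As a rough sketch of the iptables idea, the snippet below only builds the commands; it does not run them. The broker address, the helper name `zk_block_rule`, and the use of port 2181 (ZooKeeper's default client port) are assumptions for illustration, not something taken from this cluster. The rule would need to go on every ZK host, since the broker's client can otherwise fail over to another ZK server before its session expires.

```python
import shlex

ZK_CLIENT_PORT = 2181  # assumption: ZooKeeper's default client port


def zk_block_rule(broker_ip, action="insert", port=ZK_CLIENT_PORT):
    """Build the iptables argv that drops (action="insert") or stops
    dropping (action="delete") TCP traffic from one broker to the
    ZooKeeper client port on the host where the command is run."""
    flag = {"insert": "-I", "delete": "-D"}[action]
    return ["iptables", flag, "INPUT", "-s", broker_ip,
            "-p", "tcp", "--dport", str(port), "-j", "DROP"]


if __name__ == "__main__":
    broker = "10.0.0.12"  # hypothetical address of the unresponsive broker
    # Run the first command (as root) on every ZK host to cut the broker off.
    print(shlex.join(zk_block_rule(broker)))
    # Run the second one once the server is healthy again to remove the rule.
    print(shlex.join(zk_block_rule(broker, "delete")))
```

Blocking on the ZK side is probably the more practical option in the scenario described, because it does not depend on the sick server accepting commands.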
On Fri, Jun 8, 2018 at 10:52 AM, Enrique Medina Montenegro <e.medin...@gmail.com> wrote:

> Hi Jacob,
>
> That could be a reason, but what about just a kernel failure or whatever
> other reason? My question was not to determine the best environment to run,
> but whether it would be possible to fail fast should this type of issues
> pop up.
>
> Regards.
>
> On June 8, 2018 7:43:11 PM Jacob Sheck <shec0...@gmail.com> wrote:
>
>> What do you mean by "The issue appears when one of the brokers starts
>> being impacted by environmental issues within the server it's running
>> into (for whatever reason)"?
>>
>> You should consider Kafka to be a first tier service, it shouldn't be
>> deployed on shared resources. There are a lot of opinions about
>> containers, VMs, and bare metal, but regardless your kafka brokers should
>> be isolated so they don't become resource starved.
>>
>> On Fri, Jun 8, 2018 at 7:52 AM Enrique Medina Montenegro <
>> e.medin...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I was wondering if there is a proper way or best practices to fail fast a
>>> broker when it's unresponsive (think about the server it's running on has
>>> issues). Let me describe the scenario I'm currently facing.
>>>
>>> This is a 4 broker cluster using Kafka 1.1 with 5 ZK nodes, everything
>>> running on containers (but could be as well applied to VMs or even bare
>>> metal I believe). The issue appears when one of the brokers starts being
>>> impacted by environmental issues within the server it's running into (for
>>> whatever reason), and it makes it almost unresponsive, but still "alive
>>> enough" to stay in the cluster and be considered by the other brokers.
>>>
>>> So you cannot kill the broker (or the container) because the server it
>>> runs into basically times out all the commands, and you're only choice is
>>> to restart or even stop the full server, but due to operational
>>> procedures, that may take some time.
>>>
>>> Therefore, is there any configuration that could be applied for such
>>> broker to be "kicked out" of the cluster even when the broker itself
>>> tries still to be "alive"?
>>>
>>> The final consequence is that my cluster is literally down until I manage
>>> to have the server restarted.
>>>
>>> Thanks for the support.

--
Brett Rann
Senior DevOps Engineer
Zendesk International Ltd
395 Collins Street, Melbourne VIC 3000 Australia
Mobile: +61 (0) 418 826 017