Any comments, anyone?

On Fri, Feb 15, 2019 at 6:08 PM Ankur Rana <ankur.r...@getfareye.com> wrote:
> Hi everyone,
>
> We have a Kafka cluster with 5 brokers, and every topic has a
> replication factor of at least 2. We have multiple Kafka consumer
> applications running on this cluster. Most of these consumers are built
> using the consumer APIs, and quite recently we have started using stream
> applications.
>
> We are facing a really weird issue. Occasionally our Kafka cluster
> breaks down. By breaking down I mean that consumers and producers start
> throwing disconnection exceptions and all of them just stop.
>
> We use the Debezium connector to push Postgres events to Kafka topics.
> Debezium throws the error below:
> [image: image.png]
>
> The Kafka broker throws the error below:
> COORDINATOR_NOT_AVAILABLE
> [image: image.png]
>
> Error on the consumer side:
> [image: image.png]
>
> To fix this, I stop the disconnected broker and everything fixes itself.
> Debezium starts flushing messages and all consumers start working
> normally. I then bring the disconnected broker back up and everything
> works as before without any problem.
>
> I don't understand a few things here:
>
> 1. What could be the reason behind this disconnection exception? Even
>    if one of the brokers was somehow disconnected, isn't Kafka supposed
>    to handle it in a cluster where all topics have a replication factor
>    of 2?
> 2. It appears that the malfunctioning broker was in a state where it
>    was neither disconnected from nor connected to the cluster. I could
>    still see the broker in Kafka Manager with zero bytes in, while it
>    was disconnected from all the producers and consumers.
> 3. Weirdly, I have noticed that this situation usually occurs when I
>    start multiple consumers of the stream application. I am not sure
>    about this, as the error has only occurred a few times. It happened
>    twice today, and both times I had started 3 consumers of the same
>    stream application.
>
> Can anyone help me debug this problem? I don't know where to look for
> possible issues with our cluster or stream application. I am attaching
> the streams config and stream application code for your reference.
> Please feel free to ask for any more details.
>
> Stream config:
> [image: image.png]
>
> Stream application code: https://codeshare.io/Gq6pLB
>
> --
> Thanks,
>
> Ankur Rana
> Software Developer
> FarEye

--
Thanks,

Ankur Rana
Software Developer
FarEye