Hi Eno, I am afraid I played too much with the configuration to make this productive investigation :(
This is a QA environment which includes 2 kafka instances and 3 zookeeper instances in AWS. There are only 3 partition for this topic. Kafka broker and kafka-stream are version 0.10.1.1 Our kafka-stream app run on docker using kubernetes. I played around with with 1 to 3 kafka-stream processes, but I got the same results. It is too easy to scale with kubernetes :) Since there are only 3 partitions, I didn't start more then 3 instances. I was too quick to upgraded only the kafka-stream app to 0.10.2.1 with hope that it will solve the problem, It didn't. The log I sent before are from this version. I did notice "unknown" offset for the main topic with kafka-stream version 0.10.2.1 $ ./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group sa GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG OWNER sa sa-events 0 842199 842199 0 sa-4557bf2d-ba79-42a6-aa05-5b4c9013c022-StreamThread-1-consumer_/10.0.10.9 sa sa-events 1 1078428 1078428 0 sa-4557bf2d-ba79-42a6-aa05-5b4c9013c022-StreamThread-1-consumer_/10.0.10.9 sa sa-events 2 unknown 26093910 unknown sa-4557bf2d-ba79-42a6-aa05-5b4c9013c022-StreamThread-1-consumer_/10.0.10.9 After that I downgraded the kafka-stream app back to version 0.10.1.1 After a LONG startup time (more than an hour) where the status of the group was rebalancing, all the 3 processes started processing messages again. This all thing started after we hit a bug in our code (NPE) that crashed the stream processing thread. So now after 4 days, everything is back to normal. This worries me since it can happen again On Mon, May 1, 2017 at 11:45 AM, Eno Thereska <eno.there...@gmail.com> wrote: > Hi Shimi, > > Could you provide more info on your setup? How many kafka streams > processes do you have and from how many partitions are they consuming from. > If you have more processes than partitions some of the processes will be > idle and won’t do anything. > > Eno > > On Apr 30, 2017, at 5:58 PM, Shimi Kiviti <shim...@gmail.com> wrote: > > > > Hi Everyone, > > > > I have a problem and I hope one of you can help me figuring it out. > > One of our kafka-streams processes stopped processing messages > > > > When I turn on debug log I see lots of these messages: > > > > 2017-04-30 15:42:20,228 [StreamThread-1] DEBUG o.a.k.c.c.i.Fetcher: > Sending > > fetch for partitions [devlast-changelog-2] to broker ip-x-x-x-x > > .ec2.internal:9092 (id: 1 rack: null) > > 2017-04-30 15:42:20,696 [StreamThread-1] DEBUG o.a.k.c.c.i.Fetcher: > > Ignoring fetched records for devlast-changelog-2 at offset 2962649 since > > the current position is 2963379 > > > > After a LONG time, the only messages in the log are these: > > > > 2017-04-30 16:46:33,324 [kafka-coordinator-heartbeat-thread | sa] DEBUG > > o.a.k.c.c.i.AbstractCoordinator: Sending Heartbeat request for group sa > to > > coordinator ip-x-x-x-x.ec2.internal:9092 (id: 2147483646 rack: null) > > 2017-04-30 16:46:33,425 [kafka-coordinator-heartbeat-thread | sa] DEBUG > > o.a.k.c.c.i.AbstractCoordinator: Received successful Heartbeat response > for > > group same > > > > Any idea? > > > > Thanks, > > Shimi > >