Hi Shimi,

0.10.2.1 contains a number of fixes that should make the out-of-the-box experience better, including resiliency under broker failures and better exception handling. If you ever get back to it and the problem happens again, please do send us the logs and we'll happily have a look.
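Regarding the NPE that crashed the stream processing thread: it also helps to register an uncaught exception handler on the KafkaStreams instance, so a dying StreamThread leaves an obvious trace in your own logs instead of the app silently falling behind. A rough sketch only, not your actual app: the application id "sa" and topic "sa-events" are just taken from your describe output below, and the topology is a placeholder.

import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class StreamsCrashLogging {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sa");                // from your group name
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // adjust for your brokers

        KStreamBuilder builder = new KStreamBuilder();
        builder.stream("sa-events"); // placeholder source; your real topology goes here

        KafkaStreams streams = new KafkaStreams(builder, props);

        // Called whenever a StreamThread dies with an unhandled exception,
        // e.g. an NPE thrown from user processing code. Register before start().
        streams.setUncaughtExceptionHandler((thread, throwable) ->
                System.err.println("Stream thread " + thread.getName() + " died: " + throwable));

        streams.start();
    }
}

On 0.10.x the thread is not restarted after this fires, but at least the failure shows up immediately rather than days later as growing lag.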
Thanks,
Eno

> On 1 May 2017, at 12:05, Shimi Kiviti <shim...@gmail.com> wrote:
>
> Hi Eno,
> I am afraid I played too much with the configuration to make this a
> productive investigation :(
>
> This is a QA environment with 2 Kafka brokers and 3 ZooKeeper instances
> in AWS. There are only 3 partitions for this topic.
> The Kafka brokers and the Kafka Streams app are version 0.10.1.1.
> Our Kafka Streams app runs in Docker on Kubernetes.
> I played around with 1 to 3 Kafka Streams processes, but I got the same
> results. It is too easy to scale with Kubernetes :)
> Since there are only 3 partitions, I didn't start more than 3 instances.
>
> I was too quick to upgrade only the Kafka Streams app to 0.10.2.1 in the
> hope that it would solve the problem. It didn't.
> The logs I sent before are from this version.
>
> I did notice an "unknown" offset for the main topic with Kafka Streams
> version 0.10.2.1:
>
> $ ./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group sa
> GROUP   TOPIC      PARTITION   CURRENT-OFFSET   LOG-END-OFFSET   LAG       OWNER
> sa      sa-events  0           842199           842199           0         sa-4557bf2d-ba79-42a6-aa05-5b4c9013c022-StreamThread-1-consumer_/10.0.10.9
> sa      sa-events  1           1078428          1078428          0         sa-4557bf2d-ba79-42a6-aa05-5b4c9013c022-StreamThread-1-consumer_/10.0.10.9
> sa      sa-events  2           unknown          26093910         unknown   sa-4557bf2d-ba79-42a6-aa05-5b4c9013c022-StreamThread-1-consumer_/10.0.10.9
>
> After that I downgraded the Kafka Streams app back to version 0.10.1.1.
> After a LONG startup time (more than an hour) during which the status of
> the group was "rebalancing", all 3 processes started processing messages again.
>
> This whole thing started after we hit a bug in our code (an NPE) that
> crashed the stream processing thread.
> So now, after 4 days, everything is back to normal.
> This worries me, since it can happen again.
>
>
> On Mon, May 1, 2017 at 11:45 AM, Eno Thereska <eno.there...@gmail.com> wrote:
>
>> Hi Shimi,
>>
>> Could you provide more info on your setup? How many Kafka Streams
>> processes do you have, and how many partitions are they consuming from?
>> If you have more processes than partitions, some of the processes will
>> be idle and won't do anything.
>>
>> Eno
>>
>>> On Apr 30, 2017, at 5:58 PM, Shimi Kiviti <shim...@gmail.com> wrote:
>>>
>>> Hi Everyone,
>>>
>>> I have a problem and I hope one of you can help me figure it out.
>>> One of our Kafka Streams processes stopped processing messages.
>>>
>>> When I turn on debug logging I see lots of these messages:
>>>
>>> 2017-04-30 15:42:20,228 [StreamThread-1] DEBUG o.a.k.c.c.i.Fetcher: Sending fetch for partitions [devlast-changelog-2] to broker ip-x-x-x-x.ec2.internal:9092 (id: 1 rack: null)
>>> 2017-04-30 15:42:20,696 [StreamThread-1] DEBUG o.a.k.c.c.i.Fetcher: Ignoring fetched records for devlast-changelog-2 at offset 2962649 since the current position is 2963379
>>>
>>> After a LONG time, the only messages in the log are these:
>>>
>>> 2017-04-30 16:46:33,324 [kafka-coordinator-heartbeat-thread | sa] DEBUG o.a.k.c.c.i.AbstractCoordinator: Sending Heartbeat request for group sa to coordinator ip-x-x-x-x.ec2.internal:9092 (id: 2147483646 rack: null)
>>> 2017-04-30 16:46:33,425 [kafka-coordinator-heartbeat-thread | sa] DEBUG o.a.k.c.c.i.AbstractCoordinator: Received successful Heartbeat response for group sa
>>>
>>> Any idea?
>>>
>>> Thanks,
>>> Shimi