Hi Pawel,

It seems the exception comes from a producer. When a stream task tries to resume after a rebalance, the task's producer tries to initialize its transactions and runs into the timeout. This can happen if the broker is not reachable before the timeout elapses. Could the big lag you described be caused by network issues?

You can increase the timeout by raising max.block.ms in the producer configuration.
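In a Java Streams application you can forward that setting to the internally created producers via StreamsConfig.producerPrefix. A minimal sketch, assuming your app is configured along these lines; the application id, bootstrap address, and the 120000 ms value are placeholders to adapt:

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "your-app-id");     // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");  // placeholder
// The task producer's initTransactions() call blocks for up to
// max.block.ms; its default of 60000 ms matches the timeout in your
// stack trace. Raising it gives the broker more time to respond
// while the task resumes after the rebalance.
props.put(StreamsConfig.producerPrefix(ProducerConfig.MAX_BLOCK_MS_CONFIG), 120000);

Keep in mind that if the broker really is unreachable, a larger timeout only delays the same exception, so it is worth checking the network side as well.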
Best,
Bruno

On Thu, Jul 4, 2019 at 2:43 PM Paweł Gontarz <pgont...@powerspace.com> wrote:
>
> Hey all,
>
> I have already seen an email in the archive concerning this, but the
> suggested solution was to upgrade the Kafka version to 2.1. In my case,
> Kafka is already up to date.
>
> NOTE: The issue has been ongoing since this morning.
> To be specific, I'm running two stateful Kafka Streams applications. From
> the very beginning of the app lifecycle, the instances struggle to
> correctly reassign partitions among themselves, which eventually leads to
>
> org.apache.kafka.streams.errors.StreamsException: stream-thread
> [pws-budget-streams-client-mapper-StreamThread-13] Failed to rebalance.
>
> Due to
>
> Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired
> while initializing transactional state in 60000ms.
>
> At the same time I'm observing a big lag on two partitions of the topic my
> streams are consuming.
> The issue started just this morning, whereas the applications had been
> running without issues for a month already.
>
> One thing I did beforehand was reassign these two partitions to different
> nodes. Why? To fight CPU consumption on one of our brokers (the load
> wasn't balanced evenly).
>
> I have no clue whether that has anything to do with the Kafka Streams
> problems, though.
>
> Has anyone encountered similar problems?
>
> Cheers,
> Paweł