Hi Ismael, thanks a lot for your answer, it was indeed exactly the issue we had!
We did see the ticket for the issue before, but the steps to reproduce
included a consumer group restart, which we didn't do, so we thought our
problem was different. However, since patching with the fixing commit,
everything works fine.

Thanks again!
Christiane

On 7 August 2017 at 22:07, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi Christiane,
>
> Thanks for the email. That looks like
> https://issues.apache.org/jira/browse/KAFKA-5600
>
> Ismael
>
> On Mon, Aug 7, 2017 at 7:04 PM, Christiane Lemke <christiane.le...@gmail.com> wrote:
>
> > Hi all,
> >
> > we are fighting with offset rewinds of seemingly random size, hitting
> > seemingly random partitions, whenever we restart any node in our Kafka
> > cluster. We are running out of ideas - any help or pointers to things
> > to investigate would be highly appreciated.
> >
> > Our Kafka setup is dual data center, with two local broker clusters
> > (3 nodes each) and two aggregate broker clusters (5 nodes each), the
> > latter running mirror maker to consume messages from the local cluster.
> >
> > The issues seem to have appeared since we upgraded from 0.10.1.0 to
> > 0.11, but we are not entirely sure it’s related.
> >
> > Our first theory was that the consumer offsets topic had grown too big
> > (we use compaction for it) and was causing the issues on restart, and
> > indeed we found that the cleaner threads had died after the upgrade.
> > But restarting and cleaning this topic did not help.
> >
> > Logs are pretty silent when it happens. Before we cleaned the consumer
> > offsets topic we got a few of these every time it happened, but no
> > longer now:
> >
> > [2017-08-04 11:19:25,970] ERROR [Group Metadata Manager on Broker 472]:
> > Error loading offsets from __consumer_offsets-14
> > (kafka.coordinator.group.GroupMetadataManager)
> > java.lang.IllegalStateException: Unexpected unload of active group
> > tns-ticket-store-b144c9d1-425a-4b90-8310-f6e886741494 while loading
> > partition __consumer_offsets-14
> >     at kafka.coordinator.group.GroupMetadataManager$$anonfun$loadGroupsAndOffsets$6.apply(GroupMetadataManager.scala:600)
> >     at kafka.coordinator.group.GroupMetadataManager$$anonfun$loadGroupsAndOffsets$6.apply(GroupMetadataManager.scala:595)
> >     at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
> >     at kafka.coordinator.group.GroupMetadataManager.loadGroupsAndOffsets(GroupMetadataManager.scala:595)
> >     at kafka.coordinator.group.GroupMetadataManager.kafka$coordinator$group$GroupMetadataManager$$doLoadGroupsAndOffsets$1(GroupMetadataManager.scala:455)
> >     at kafka.coordinator.group.GroupMetadataManager$$anonfun$loadGroupsForPartition$1.apply$mcV$sp(GroupMetadataManager.scala:441)
> >     at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
> >     at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> >     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >     at java.lang.Thread.run(Thread.java:748)
> >
> > Does this seem familiar to anyone? Does anyone have a suggestion as to
> > what to look into more closely to investigate this issue? Happy to give
> > more details about anything that might be helpful.
> >
> > Thanks a lot in advance,
> >
> > Christiane
> >
>
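
PS, in case anyone finds this thread later with a similar symptom: a rough
sketch of how one might check whether the log cleaner is still compacting
__consumer_offsets (the ZooKeeper address and paths below are placeholders,
not from our actual setup):

    # cleanup.policy on the offsets topic should be "compact"
    bin/kafka-configs.sh --zookeeper zk-host:2181 --describe \
      --entity-type topics --entity-name __consumer_offsets

    # the cleaner logs to log-cleaner.log under the broker's logging directory;
    # a dead cleaner thread usually shows up as an uncaught exception there
    tail -n 100 /path/to/kafka/logs/log-cleaner.log

    # the JMX metric kafka.log:type=LogCleanerManager,name=time-since-last-run-ms
    # keeps growing when all cleaner threads have died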