Hi Christiane,

Thanks for the email. That looks like
https://issues.apache.org/jira/browse/KAFKA-5600

Ismael

On Mon, Aug 7, 2017 at 7:04 PM, Christiane Lemke <christiane.le...@gmail.com> wrote:

> Hi all,
>
> we are fighting offset rewinds of seemingly random size, hitting
> seemingly random partitions, whenever any node in our Kafka cluster is
> restarted. We are running out of ideas - any help or pointers to things
> worth investigating would be highly appreciated.
>
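> To quantify the rewinds, here is a minimal sketch of how the committed
> offsets of a group could be snapshotted before and after a broker
> restart and then diffed (the bootstrap server, group id, and topic name
> below are placeholders, not our real values):
>
>     import java.util.Properties;
>     import org.apache.kafka.clients.consumer.KafkaConsumer;
>     import org.apache.kafka.clients.consumer.OffsetAndMetadata;
>     import org.apache.kafka.common.PartitionInfo;
>     import org.apache.kafka.common.TopicPartition;
>
>     public class OffsetSnapshot {
>         public static void main(String[] args) {
>             Properties props = new Properties();
>             props.put("bootstrap.servers", "broker1:9092"); // placeholder
>             props.put("group.id", "my-group");              // placeholder group id
>             props.put("enable.auto.commit", "false");       // read-only, never commit
>             props.put("key.deserializer",
>                     "org.apache.kafka.common.serialization.ByteArrayDeserializer");
>             props.put("value.deserializer",
>                     "org.apache.kafka.common.serialization.ByteArrayDeserializer");
>
>             String topic = "my-topic"; // placeholder topic name
>
>             try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
>                 // Print the committed offset of every partition of the topic;
>                 // run once before and once after a restart, then diff the output.
>                 for (PartitionInfo p : consumer.partitionsFor(topic)) {
>                     TopicPartition tp = new TopicPartition(topic, p.partition());
>                     OffsetAndMetadata c = consumer.committed(tp);
>                     System.out.println(tp + " -> " + (c == null ? "none" : c.offset()));
>                 }
>             }
>         }
>     }
>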
> Our Kafka setup is dual data center, with two local broker clusters (3
> nodes each) and two aggregate broker clusters (5 nodes each), the
> latter running MirrorMaker to consume messages from the local cluster.
>
> The issues seem to have appeared since we upgraded from 0.10.1.0 to
> 0.11, but we are not entirely sure the upgrade is related.
>
> Our first theory was that an overly large consumer offsets topic (we
> use compaction for it) was causing the issues on restart, and indeed we
> found that the log cleaner threads had died after the upgrade. But
> restarting and cleaning this topic did not resolve the issue.
>
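> For reference, a minimal sketch of one way to verify cleaner progress
> over JMX, assuming the broker was started with JMX enabled and exposes
> the standard kafka.log LogCleanerManager max-dirty-percent gauge (the
> host and port below are placeholders):
>
>     import javax.management.MBeanServerConnection;
>     import javax.management.ObjectName;
>     import javax.management.remote.JMXConnector;
>     import javax.management.remote.JMXConnectorFactory;
>     import javax.management.remote.JMXServiceURL;
>
>     public class CleanerCheck {
>         public static void main(String[] args) throws Exception {
>             // Placeholder JMX endpoint for one broker.
>             JMXServiceURL url = new JMXServiceURL(
>                     "service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi");
>             try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
>                 MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
>                 // A value stuck near 100 suggests the cleaner threads are dead.
>                 ObjectName gauge = new ObjectName(
>                         "kafka.log:type=LogCleanerManager,name=max-dirty-percent");
>                 System.out.println("max-dirty-percent = "
>                         + mbs.getAttribute(gauge, "Value"));
>             }
>         }
>     }
>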
> Logs are pretty silent when it happens. Before we cleaned the consumer
> offsets topic, we would get a few of these every time it happened, but
> no longer:
>
> [2017-08-04 11:19:25,970] ERROR [Group Metadata Manager on Broker 472]: Error loading offsets from __consumer_offsets-14 (kafka.coordinator.group.GroupMetadataManager)
> java.lang.IllegalStateException: Unexpected unload of active group tns-ticket-store-b144c9d1-425a-4b90-8310-f6e886741494 while loading partition __consumer_offsets-14
>         at kafka.coordinator.group.GroupMetadataManager$$anonfun$loadGroupsAndOffsets$6.apply(GroupMetadataManager.scala:600)
>         at kafka.coordinator.group.GroupMetadataManager$$anonfun$loadGroupsAndOffsets$6.apply(GroupMetadataManager.scala:595)
>         at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
>         at kafka.coordinator.group.GroupMetadataManager.loadGroupsAndOffsets(GroupMetadataManager.scala:595)
>         at kafka.coordinator.group.GroupMetadataManager.kafka$coordinator$group$GroupMetadataManager$$doLoadGroupsAndOffsets$1(GroupMetadataManager.scala:455)
>         at kafka.coordinator.group.GroupMetadataManager$$anonfun$loadGroupsForPartition$1.apply$mcV$sp(GroupMetadataManager.scala:441)
>         at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
>         at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:748)
>
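> As a basic sanity check that the partition named in the error can be
> read end to end, a minimal sketch that scans __consumer_offsets-14 raw
> (the bootstrap server is a placeholder; decoding the binary keys and
> values is deliberately left out):
>
>     import java.util.Collections;
>     import java.util.Properties;
>     import org.apache.kafka.clients.consumer.ConsumerRecords;
>     import org.apache.kafka.clients.consumer.KafkaConsumer;
>     import org.apache.kafka.common.TopicPartition;
>
>     public class OffsetsPartitionScan {
>         public static void main(String[] args) {
>             Properties props = new Properties();
>             props.put("bootstrap.servers", "broker1:9092"); // placeholder
>             props.put("enable.auto.commit", "false");       // never commit
>             props.put("key.deserializer",
>                     "org.apache.kafka.common.serialization.ByteArrayDeserializer");
>             props.put("value.deserializer",
>                     "org.apache.kafka.common.serialization.ByteArrayDeserializer");
>
>             TopicPartition tp = new TopicPartition("__consumer_offsets", 14);
>             long count = 0;
>             try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
>                 consumer.assign(Collections.singletonList(tp));
>                 consumer.seekToBeginning(Collections.singletonList(tp));
>                 long end = consumer.endOffsets(Collections.singletonList(tp)).get(tp);
>                 // Read every record up to the current end offset, counting as we go.
>                 while (consumer.position(tp) < end) {
>                     ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
>                     count += records.count();
>                 }
>                 System.out.println("read " + count + " records up to offset " + end);
>             }
>         }
>     }
>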
> Does this seem familiar to anyone? Are there any suggestions as to what
> to look into more closely to investigate this issue? I am happy to
> provide more details about anything that might be helpful.
>
> Thanks a lot in advance,
>
> Christiane
>
