[ 
https://issues.apache.org/jira/browse/KAFKA-9350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157063#comment-17157063
 ] 

Michael Jaschob commented on KAFKA-9350:
----------------------------------------

Chiming in to say I've found the same error on a 2.3.1 broker. I'm not sure 
what behavior the OP saw, but we are seeing unbounded growth on one of the 
__consumer_offsets partitions, and the timing of this error correlates with 
the start of that growth. I've only just started looking into this myself, so 
I don't want to jump to any conclusions. But [~hachikuji], I'm wondering 
whether, off the top of your head, you can imagine scenarios where this error 
would lead to the partition not being properly compacted?

> IllegalStateException when materializing transactional offset commits
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-9350
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9350
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Major
>
> We have caught this exception a few times in the log:
> {code}
> java.lang.IllegalStateException: Trying to complete a transactional offset commit for producerId 16031 and groupId foo even though the offset commit record itself hasn't been appended to the log.
>       at kafka.coordinator.group.GroupMetadata.$anonfun$completePendingTxnOffsetCommit$2(GroupMetadata.scala:595)
>       at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
>       at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
>       at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
>       at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
>       at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
>       at kafka.coordinator.group.GroupMetadata.$anonfun$completePendingTxnOffsetCommit$1(GroupMetadata.scala:592)
>       at kafka.coordinator.group.GroupMetadata.$anonfun$completePendingTxnOffsetCommit$1$adapted(GroupMetadata.scala:591)
>       at scala.Option.foreach(Option.scala:274)
>       at kafka.coordinator.group.GroupMetadata.completePendingTxnOffsetCommit(GroupMetadata.scala:591)
>       at kafka.coordinator.group.GroupMetadataManager.$anonfun$handleTxnCompletion$2(GroupMetadataManager.scala:828)
>       at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
>       at kafka.coordinator.group.GroupMetadata.inLock(GroupMetadata.scala:209)
>       at kafka.coordinator.group.GroupMetadataManager.$anonfun$handleTxnCompletion$1(GroupMetadataManager.scala:827)
>       at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>       at kafka.coordinator.group.GroupMetadataManager.handleTxnCompletion(GroupMetadataManager.scala:824)
>       at kafka.coordinator.group.GroupMetadataManager.$anonfun$scheduleHandleTxnCompletion$1(GroupMetadataManager.scala:819)
>       at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:116)
>       at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65)
>       at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>       at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>       at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
>       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>       at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
> It seems that the end transaction marker callback is getting triggered before 
> the offset commit callback. This is puzzling because transaction completion 
> should be tied to a successful TxnOffsetCommit response, which depends on 
> completion of the offset commit callback. So either there is some case we're 
> missing in the broker, or there is some bug in the client. I looked through 
> the logic on both sides and found no obvious problem.
>
> In any case, it probably makes sense to let the broker behave more 
> defensively, since there is no guarantee that a client won't send EndTxn 
> before receiving a successful TxnOffsetCommit response.
>
> Note that the impact of this bug tends to go unnoticed because there is 
> usually a subsequent offset commit which succeeds. However, in the worst 
> case, it can violate EOS guarantees because it could cause the consumer to 
> revert to a previously committed offset.
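
To make the ordering problem described above concrete, here is a minimal 
sketch of one way a coordinator could "behave more defensively". It is not 
Kafka's actual code and not necessarily the fix applied for this issue: the 
class PendingTxnOffsets, its methods, and the string partition keys are all 
hypothetical, and the choice to skip (rather than throw on) a commit whose 
record has not been appended is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical, simplified model of one group's pending transactional offset commits. */
class PendingTxnOffsets {

    /** A pending commit; appendedLogOffset stays empty until the record is appended to the log. */
    static final class PendingCommit {
        final long committedOffset;
        OptionalLong appendedLogOffset = OptionalLong.empty();

        PendingCommit(long committedOffset) {
            this.committedOffset = committedOffset;
        }
    }

    // producerId -> (partition -> pending commit)
    private final Map<Long, Map<String, PendingCommit>> pending = new HashMap<>();
    // partition -> offset visible to consumers
    private final Map<String, Long> materialized = new HashMap<>();

    /** Called when a TxnOffsetCommit request is received, before the log append completes. */
    void prepareTxnOffsetCommit(long producerId, String partition, long offset) {
        pending.computeIfAbsent(producerId, id -> new HashMap<>())
               .put(partition, new PendingCommit(offset));
    }

    /** Offset commit callback: the commit record has been appended at logOffset. */
    void onOffsetCommitAppend(long producerId, String partition, long logOffset) {
        Map<String, PendingCommit> commits = pending.get(producerId);
        if (commits != null && commits.containsKey(partition)) {
            commits.get(partition).appendedLogOffset = OptionalLong.of(logOffset);
        }
    }

    /**
     * End transaction marker callback. Instead of throwing IllegalStateException
     * when a commit record was never appended, this sketch skips that commit
     * (an assumption) and returns how many commits were skipped.
     */
    int completePendingTxnOffsetCommit(long producerId, boolean isCommit) {
        Map<String, PendingCommit> commits = pending.remove(producerId);
        if (commits == null || !isCommit) {
            return 0; // nothing pending, or the transaction was aborted
        }
        int skipped = 0;
        for (Map.Entry<String, PendingCommit> entry : commits.entrySet()) {
            if (entry.getValue().appendedLogOffset.isPresent()) {
                materialized.put(entry.getKey(), entry.getValue().committedOffset);
            } else {
                skipped++; // marker arrived before the offset commit callback
            }
        }
        return skipped;
    }

    /** The materialized offset for a partition, if any. */
    OptionalLong committedOffset(String partition) {
        Long offset = materialized.get(partition);
        return offset == null ? OptionalLong.empty() : OptionalLong.of(offset);
    }
}
```

In this sketch, calling completePendingTxnOffsetCommit before 
onOffsetCommitAppend leaves the partition's previously materialized offset 
untouched and reports the commit as skipped. That mirrors the "revert to a 
previously committed offset" risk noted in the description: skipping avoids 
the exception but does not by itself restore the lost commit.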



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
