[ https://issues.apache.org/jira/browse/KAFKA-17455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880854#comment-17880854 ]

Colt McNealy commented on KAFKA-17455:
--------------------------------------

Hi [~dajac], here are the logs. I noticed that the app started stalling at 5:04:23, and exactly one minute later (5:05:23) we got this log:

```
05:05:23 ERROR [KAFKA] TaskExecutor - stream-thread [basic-tls-0-core-StreamThread-2] Committing task(s) 1_3, 1_9 failed.
```

Debug logs are extremely chatty, and I'm not quite sure how best to parse them. But they are uploaded here in full. Again, the interesting parts are from 5:04:23-24 and 5:05:23-24.

[^streams-app.log]

> `TaskCorruptedException` After Client Quota Throttling
> ------------------------------------------------------
>
>                 Key: KAFKA-17455
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17455
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, streams
>    Affects Versions: 3.8.0
>            Reporter: Colt McNealy
>            Priority: Major
>
> When running a Kafka Streams EOS app that goes slightly above a configured
> user quota, we can reliably reproduce `TaskCorruptedException`s after
> throttling. This is the case even with an application that goes only 5-10%
> above the configured quota.
>
> The root cause is a `TimeoutException` encountered in the
> `TaskExecutor.commitOffsetsOrTransaction`.
>
> Stacktrace provided below:
>
> ```
> 19:45:28 ERROR [KAFKA] TaskExecutor - stream-thread [basic-tls-0-core-StreamThread-2] Committing task(s) 1_2 failed.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 60000ms while awaiting AddOffsetsToTxn
> 19:45:28 WARN [KAFKA] StreamThread - stream-thread [basic-tls-0-core-StreamThread-2] Detected the states of tasks [1_2] are corrupted. Will close the task as dirty and re-create and bootstrap from scratch.
> org.apache.kafka.streams.errors.TaskCorruptedException: Tasks [1_2] are corrupted and hence need to be re-initialized
>     at org.apache.kafka.streams.processor.internals.TaskExecutor.commitOffsetsOrTransaction(TaskExecutor.java:249) ~[server.jar:?]
>     at org.apache.kafka.streams.processor.internals.TaskExecutor.commitTasksAndMaybeUpdateCommittableOffsets(TaskExecutor.java:154) ~[server.jar:?]
>     at org.apache.kafka.streams.processor.internals.TaskManager.commitTasksAndMaybeUpdateCommittableOffsets(TaskManager.java:1915) ~[server.jar:?]
>     at org.apache.kafka.streams.processor.internals.TaskManager.commit(TaskManager.java:1882) ~[server.jar:?]
>     at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:1384) ~[server.jar:?]
>     at org.apache.kafka.streams.processor.internals.StreamThread.runOnceWithoutProcessingThreads(StreamThread.java:1033) ~[server.jar:?]
>     at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:711) [server.jar:?]
>     at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:670) [server.jar:?]
> ```

--
This message was sent by Atlassian Jira
(v8.20.10#820010)