[ 
https://issues.apache.org/jira/browse/KAFKA-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587902#comment-16587902
 ] 

Guozhang Wang commented on KAFKA-7214:
--------------------------------------

[~habdank] The issue reported in the original description and the issue 
reported in your comment above are quite different. The former is an exception 
thrown from {{StreamTask.process}}, indicating that something went wrong while 
processing a specific record (it may be a bug in the Streams library, an 
ill-formatted record, or an edge case in the user code). The latter is an 
exception thrown from {{StreamTask.commitOffsets}}: a 
{{CommitFailedException}}, indicating that a rebalance has happened. I'll assume 
your request is about troubleshooting the second issue, not the first one.

Since your {{max.poll.interval.ms}} config is already very large, I do not 
think the consumer's caller thread had a long pause. More likely the background 
heartbeat thread was stalled by a GC pause, could not send the heartbeat 
request in time, and the consumer was kicked out of the group as a result. As 
[~mjsax] mentioned, such a {{CommitFailedException}} will be captured as a 
{{TaskMigratedException}} and handled gracefully (it will log an ERROR, but the 
thread will not actually die).
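
To make the distinction concrete, here is a rough sketch of the consumer 
settings that govern the two ways a member can be removed from its group. The 
config names are the standard consumer ones; the class {{RebalanceTimeouts}}, 
its methods, and all values are hypothetical placeholders, not taken from the 
reporter's setup.

```java
import java.util.Properties;

// Hypothetical helper, not part of Kafka: gathers the consumer timeouts that
// decide when a group member is considered dead. All values are illustrative.
class RebalanceTimeouts {

    static Properties consumerOverrides() {
        Properties props = new Properties();
        // Poll-loop liveness: maximum gap between poll() calls on the caller thread.
        // Assumed placeholder for an "already very large" value.
        props.setProperty("max.poll.interval.ms", "3600000");
        // Heartbeat liveness: if the coordinator sees no heartbeat within this
        // window (for example during a long GC pause), the member is removed.
        props.setProperty("session.timeout.ms", "30000");
        // How often the background heartbeat thread sends a heartbeat.
        props.setProperty("heartbeat.interval.ms", "10000");
        return props;
    }

    // Rule of thumb from the consumer docs: keep the heartbeat interval at or
    // below one third of the session timeout.
    static boolean heartbeatFitsSession(Properties props) {
        long session = Long.parseLong(props.getProperty("session.timeout.ms"));
        long heartbeat = Long.parseLong(props.getProperty("heartbeat.interval.ms"));
        return heartbeat * 3 <= session;
    }

    public static void main(String[] args) {
        Properties props = consumerOverrides();
        System.out.println("session.timeout.ms=" + props.getProperty("session.timeout.ms"));
        System.out.println("heartbeat fits session: " + heartbeatFitsSession(props));
    }
}
```

If GC pauses are indeed the cause, raising {{session.timeout.ms}} is the knob 
that matters here, since {{max.poll.interval.ms}} only covers the caller 
thread. Whether 30 seconds is enough depends on the pause lengths actually 
observed in the GC logs.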

> Mystic FATAL error
> ------------------
>
>                 Key: KAFKA-7214
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7214
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.11.0.3, 1.1.1
>            Reporter: Seweryn Habdank-Wojewodzki
>            Priority: Critical
>
> Dears,
> Very often at startup of the streaming application I got exception:
> {code}
> Exception caught in process. taskId=0_1, processor=KSTREAM-SOURCE-0000000000, 
> topic=my_instance_medium_topic, partition=1, offset=198900203; 
> [org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:212),
>  
> org.apache.kafka.streams.processor.internals.AssignedTasks$2.apply(AssignedTasks.java:347),
>  
> org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:420),
>  
> org.apache.kafka.streams.processor.internals.AssignedTasks.process(AssignedTasks.java:339),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.processAndPunctuate(StreamThread.java:648),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:513),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:482),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:459)]
>  in thread 
> my_application-my_instance-my_instance_medium-72ee1819-edeb-4d85-9d65-f67f7c321618-StreamThread-62
> {code}
> and then (without shutdown request from my side):
> {code}
> 2018-07-30 07:45:02 [ar313] [INFO ] StreamThread:912 - stream-thread 
> [my_application-my_instance-my_instance-72ee1819-edeb-4d85-9d65-f67f7c321618-StreamThread-62]
>  State transition from PENDING_SHUTDOWN to DEAD.
> {code}
> What is this?
> How to correctly handle it?
> Thanks in advance for help.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
