Rohan Kulkarni created KAFKA-9270:

             Summary: KafkaStream crash on offset commit failure
                 Key: KAFKA-9270
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 2.0.1
            Reporter: Rohan Kulkarni

On our production server we intermittently observe Kafka Streams crashing 
with a TimeoutException while committing offsets. The only workaround seems to be 
restarting the application, which is not a desirable solution in production.


We have already implemented a ProductionExceptionHandler, but it does not seem 
to address this.


Please provide a fix for this or a viable workaround.


Application side logs:

2019-11-17 08:28:48.055 +0000 
[AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1] [ERROR] - 
 - stream-thread 
[AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1] Failed to 
commit stream task 0_1 due to the following error:
org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired 
before successfully committing offsets 
{AggregateJob-1=OffsetAndMetadata{offset=176729402, metadata=''}}


2019-11-17 08:29:00.891 +0000 
[AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1] [ERROR] -    
[:lambda$init$2:130] - Stream crashed!!! StreamsThread threadId: 
MetadataState: GlobalMetadata: [] GlobalStores: [] My HostInfo: 
HostInfo{host='unknown', port=-1} Cluster(id = null, nodes = [], partitions = 
[], controller = null) Active tasks: Running: Suspended: Restoring: New: 
Standby tasks: Running: Suspended: Restoring: New:
org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired 
before successfully committing offsets 
{AggregateJob-0=OffsetAndMetadata{offset=189808059, metadata=''}}


Kafka broker logs:

[2019-11-17 13:53:22,774] WARN Client session timed out, have not heard from 
server in 6669ms for sessionid 0x10068e4a2944c2f 
[2019-11-17 13:53:22,809] INFO Client session timed out, have not heard from 
server in 6669ms for sessionid 0x10068e4a2944c2f, closing socket connection and 
attempting reconnect (org.apache.zookeeper.ClientCnxn)




This message was sent by Atlassian Jira
