[ https://issues.apache.org/jira/browse/KAFKA-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohan Kulkarni updated KAFKA-9270:
----------------------------------
Description:

On our production server we intermittently observe Kafka Streams crash with a TimeoutException while committing offsets. The only workaround seems to be restarting the application, which is not a desirable solution for a production environment.

We have already implemented a ProductionExceptionHandler, but it does not seem to address this.

Please provide a fix for this or a viable workaround.

+Application side logs:+

2019-11-17 08:28:48.055 +0000 [AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1] [ERROR] - org.apache.kafka.streams.processor.internals.AssignedStreamsTasks [org.apache.kafka.streams.processor.internals.AssignedTasks:applyToRunningTasks:373] - stream-thread [AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1] *Failed to commit stream task 0_1 due to the following error:*
*org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before successfully committing offsets* \{AggregateJob-1=OffsetAndMetadata{offset=176729402, metadata=''}}

2019-11-17 08:29:00.891 +0000 [AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1] [ERROR] - [:lambda$init$2:130] - Stream crashed!!!
StreamsThread threadId: AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1
TaskManager MetadataState: GlobalMetadata: [] GlobalStores: [] My HostInfo: HostInfo\{host='unknown', port=-1} Cluster(id = null, nodes = [], partitions = [], controller = null)
Active tasks: Running: Suspended: Restoring: New:
Standby tasks: Running: Suspended: Restoring: New:
org.apache.kafka.common.errors.*TimeoutException: Timeout of 60000ms expired before successfully committing offsets* \{AggregateJob-0=OffsetAndMetadata{offset=189808059, metadata=''}}

+Kafka broker logs:+

[2019-11-17 13:53:22,774] WARN *Client session timed out, have not heard from server in 6669ms for sessionid 0x10068e4a2944c2f* (org.apache.zookeeper.ClientCnxn)
[2019-11-17 13:53:22,809] INFO Client session timed out, have not heard from server in 6669ms for sessionid 0x10068e4a2944c2f, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)

Regards,
Rohan
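For reference, a minimal sketch of how a ProductionExceptionHandler is typically wired into the Streams configuration (the class name, bootstrap address, and logging are illustrative, not the reporter's actual code; the application id is inferred from the thread names in the logs). Note that this handler is only consulted for exceptions raised while producing records to downstream topics; as far as I can tell, a TimeoutException thrown on the consumer's offset-commit path is outside its scope, which would explain why registering one does not prevent this crash.

{code:java}
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;

// Illustrative handler: it is only invoked for errors while producing records
// downstream, not for offset-commit failures on the consumer side.
public class LogAndContinueProductionHandler implements ProductionExceptionHandler {

    @Override
    public ProductionExceptionHandlerResponse handle(final ProducerRecord<byte[], byte[]> record,
                                                     final Exception exception) {
        // Log the failed record and keep the stream thread alive.
        System.err.println("Failed to produce to " + record.topic() + ": " + exception);
        return ProductionExceptionHandlerResponse.CONTINUE;
    }

    @Override
    public void configure(final Map<String, ?> configs) {
        // no configuration needed for this sketch
    }

    // Registration in the Streams configuration (bootstrap address is a placeholder).
    public static Properties exampleProps() {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "AggregateJob");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
                  LogAndContinueProductionHandler.class);
        return props;
    }
}
{code}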
> KafkaStream crash on offset commit failure
> ------------------------------------------
>
>                 Key: KAFKA-9270
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9270
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.0.1
>            Reporter: Rohan Kulkarni
>            Priority: Critical
>
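Until there is a proper fix, one possible (unverified) stopgap: the 60000ms in the error message matches the consumer's default.api.timeout.ms, which bounds blocking calls such as the synchronous offset commit, so giving commits more headroom via consumer-prefixed Streams configs might ride out short broker hiccups like the ZooKeeper session timeout seen in the broker logs. The values below are illustrative only.

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

// Unverified workaround sketch: widen the consumer-side timeout that the
// "Timeout of 60000ms expired" message appears to come from. Example values.
public class CommitTimeoutWorkaround {

    public static Properties widenCommitTimeout(final Properties streamsProps) {
        // default.api.timeout.ms bounds blocking consumer calls such as commitSync()
        streamsProps.put(
                StreamsConfig.consumerPrefix(ConsumerConfig.DEFAULT_API_TIMEOUT_MS_CONFIG),
                180000);
        // keep individual broker requests shorter than the overall API timeout
        streamsProps.put(
                StreamsConfig.consumerPrefix(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG),
                60000);
        return streamsProps;
    }
}
{code}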
--
This message was sent by Atlassian Jira
(v8.3.4#803005)