[ https://issues.apache.org/jira/browse/KAFKA-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohan Kulkarni updated KAFKA-9270:
----------------------------------
Description:

On our production server we intermittently observe Kafka Streams crash with a TimeoutException while committing offsets. The only workaround seems to be restarting the application, which is not a desirable solution for a production environment.

We have already implemented a ProductionExceptionHandler, but it does not seem to address this.

Please provide a fix for this or a viable workaround.

+Application side logs:+

2019-11-17 08:28:48.055 +0000 [AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1] [ERROR] - org.apache.kafka.streams.processor.internals.AssignedStreamsTasks [org.apache.kafka.streams.processor.internals.AssignedTasks:applyToRunningTasks:373] - stream-thread [AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1] *Failed to commit stream task 0_1 due to the following error:*
*org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before successfully committing offsets* \{AggregateJob-1=OffsetAndMetadata{offset=176729402, metadata=''}}

2019-11-17 08:29:00.891 +0000 [AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1] [ERROR] - [:lambda$init$2:130] - Stream crashed!!!
StreamsThread threadId: AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1
TaskManager MetadataState: GlobalMetadata: [] GlobalStores: [] My HostInfo: HostInfo\{host='unknown', port=-1} Cluster(id = null, nodes = [], partitions = [], controller = null)
Active tasks: Running: Suspended: Restoring: New:
Standby tasks: Running: Suspended: Restoring: New:
org.apache.kafka.common.errors.*TimeoutException: Timeout of 60000ms expired before successfully committing offsets* \{AggregateJob-0=OffsetAndMetadata{offset=189808059, metadata=''}}

+Kafka broker logs:+

[2019-11-17 13:53:22,774] WARN *Client session timed out, have not heard from server in 6669ms for sessionid 0x10068e4a2944c2f* (org.apache.zookeeper.ClientCnxn)
[2019-11-17 13:53:22,809] INFO Client session timed out, have not heard from server in 6669ms for sessionid 0x10068e4a2944c2f, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)

Regards,
Rohan
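For reference, a minimal sketch of how a ProductionExceptionHandler is typically wired into the Streams configuration (the class name, bootstrap address, and logging are illustrative, not the reporter's actual code; the application id is inferred from the thread names in the logs). Note that this handler is only consulted for exceptions raised while producing records to downstream topics; as far as I can tell, a TimeoutException thrown on the consumer's offset-commit path is outside its scope, which would explain why registering one does not prevent this crash.

{code:java}
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;

// Illustrative handler: it is only invoked for errors while producing records
// downstream, not for offset-commit failures on the consumer side.
public class LogAndContinueProductionHandler implements ProductionExceptionHandler {

    @Override
    public ProductionExceptionHandlerResponse handle(final ProducerRecord<byte[], byte[]> record,
                                                     final Exception exception) {
        // Log the failed record and keep the stream thread alive.
        System.err.println("Failed to produce to " + record.topic() + ": " + exception);
        return ProductionExceptionHandlerResponse.CONTINUE;
    }

    @Override
    public void configure(final Map<String, ?> configs) {
        // no configuration needed for this sketch
    }

    // Registration in the Streams configuration (bootstrap address is a placeholder).
    public static Properties exampleProps() {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "AggregateJob");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
                  LogAndContinueProductionHandler.class);
        return props;
    }
}
{code}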
> KafkaStream crash on offset commit failure
> ------------------------------------------
>
>                 Key: KAFKA-9270
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9270
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.0.1
>            Reporter: Rohan Kulkarni
>            Priority: Critical
>
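Until there is a proper fix, one possible (unverified) stopgap: the 60000ms in the error message matches the consumer's default.api.timeout.ms, which bounds blocking calls such as the synchronous offset commit, so giving commits more headroom via consumer-prefixed Streams configs might ride out short broker hiccups like the ZooKeeper session timeout seen in the broker logs. The values below are illustrative only.

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

// Unverified workaround sketch: widen the consumer-side timeout that the
// "Timeout of 60000ms expired" message appears to come from. Example values.
public class CommitTimeoutWorkaround {

    public static Properties widenCommitTimeout(final Properties streamsProps) {
        // default.api.timeout.ms bounds blocking consumer calls such as commitSync()
        streamsProps.put(
                StreamsConfig.consumerPrefix(ConsumerConfig.DEFAULT_API_TIMEOUT_MS_CONFIG),
                180000);
        // keep individual broker requests shorter than the overall API timeout
        streamsProps.put(
                StreamsConfig.consumerPrefix(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG),
                60000);
        return streamsProps;
    }
}
{code}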
--
This message was sent by Atlassian Jira
(v8.3.4#803005)