[ https://issues.apache.org/jira/browse/KAFKA-6366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298761#comment-16298761 ]

Jason Gustafson commented on KAFKA-6366:
----------------------------------------

[~joerg.heinicke] Thanks for sharing the logs. One thing that immediately
stands out is the large number of async offset commit failures: I counted
13,359 instances. There are also about 10,862 "Marking coordinator dead"
messages. This is just a guess, but do you have any retry logic implemented
for when async offset commits fail? That would explain the large number of
"Marking coordinator dead" messages as well as the stack overflow.
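If the application does retry failed async commits, one common way to keep that bounded is to tag each commit with a sequence number and retry only when the failed commit is still the most recent one and a retry budget remains. The sketch below assumes (as the comment above only guesses) that the current retry logic resubmits commits unconditionally. It is self-contained and hypothetical: BoundedAsyncCommitRetry and CommitSender are illustrative names standing in for KafkaConsumer.commitAsync(OffsetCommitCallback), whose callback receives a non-null exception when a commit fails.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical sketch: these names are illustrative, not Kafka APIs. In a real
// consumer the send step would be KafkaConsumer.commitAsync(OffsetCommitCallback).
public class BoundedAsyncCommitRetry {
    // Stand-in for an async commit: the callback gets null on success,
    // or an exception on failure.
    interface CommitSender {
        void send(long offset, Consumer<Exception> callback);
    }

    private final CommitSender sender;
    private final int maxRetries;
    // Sequence number of the most recently issued commit.
    private final AtomicLong latestSeq = new AtomicLong();
    int sends = 0; // commit requests sent; a single-threaded demo counter

    BoundedAsyncCommitRetry(CommitSender sender, int maxRetries) {
        this.sender = sender;
        this.maxRetries = maxRetries;
    }

    void commit(long offset) {
        attempt(offset, latestSeq.incrementAndGet(), 0);
    }

    private void attempt(long offset, long seq, int retries) {
        sends++;
        sender.send(offset, error -> {
            if (error == null) {
                return; // committed successfully
            }
            // Retry only if no newer commit has been issued and the budget remains;
            // otherwise the next regular commit carries newer offsets anyway.
            if (seq == latestSeq.get() && retries < maxRetries) {
                attempt(offset, seq, retries + 1);
            }
        });
    }
}
```

With a sender that always fails, commit(42) with maxRetries = 3 sends four requests in total and then gives up. If a newer commit has been issued before an older one fails, the older one is dropped instead of resubmitted, so failures during a coordinator outage do not pile up as fresh requests.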

> StackOverflowError in kafka-coordinator-heartbeat-thread
> --------------------------------------------------------
>
>                 Key: KAFKA-6366
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6366
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 1.0.0
>            Reporter: Joerg Heinicke
>         Attachments: 6366.v1.txt, ConverterProcessor.zip
>
>
> With Kafka 1.0 our consumer groups fall into a permanent cycle of rebalancing
> once a StackOverflowError has occurred in the heartbeat thread due to
> connectivity issues between the consumers and the coordinating broker.
> Immediately before the exception there are hundreds, if not thousands, of log
> entries of the following type:
> 2017-12-12 16:23:12.361 [kafka-coordinator-heartbeat-thread | my-consumer-group] INFO  - [Consumer clientId=consumer-4, groupId=my-consumer-group] Marking the coordinator <IP>:<Port> (id: 2147483645 rack: null) dead
> The exceptions always occur somewhere in the DateFormat code, though at
> different lines.
> 2017-12-12 16:23:12.363 [kafka-coordinator-heartbeat-thread | my-consumer-group] ERROR - Uncaught exception in thread 'kafka-coordinator-heartbeat-thread | my-consumer-group':
> java.lang.StackOverflowError
>          at java.text.DateFormatSymbols.getProviderInstance(DateFormatSymbols.java:362)
>          at java.text.DateFormatSymbols.getInstance(DateFormatSymbols.java:340)
>          at java.util.Calendar.getDisplayName(Calendar.java:2110)
>          at java.text.SimpleDateFormat.subFormat(SimpleDateFormat.java:1125)
>          at java.text.SimpleDateFormat.format(SimpleDateFormat.java:966)
>          at java.text.SimpleDateFormat.format(SimpleDateFormat.java:936)
>          at java.text.DateFormat.format(DateFormat.java:345)
>          at org.apache.log4j.helpers.PatternParser$DatePatternConverter.convert(PatternParser.java:443)
>          at org.apache.log4j.helpers.PatternConverter.format(PatternConverter.java:65)
>          at org.apache.log4j.PatternLayout.format(PatternLayout.java:506)
>          at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:310)
>          at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
>          at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
>          at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
>          at org.apache.log4j.Category.callAppenders(Category.java:206)
>          at org.apache.log4j.Category.forcedLog(Category.java:391)
>          at org.apache.log4j.Category.log(Category.java:856)
>          at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:324)
>          at org.apache.kafka.common.utils.LogContext$KafkaLogger.info(LogContext.java:341)
>          at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.coordinatorDead(AbstractCoordinator.java:649)
>          at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onFailure(AbstractCoordinator.java:797)
>          at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onFailure(RequestFuture.java:209)
>          at org.apache.kafka.clients.consumer.internals.RequestFuture.fireFailure(RequestFuture.java:177)
>          at org.apache.kafka.clients.consumer.internals.RequestFuture.raise(RequestFuture.java:147)
>          at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:496)
> ...
> (the following 9 frames are repeated around a hundred times)
> ...
>          at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:496)
>          at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:353)
>          at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.failUnsentRequests(ConsumerNetworkClient.java:416)
>          at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.disconnect(ConsumerNetworkClient.java:388)
>          at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.coordinatorDead(AbstractCoordinator.java:653)
>          at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onFailure(AbstractCoordinator.java:797)
>          at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onFailure(RequestFuture.java:209)
>          at org.apache.kafka.clients.consumer.internals.RequestFuture.fireFailure(RequestFuture.java:177)
>          at org.apache.kafka.clients.consumer.internals.RequestFuture.raise(RequestFuture.java:147)
>          at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:496)
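The repeated frames in the trace form a cycle: coordinatorDead calls disconnect, which fails the unsent requests and fires their completion handlers, and each failure handler calls coordinatorDead again. The minimal model below is hypothetical (FailureCascade and its methods are not Kafka's ConsumerNetworkClient, and the disconnect/failUnsentRequests steps are collapsed into one). It only illustrates why the stack depth grows with the number of pending failed requests, such as queued async commits, and how a re-entrancy guard that drains the queue in a loop keeps it flat. It is not a description of the fix applied in Kafka.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model of the cycle in the stack trace; these classes are NOT Kafka's.
public class FailureCascade {
    private final Queue<Runnable> pendingCompletions = new ArrayDeque<>();
    private final boolean guarded;
    private boolean firing = false;
    private int depth = 0;
    int maxDepth = 0; // deepest nesting of coordinatorDead observed

    FailureCascade(boolean guarded) {
        this.guarded = guarded;
    }

    // Each failed request's handler reacts by marking the coordinator dead.
    void addFailedCommits(int n) {
        for (int i = 0; i < n; i++) {
            pendingCompletions.add(this::coordinatorDead);
        }
    }

    void coordinatorDead() {
        depth++;
        maxDepth = Math.max(maxDepth, depth);
        if (!guarded) {
            // Fire one completion; its handler re-enters coordinatorDead, so the
            // stack grows by one level per pending failed request.
            Runnable next = pendingCompletions.poll();
            if (next != null) {
                next.run();
            }
        } else if (!firing) {
            // Re-entrant calls return immediately; the outermost call drains the queue.
            firing = true;
            Runnable next;
            while ((next = pendingCompletions.poll()) != null) {
                next.run();
            }
            firing = false;
        }
        depth--;
    }

    public static void main(String[] args) {
        FailureCascade recursive = new FailureCascade(false);
        recursive.addFailedCommits(1_000);
        recursive.coordinatorDead();

        FailureCascade guarded = new FailureCascade(true);
        guarded.addFailedCommits(1_000);
        guarded.coordinatorDead();

        System.out.println("recursive max depth: " + recursive.maxDepth); // 1001
        System.out.println("guarded max depth: " + guarded.maxDepth);     // 2
    }
}
```

With 1,000 pending failures the unguarded variant nests 1,001 calls deep, while the guarded one never exceeds 2. In the real trace each level of the cycle costs about nine frames, so thousands of pending async commits can plausibly exhaust the heartbeat thread's stack.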



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
