Hi everyone,
We recently switched to Kafka 1.0 and are facing an issue that we did
not see with version 0.10.x.
One of our consumer groups falls into a permanent rebalancing cycle. On
analysing the log files we noticed a StackOverflowError in the
kafka-coordinator-heartbeat-thread (see the partial stack trace below;
overall it is over 1,000 lines). Immediately before the error there are
hundreds, if not thousands, of log entries of the following type:
2017-12-12 16:23:12.361 [kafka-coordinator-heartbeat-thread |
my-consumer-group] INFO - [Consumer clientId=consumer-4,
groupId=my-consumer-group] Marking the coordinator <IP>:<Port> (id:
2147483645 rack: null) dead
The top of the stack trace is always somewhere in the DateFormat code,
though at different lines.
Is this purely a Kafka-internal issue (not to say a bug), or can we
somehow influence the occurrence of the error, e.g. is there some
configuration that might affect it? (Avoiding the connectivity issue
with the group coordinator is the obvious thing to do, but what I have in
mind is keeping the heartbeat thread alive despite the connectivity issue.)
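To make the question concrete, these are the consumer settings we would look at first. This is only a sketch: the keys are standard consumer configuration names, but the class name, the bootstrap address and all values are illustrative assumptions, not our production settings, and we do not know whether any of them influences the error.

```java
import java.util.Properties;

// Hypothetical helper, for illustration only; not part of our application.
class HeartbeatTuningSketch {

    static Properties consumerProps() {
        Properties props = new Properties();
        // Placeholder address and the group name from the log excerpt.
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "my-consumer-group");

        // How often the heartbeat thread pings the group coordinator, and how
        // long the coordinator waits before declaring the member dead.
        props.setProperty("heartbeat.interval.ms", "3000");
        props.setProperty("session.timeout.ms", "10000");

        // Assumed values: a longer back-off between reconnect attempts and
        // retries while the coordinator is unreachable.
        props.setProperty("reconnect.backoff.ms", "1000");
        props.setProperty("reconnect.backoff.max.ms", "10000");
        props.setProperty("retry.backoff.ms", "500");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings so they can be compared with the running client.
        consumerProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting Properties object would be passed to the KafkaConsumer constructor together with the usual deserializer settings.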
Thanks in advance & regards,
Joerg
2017-12-12 16:23:05.884 [kafka-coordinator-heartbeat-thread |
my-consumer-group] ERROR - Uncaught exception in thread
'kafka-coordinator-heartbeat-thread | my-consumer-group':
java.lang.StackOverflowError
at java.text.DateFormatSymbols.getProviderInstance(DateFormatSymbols.java:362)
at java.text.DateFormatSymbols.getInstance(DateFormatSymbols.java:340)
at java.util.Calendar.getDisplayName(Calendar.java:2110)
at java.text.SimpleDateFormat.subFormat(SimpleDateFormat.java:1125)
at java.text.SimpleDateFormat.format(SimpleDateFormat.java:966)
at java.text.SimpleDateFormat.format(SimpleDateFormat.java:936)
at java.text.DateFormat.format(DateFormat.java:345)
at org.apache.log4j.helpers.PatternParser$DatePatternConverter.convert(PatternParser.java:443)
at org.apache.log4j.helpers.PatternConverter.format(PatternConverter.java:65)
at org.apache.log4j.PatternLayout.format(PatternLayout.java:506)
at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:310)
at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
at org.apache.log4j.Category.callAppenders(Category.java:206)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:324)
at org.apache.kafka.common.utils.LogContext$KafkaLogger.info(LogContext.java:341)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.coordinatorDead(AbstractCoordinator.java:649)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onFailure(AbstractCoordinator.java:797)
at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onFailure(RequestFuture.java:209)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireFailure(RequestFuture.java:177)
at org.apache.kafka.clients.consumer.internals.RequestFuture.raise(RequestFuture.java:147)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:496)
...
the following 9 frames are repeated around a hundred times.
...
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:496)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:353)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.failUnsentRequests(ConsumerNetworkClient.java:416)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.disconnect(ConsumerNetworkClient.java:388)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.coordinatorDead(AbstractCoordinator.java:653)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onFailure(AbstractCoordinator.java:797)
at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onFailure(RequestFuture.java:209)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireFailure(RequestFuture.java:177)
at org.apache.kafka.clients.consumer.internals.RequestFuture.raise(RequestFuture.java:147)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:496)
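
For what it is worth, this is how we read the repeating frames: coordinatorDead disconnects from the coordinator, which fails a queued request, whose failure handler calls coordinatorDead again before the first call has returned. The toy model below is only our interpretation of the trace, not Kafka's code. The class, field and method names are hypothetical and merely mirror the frame names, and the assumption that every pending request adds one cycle to the stack is ours.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the call cycle seen in the trace. NOT Kafka's implementation.
class CoordinatorRecursionSketch {

    // Requests queued for the coordinator but not yet sent (assumed).
    private final Deque<Runnable> unsent = new ArrayDeque<>();
    private int depth = 0;
    private int maxDepth = 0;

    CoordinatorRecursionSketch(int pendingRequests) {
        for (int i = 0; i < pendingRequests; i++) {
            // Each request's failure handler marks the coordinator dead again.
            unsent.add(this::coordinatorDead);
        }
    }

    // Stands in for coordinatorDead -> disconnect in the trace.
    void coordinatorDead() {
        depth++;
        maxDepth = Math.max(maxDepth, depth);
        failUnsentRequests();
        depth--;
    }

    // Fails queued requests. The first handler re-enters coordinatorDead()
    // before this loop continues, so the nested call consumes the next request.
    private void failUnsentRequests() {
        Runnable onFailure;
        while ((onFailure = unsent.poll()) != null) {
            onFailure.run();
        }
    }

    // Returns the deepest nesting of coordinatorDead() for a given backlog.
    static int maxDepthFor(int pendingRequests) {
        CoordinatorRecursionSketch sketch = new CoordinatorRecursionSketch(pendingRequests);
        sketch.coordinatorDead();
        return sketch.maxDepth;
    }

    public static void main(String[] args) {
        for (int pending : new int[] {1, 10, 100}) {
            System.out.println("pending=" + pending + " maxDepth=" + maxDepthFor(pending));
        }
    }
}
```

In this model the nesting depth grows linearly with the number of queued requests. If the real client behaves similarly, a large backlog of unsent requests during a coordinator outage would explain why the stack eventually overflows, and why it overflows in whatever code happens to be on top at that moment (here the log4j date formatting).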