[
https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855098#comment-15855098
]
Jason Gustafson commented on KAFKA-4739:
----------------------------------------
[~sagar8192] Unfortunately, there is no such option. Traditionally, Kafka
clients attempt to handle broker failures internally. This usually means a
metadata refresh and a reconnect, which is exactly what the client appears to
be doing here. We normally expect that the assigned partitions are spread
across multiple brokers, so a failure fetching from any particular broker
should only affect the availability of the partitions it was hosting. This is
typically what you want since a broker failure will cause another broker to
take over its partitions. There is little applications can do in these cases
anyway other than possibly sending an alert. Nevertheless, this behavior is
often contested and may change, especially as some of the automatic behavior
(such as topic auto-creation) is retired.
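Since there is little an application can do beyond alerting, one option is to watch for prolonged empty polls and raise an alert rather than spinning silently. The sketch below assumes a kafka-python-style consumer whose poll() returns a dict of records; the function name poll_with_alert and the max_idle_s parameter are made up for illustration, not part of any client API.

```python
import time

def poll_with_alert(consumer, alert, max_idle_s=60.0, poll_timeout_ms=1000):
    """Yield non-empty poll results; call alert(idle_seconds) when no
    records have arrived for more than max_idle_s seconds."""
    last_records = time.monotonic()
    while True:
        records = consumer.poll(timeout_ms=poll_timeout_ms)
        if records:
            last_records = time.monotonic()
            yield records
        else:
            idle = time.monotonic() - last_records
            if idle > max_idle_s:
                alert(idle)                       # e.g. page an operator
                last_records = time.monotonic()   # alert once per idle window
```

The client still handles the metadata refresh and reconnect itself; this wrapper only surfaces the symptom (no data for too long) to the application.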
One small request: the logs seem to have sanitized broker ids. Can you ensure
that they have all been updated consistently? The puzzling thing is that the
requests appear to be timing out on the client after 30s, yet you've
enabled 120s in the config. Are you sure the 120s is correct? In which config
did you enable "request_timeout_ms = 300001" (the broker doesn't have such a
config)? It's also strange that multiple fetches are cancelled after a
disconnect. The consumer should only ever have one fetch in-flight for each
broker. I don't have a ready explanation for that. Could there be some details
left out of the logs? We might get more information if you enable TRACE level
logging.
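If you are on the Java client, TRACE is enabled through the log4j configuration. If this consumer is kafka-python (an assumption based on the snake_case option name request_timeout_ms), a sketch of turning on the most verbose client logging looks like this; stdlib logging has no TRACE level, so DEBUG is the maximum:

```python
import logging

# Assumption: kafka-python client; its loggers live under the "kafka"
# namespace. DEBUG is the most verbose level available in stdlib logging.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
logging.getLogger("kafka").setLevel(logging.DEBUG)
```

Attaching logs captured at this level would help explain the multiple in-flight fetches.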
> KafkaConsumer poll going into an infinite loop
> ----------------------------------------------
>
> Key: KAFKA-4739
> URL: https://issues.apache.org/jira/browse/KAFKA-4739
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 0.9.0.1
> Reporter: Vipul Singh
>
> We are seeing an issue with our kafka consumer where it seems to go into an
> infinite loop while polling, trying to fetch data from kafka. We are seeing
> the heartbeat requests on the broker from the consumer, but nothing else from
> the kafka consumer.
> We enabled debug level logging on the consumer, and see these logs:
> https://gist.github.com/neoeahit/757bff7acdea62656f065f4dcb8974b4
> And this just goes on. We have been able to replicate this issue by
> restarting the process multiple times in quick succession.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)