[ 
https://issues.apache.org/jira/browse/KAFKA-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464688#comment-16464688
 ] 

David Glasser commented on KAFKA-6843:
--------------------------------------

Well, that problem isn't fixed in Kafka 1.1. So for now, you just really don't 
want to have Zookeeper DNS names whose IPs change.  This could be documented, 
but I'm not sure where.

> Document issue with DNS TTL
> ---------------------------
>
>                 Key: KAFKA-6843
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6843
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: David Glasser
>            Priority: Major
>
> We run Kafka and Zookeeper in Google Kubernetes Engine. We recently had an 
> incident where our brokers had serious problems when GKE replaced our cluster 
> (cycling both Zookeeper and Kafka in parallel).  Kafka (1.0) brokers lost the 
> ability to talk to Zookeeper, and eventually failed their controlled 
> shutdown, leading to slow startup times for the new broker and outages for 
> our system.
> We eventually tracked this down to the fact that (at least in our 
> environment) the default JVM DNS caching behavior is to cache results 
> forever.  We rely on DNS to connect to Zookeeper, and the DNS resolution 
> changes when the Zookeeper pods are replaced.
> The fix is straightforward: set the property networkaddress.cache.ttl or 
> sun.net.inetaddr.ttl to make the caching non-infinite (or use a "security 
> manager"). See 
> [https://docs.oracle.com/javase/8/docs/technotes/guides/net/properties.html] 
> for details.
> I think this gotcha should be documented. Probably at 
> [https://kafka.apache.org/11/documentation/#java] ? I'm happy to submit a PR 
> if people agree this is the right place.  (I suppose somehow fixing this in 
> code would be nice too.)
> By the way, if you search the Apache issue tracker for 
> [networkaddress.cache.ttl|https://issues.apache.org/jira/browse/JAMES-774?jql=text%20~%20%22%5C%22networkaddress.cache.ttl%5C%22%22],
>  you'll learn that this is a common issue faced by many Apache Java projects.
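
A minimal sketch of the TTL fix described in the issue above, assuming it is
acceptable to set the security property programmatically at JVM startup. The
class name DnsCacheTtl, the helper setDnsCacheTtl, and the 60-second value are
illustrative choices, not anything Kafka ships. As far as I know the JVM reads
this property once, when it first initializes its address cache, so it has to
be set before the first hostname lookup.

```java
import java.security.Security;

// Hypothetical helper class, for illustration only.
final class DnsCacheTtl {

    // Caps how long successful DNS lookups are cached, in seconds.
    // "networkaddress.cache.ttl" is a security property, so it is set
    // through java.security.Security rather than System.setProperty.
    static void setDnsCacheTtl(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }

    public static void main(String[] args) {
        // Assumption: call this before any code resolves a hostname.
        setDnsCacheTtl(60);
        System.out.println("networkaddress.cache.ttl = "
                + Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

The same effect should be available without code by passing
-Dsun.net.inetaddr.ttl=60 on the JVM command line, though the Oracle page
linked above describes that property as implementation-specific.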



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
