[ https://issues.apache.org/jira/browse/KAFKA-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114260#comment-15114260 ]
Ewen Cheslack-Postava commented on KAFKA-2426:
----------------------------------------------

Although it should be *very* unusual to hit this problem, it does seem like it would be nice if the one case where a node connects to itself could use the local interface/address to connect. In environments like AWS, this is nice because it avoids going through any NATs or external routing: the connection would use the local IP rather than a public IP that requires additional routing and proxying.

That said, I'm not sure it's worth special-casing this -- one reason it's probably not easy is that the local/advertised hostname info probably isn't tracked far enough to switch between the two.

> A Kafka node tries to connect to itself through its advertised hostname
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-2426
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2426
>             Project: Kafka
>          Issue Type: Bug
>          Components: network
>    Affects Versions: 0.8.2.1
>         Environment: Docker https://github.com/wurstmeister/kafka-docker,
> managed by a Kubernetes cluster, with an "iptables proxy".
>            Reporter: Mikaël Cluseau
>            Assignee: Jun Rao
>
> Hi,
> when used behind a firewall, Apache Kafka nodes try to connect to
> themselves using their advertised hostnames. This means that if you have a
> service IP managed by the Docker host using *only* iptables DNAT rules, the
> node's connection to "itself" times out.
> This is the case in any setup where a host will DNAT the service IP to the
> instance's IP and send the packet back on the same interface over a Linux
> bridge port not configured in "hairpin" mode. It's because of this:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/bridge/br_forward.c#n30
> The specific part of the Kubernetes issue is here:
> https://github.com/BenTheElder/kubernetes/issues/3#issuecomment-123925060 .
> The timeout means that even if the partition's leader is elected, it then
> fails to accept writes from the other members, causing a write lock and
> generating very heavy logs (as fast as Kafka usually is, but through log4j
> this time ;)).
> This also means that the normal Docker case works by going through the
> userspace proxy, which necessarily impacts performance.
> The workaround for us was to add a "127.0.0.2 advertised-hostname" line to
> /etc/hosts in the container startup script.
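
The /etc/hosts workaround from the report could be sketched roughly as follows. This is only an illustration, not the reporter's actual startup script: the helper name `ensure_hosts_entry` is hypothetical, and `advertised-hostname` stands in for whatever hostname the broker really advertises. It appends the mapping only if it is not already there, so the script can run on every container start.

```python
from pathlib import Path


def ensure_hosts_entry(hosts_path, ip, hostname):
    """Append "ip<TAB>hostname" to a hosts file unless that mapping exists.

    Hypothetical helper for illustration. Returns True if a line was
    added, False if the mapping was already present.
    """
    path = Path(hosts_path)
    text = path.read_text() if path.exists() else ""

    for line in text.splitlines():
        # Ignore trailing comments, then split into address and names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == ip and hostname in fields[1:]:
            return False

    # Make sure the new entry starts on its own line.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as f:
        f.write(f"{prefix}{ip}\t{hostname}\n")
    return True


# Assumed usage in a container startup script (needs write access):
# ensure_hosts_entry("/etc/hosts", "127.0.0.2", "advertised-hostname")
```

Note that this only appends. If an earlier line in the file already maps the same hostname to a different address, that earlier entry may take precedence, so it would need to be checked separately.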