Hi,

We are using Kafka in an environment where access to the Kafka brokers is 
restricted: all access has to go through TCP proxies. In my setup I have three 
Kafka brokers (broker1-3), for which I created three proxy instances and 
configured those proxies as the bootstrap servers:
bootstrap.servers = [ssl://proxy1:9092, ssl://proxy2:9092, ssl://proxy3:9092]
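
For context, a minimal sketch of how such a client is wired up with the plain 
Java Kafka clients API (group id and deserializers are illustrative, not our 
actual setup; note that the Java client selects SSL via security.protocol 
rather than an ssl:// prefix):

    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.CommonClientConfigs;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ProxyBootstrapExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Point the client at the proxies instead of the brokers
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,
                    "proxy1:9092,proxy2:9092,proxy3:9092");
            props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "mygroup");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // After bootstrapping, fetches go to the broker names
                // advertised in the metadata, not to the proxies
                consumer.subscribe(List.of("mytopic"));
                consumer.poll(java.time.Duration.ofSeconds(5));
            }
        }
    }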

Upon startup of a Kafka client I see a successful metadata update, and the 
client receives the original Kafka broker names:
[2019-08-12 16:09:56,921] DEBUG Updated cluster metadata version 2 to 
Cluster(id = lFFAv0okS12Vg2rvVbsKdg, nodes = [broker1:9092 (id: 1 rack: r1), 
broker3:9092 (id: 3 rack: r3), broker2:9092 (id: 2 rack: r2)], partitions = 
[Partition(topic = mytopic, partition = 0, leader = 2, replicas = [2], isr = 
[2], offlineReplicas = [])], controller = broker1:9092 (id: 1 rack: r1)) 
(org.apache.kafka.clients.Metadata)

So, the initial connection via the proxy instances does work correctly. The 
Kafka documentation states that bootstrap.servers is only used for 
bootstrapping, so I assume that when the bootstrap hostnames do not match the 
brokers' own hostnames, the client replaces them with the names from the 
metadata response.
Additionally, I found the following statement in the librdkafka (C++ library) 
FAQ 
(https://github.com/edenhill/librdkafka/wiki/FAQ#number-of-broker-tcp-connections):
The initial bootstrap broker connections will only be used for Metadata 
queries, unless the hostname and port of a bootstrap broker exactly matches the 
hostname and port of a broker returned in the Metadata response (this is the 
advertised.listeners broker configuration property), in which case the 
bootstrap broker connection will be associated with that broker's broker id and 
used for the full protocol set (such as producing or consuming).

We do not use this library but the Spring Kafka (Java) library instead; 
however, this description matches what I'm observing.
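
If I understand it correctly, the hostnames the client uses after 
bootstrapping come from each broker's advertised.listeners setting, e.g. 
something like this in broker2's server.properties (hypothetical values, I do 
not know the actual broker config):

    # server.properties on broker2 (hypothetical values)
    listeners=SSL://0.0.0.0:9092
    advertised.listeners=SSL://broker2:9092

Since the metadata response advertises broker2:9092 rather than proxy2:9092, 
the client subsequently tries to connect to broker2 directly, which is exactly 
what our restricted environment does not allow.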

My question: is there any way to disable this behavior and have the client 
keep using the hostnames from the bootstrap broker list for producing and 
consuming as well? I do not see any other way to use Kafka in this restricted 
scenario, short of introducing some DNS rerouting.
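
By DNS rerouting I mean something like resolving the advertised broker 
hostnames to the proxy addresses on the client hosts, along these lines (IP 
addresses are placeholders):

    # /etc/hosts on the client machine (placeholder IPs)
    10.0.0.11   broker1   # actually proxy1
    10.0.0.12   broker2   # actually proxy2
    10.0.0.13   broker3   # actually proxy3

This would of course require each proxy to listen on the same port that its 
broker advertises (9092 here).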

I would appreciate any insights, whether I'm missing something or there is 
another way to establish a working connection under these circumstances.

Thanks 
Emanuel