Hi, I’m seeing a puzzling behavior with Flink’s Kafka Sink and I’d like to understand if this is expected Kafka client behavior.
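For reference, here is a minimal sketch of the bootstrap configuration described under Setup below. It uses only `java.util.Properties`, not the actual Flink Kafka Sink wiring. The class and method names are hypothetical and for illustration only. It shows the order in which the two addresses are configured and says nothing about which address the Kafka client actually picks.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper class, for illustration only; not part of Flink or Kafka.
public class BootstrapConfigSketch {

    // Builds producer properties with the two bootstrap addresses from the setup.
    static Properties producerProperties() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "10.1.1.10:9092,10.1.1.11:9092");
        return props;
    }

    // Splits the comma-separated bootstrap list, preserving the configured order.
    static List<String> bootstrapAddresses(Properties props) {
        String raw = props.getProperty("bootstrap.servers");
        return Arrays.asList(raw.split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        List<String> addresses = bootstrapAddresses(producerProperties());
        // Print each configured address with its position in the list.
        for (int i = 0; i < addresses.size(); i++) {
            System.out.println("bootstrap[" + i + "] = " + addresses.get(i));
        }
    }
}
```

In the real job the same property value is passed to the Flink Kafka Sink, and both sink subtasks receive the identical list.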

Setup
* Kafka cluster with 3 brokers
* I provide 2 broker IPs of the same Kafka cluster in bootstrap.servers, for example:
  * bootstrap.servers=10.1.1.10:9092,10.1.1.11:9092
* Flink Kafka Sink parallelism = 2
* Both Flink subtasks run in the same Kubernetes cluster (same TaskManager)

Observed Behavior
* Flink Sink Subtask #1 always connects to the first bootstrap IP (10.1.1.10).
* Flink Sink Subtask #2 always connects to the second bootstrap IP (10.1.1.11).
* This happens every time, consistently across job restarts.

My expectation was that each KafkaProducer would try bootstrap servers in the order provided, i.e., try 10.1.1.10 → only fall back to 10.1.1.11 if the first one is unreachable. However, both brokers are healthy and reachable from my TaskManager pod, yet the second subtask never uses the first IP and always jumps to the second IP.

Questions
1. Why would two KafkaProducer instances behave differently when connecting to the same bootstrap list?
2. Does the Kafka client have any deterministic behavior or ordering differences between separate JVM instances/subtasks that would cause this?
3. Is this expected, or should both producers normally connect to the first bootstrap server unless it’s actually unavailable?

Regards,
Prateek Kohli
