Hey folks,

We're observing some very peculiar behavior on our Kafka cluster. When one of the Kafka broker instances goes down, we see the producer block (in flush()) for roughly `request.timeout.ms` before returning success (or at least not throwing an exception) and moving on.
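To make the call pattern concrete, our produce path looks roughly like this (topic name, serializers, and the bootstrap address are placeholders for illustration; the real producer settings are listed below):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProduceExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // bootstrap.servers points at the Kubernetes Service in front of the brokers
        props.put("bootstrap.servers", "kafka:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // ...plus the producer settings listed below (acks=all, retries=2, etc.)

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "some-key", "some-value"));
            // This is the call that blocks for ~request.timeout.ms when a broker is down,
            // and then returns without throwing.
            producer.flush();
        }
    }
}
```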
We're running Kafka on Kubernetes, so this may be related. Kafka runs as a Kubernetes PetSet with a global Service (acting like a load balancer) that consumers and producers use for the bootstrap list. Our brokers come up with a predetermined set of broker ids (kafka-0, kafka-1 & kafka-2), but a broker's IP likely changes every time it is restarted.

Our Kafka settings are as follows:

Producer:
  acks=all
  batch.size=16384
  linger.ms=1
  request.timeout.ms=3000
  max.in.flight.requests.per.connection=1
  retries=2
  max.block.ms=10000
  buffer.memory=33554432

Broker:
  min.insync.replicas=1

I'm having a hard time debugging why this happens, mostly because I'm not seeing any logs from the producer. Is there a guide somewhere for turning up the logging level of the Kafka Java client? I'm using logback, if that helps.

Thanks,
Mike
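P.S. In case it helps with the logging question: what I was planning to try is raising the Kafka client loggers to DEBUG through logback-classic, roughly as below. The logger name `org.apache.kafka` is my guess at the client's logger hierarchy, not something I've confirmed:

```java
import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import org.slf4j.LoggerFactory;

public class KafkaClientDebugLogging {
    public static void enable() {
        // Assumes logback-classic is the slf4j binding, so the cast is safe.
        Logger kafkaLogger = (Logger) LoggerFactory.getLogger("org.apache.kafka");
        kafkaLogger.setLevel(Level.DEBUG);
    }
}
```

The equivalent in logback.xml would presumably be a `<logger name="org.apache.kafka" level="DEBUG"/>` entry, but I'd appreciate confirmation that this is the right logger name to target.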