MG>Can you write a SimpleConsumer to determine when the lead broker times out? Then you'll need to tweak the connection settings: https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
MG>To debug the response, determine the leadBroker and the reason for the fetch failure, as seen here:

    if (fetchResponse.hasError()) {
        numErrors++;
        // Something went wrong!
        short code = fetchResponse.errorCode(a_topic, a_partition);
        System.out.println("Error fetching data from the Broker:" + leadBroker + " Reason: " + code);
    }

________________________________
From: Mike Kaplinskiy <m...@ladderlife.com>
Sent: Thursday, October 27, 2016 3:11:14 AM
To: users@kafka.apache.org
Subject: Mysterious timeout

Hey folks,

We're observing a very peculiar behavior on our Kafka cluster. When one of the Kafka broker instances goes down, we're seeing the producer block (at .flush) for right about `request.timeout.ms` before returning success (or at least not throwing an exception) and moving on.

We're running Kafka on Kubernetes, so this may be related. Kafka is a Kubernetes PetSet with a global Service (like a load balancer) for consumers/producers to use for the bootstrap list. Our Kafka brokers are configured to come up with a predetermined set of broker ids (kafka-0, kafka-1 & kafka-2), but the IP likely changes every time it's restarted.

Our Kafka settings are as follows:

Producer:
"acks" "all"
"batch.size" "16384"
"linger.ms" "1"
"request.timeout.ms" "3000"
"max.in.flight.requests.per.connection" "1"
"retries" "2"
"max.block.ms" "10000"
"buffer.memory" "33554432"

Broker:
min.insync.replicas=1

I'm having a bit of a hard time debugging why this happens, mostly because I'm not seeing any logs from the producer. Is there a guide somewhere for turning up the logging information from the kafka java client? I'm using logback if that helps.

Thanks,
Mike.

Ladder <http://bit.ly/1VRtWfS>. The smart, modern way to insure your life.
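
For reference, the producer settings quoted in the message above could be written as a `java.util.Properties` object, which is the form the Java producer takes its configuration in. This is only a sketch: the class name `ProducerSettings` and the method `producerProps` are hypothetical, the keys and values are copied from the message, and it uses only the JDK so it can be run without the Kafka client on the classpath.

```java
import java.util.Properties;
import java.util.TreeSet;

public class ProducerSettings {

    // Builds the producer configuration exactly as quoted in the message.
    // Assumption: a real producer would also need bootstrap.servers and
    // key/value serializers, which the message does not list.
    static Properties producerProps() {
        Properties props = new Properties();
        props.setProperty("acks", "all");
        props.setProperty("batch.size", "16384");
        props.setProperty("linger.ms", "1");
        props.setProperty("request.timeout.ms", "3000");
        props.setProperty("max.in.flight.requests.per.connection", "1");
        props.setProperty("retries", "2");
        props.setProperty("max.block.ms", "10000");
        props.setProperty("buffer.memory", "33554432");
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProps();
        // Print the settings in sorted key order so they are easy to compare.
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            System.out.println(key + "=" + props.getProperty(key));
        }
    }
}
```

Printing the effective settings at startup is one way to confirm that the timeout the producer blocks for matches the configured `request.timeout.ms`. On the logging question, the Java client logs through SLF4J, so with logback it should be possible to raise the level of the `org.apache.kafka` logger to DEBUG.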