Hi everyone, we run load tests against our web application (about 50K req/sec) and every time a kafka broker dies (also controlled shutdown), the producer tries to connect with the dead broker for about 10-15 minutes. For this time the application monitoring shows a constant error rate (about of 1/10 all kafka writes fail).
Our spec: * web-app in tomcat writes to kafka * 3 node kafka cluster * kafka 0.8.2 * new producer The producer config: acks=1 compression.type=snappy retries=0 batch_size=32768 buffer.memory=67108864 linger.ms=1500 metadata.fetch.timeout.ms=5000 timeout.ms= 1500 retry.backoff.ms=10000 reconnect.backoff.ms=10000 I can poll our Zookeeper and check if all brokers are alive, but I think KafkaProducer checks it already. Alexey