Hi Alexey,

So, a couple of things. Your config has some settings that would result in long wait times when a broker fails.
You should try this configuration and see if you still have the issue:

acks=1
compression.type=snappy
retries=3                        # Retry a few times so sends don't get dropped when a broker fails, at least not right away
batch.size=32768
buffer.memory=67108864
linger.ms=1500
metadata.fetch.timeout.ms=60000  # Default. Give the cluster a lot of time to return the metadata
timeout.ms=10000                 # Give kafka some time to respond before you consider a request failed
retry.backoff.ms=100             # Default. Keep this small so the producer fails quickly enough times to know a broker is down
reconnect.backoff.ms=10          # Default. Same reason as above

Hopefully the explanations of the changes make sense. At the very least, I would try bumping retries up to 2 first. Also, what is your topic's configuration? (I've put a command for pulling it, along with a Java sketch of the settings above, at the bottom of this mail, below your quote.)

-Erik

On 8/28/15, 8:36 AM, "Alexey Sverdelov" <alexey.sverde...@googlemail.com> wrote:

>Hi everyone,
>
>we run load tests against our web application (about 50K req/sec), and
>every time a kafka broker dies (also on controlled shutdown), the
>producer tries to connect to the dead broker for about 10-15 minutes.
>During this time the application monitoring shows a constant error rate
>(about 1 in 10 of all kafka writes fail).
>
>Our spec:
>
>* web-app in tomcat writes to kafka
>* 3 node kafka cluster
>* kafka 0.8.2
>* new producer
>
>The producer config:
>
>acks=1
>compression.type=snappy
>retries=0
>batch.size=32768
>buffer.memory=67108864
>linger.ms=1500
>metadata.fetch.timeout.ms=5000
>timeout.ms=1500
>retry.backoff.ms=10000
>reconnect.backoff.ms=10000
>
>I can poll our Zookeeper and check if all brokers are alive, but I think
>KafkaProducer already does that.
>
>Alexey
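P.S. In case it helps, here's a minimal Java sketch of the settings above using the new producer API. The bootstrap.servers value, topic name, and StringSerializer are placeholders I picked for the example, not anything from your setup, and the callback is just one way to surface failed writes in your monitoring:

import java.util.Properties;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092"); // placeholder: your 3 brokers
        props.put("acks", "1");
        props.put("compression.type", "snappy");
        props.put("retries", "3");            // retry so a single broker failure doesn't drop sends
        props.put("batch.size", "32768");
        props.put("buffer.memory", "67108864");
        props.put("linger.ms", "1500");
        props.put("metadata.fetch.timeout.ms", "60000"); // default: generous metadata timeout
        props.put("timeout.ms", "10000");                // time to wait for a broker response
        props.put("retry.backoff.ms", "100");            // default: fail fast enough to detect a dead broker
        props.put("reconnect.backoff.ms", "10");         // default: same reason
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);

        // send() is async; the callback fires once the write succeeds or finally fails,
        // so exhausted retries show up here instead of silently disappearing.
        producer.send(new ProducerRecord<String, String>("my-topic", "key", "value"),
                new Callback() {
                    public void onCompletion(RecordMetadata metadata, Exception exception) {
                        if (exception != null) {
                            exception.printStackTrace(); // hook your error metrics in here
                        }
                    }
                });

        producer.close();
    }
}

With retries > 0, a failed send triggers a metadata refresh, so once the partition leader moves to a live broker the retry should land there instead of erroring out.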
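For the topic configuration, something like this (adjust the zookeeper host and topic name to yours) will print the partition/replica layout plus any per-topic overrides:

bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-topic

I ask because a replication factor of 1 would also explain the symptoms: partitions on the dead broker would have no replica to fail over to, so writes to them keep failing for as long as the broker is down.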