Felix, the thing is that I was using the sync producer.

--
Viktor

2013/8/13 Felix GV <fe...@mate1inc.com>:
> Async production is meant to work this way. You have no delivery guarantee
> nor any exception because the producer sends the message independently of
> the code that called the async production function.
>
> It is meant to be faster than sync production, but it is obviously intended
> for non-critical messages.
>
> --
> Felix
>
>
> On Fri, Jul 26, 2013 at 12:27 PM, Viktor Kolodrevskiy <
> viktor.kolodrevs...@gmail.com> wrote:
>
>> Hey guys,
>>
>> We decided to use Kafka in our new project, and I have spent some time
>> researching how the Kafka producer behaves during network connectivity
>> problems.
>>
>> I had 3 virtual machines (Ubuntu 13.04, running on VirtualBox) in one
>> network:
>>
>> 1. Kafka server (0.7.2) + ZooKeeper.
>> 2. Producer app with default settings.
>> 3. Consumer app.
>>
>> Results of the following tests with default sync producer settings:
>>
>> 1. Condition: Put network down on machine (1) for 20 mins.
>> Result: Producer is working for ~16 mins. Consumer does not receive
>> anything.
>> After ~16 mins the producer app fails (with java.io.IOException:
>> Connection timed out). Consumer app does not fail.
>> Messages that were generated during the 16 mins are lost!
>>
>> 2. Condition: Put network down on machine (1) for 5 mins and after 5
>> mins start network on (1) again.
>> Result: Producer app is working, no exceptions or notification that
>> network was down.
>> Consumer does not receive messages for 5 mins. But when network on (1)
>> is up it receives all messages.
>> There are no messages lost.
>>
>> 3. Condition: Put network down on machine (2) for 20 mins.
>> Result: Producer is working for ~16 mins. Consumer does not receive
>> anything.
>> After ~16 mins the producer app fails (with java.io.IOException:
>> Connection timed out). Consumer app does not fail.
>> Messages that were generated during the 16 mins are lost! (Same result
>> as in test #1)
>> Kafka and ZooKeeper log that the producer is disconnected.
>>
>> 4. Condition: Put network down on machine (2) for 5 mins and after 5
>> mins start network on (2) again.
>> Result: Producer app is working, no exceptions or notification that
>> network was down.
>> Consumer does not receive messages for 5 mins. But when network on (2)
>> is up it receives all messages. (Same result as in test #2)
>> Kafka and ZooKeeper log that the producer is disconnected.
>>
>> 5. Condition: Kill Kafka server (0.7.2) + ZooKeeper (kill application, do
>> not shut down network).
>> Result: Producer fails in a few seconds with
>> "kafka.common.NoBrokersForPartitionException: Partition = null"
>> Consumer is still working even after 25 minutes.
>>
>> One more interesting thing. Changing the connect.timeout.ms parameter
>> value for the producer
>> did not change the 16 mins that I see.
>>
>> I played with the settings and found out that the only way to reduce the
>> time it takes the producer to notice that the network is down is to
>> change one of two parameters: reconnect.interval,
>> reconnect.time.interval.ms
>>
>> So let's say we change reconnect.time.interval.ms=1000.
>> This means that the producer will reconnect to Kafka every 1 second.
>> In this case the producer finds out that the network is down in 1 second.
>> The producer stops sending messages and throws
>> "java.net.ConnectException: Connection timed out". This is the only way
>> that I have found so far.
>> In this case we do not lose too many messages but performance may suffer.
>> Or we can set reconnect.interval=1 so a reconnect will happen after each
>> message sent
>> and we do not lose messages at all.
>>
>> Then I did testing for the async producer (producer.type=async).
>> The results are dramatic for me, because the producer does not throw any
>> exception.
>> It sends messages and does not fail.
>> I left it running overnight and it did not fail even though the network
>> between the Kafka server and the producer app was down.
>> Playing with the async producer config parameters did not help either.
>>
>> My questions are:
>>
>> 1. Where may these 16 mins come from?
>> 2. Are there any best practices to handle network down issues?
>> 3. Why does the async producer never throw exceptions when the network
>> is down?
>> 4. What is the way to check from the sync/async producer that messages
>> were really sent?
>>

--
Thanks,
Viktor
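
For reference, the workaround described in the quoted message (sync producer with reconnect.time.interval.ms=1000) could be written down roughly as below. This is only a sketch: the class and method names are made up for illustration, it only builds a java.util.Properties object with keys named in this thread, and the other properties a real producer needs (such as the broker or ZooKeeper connection settings) are deliberately left out.

```java
import java.util.Properties;

// Hypothetical helper, not part of Kafka: collects the producer settings
// discussed in this thread into a Properties object.
public class ProducerSettingsSketch {

    // Builds settings for a sync producer that forces a reconnect every
    // reconnectTimeIntervalMs milliseconds (the thread uses 1000).
    static Properties syncProducerProps(int reconnectTimeIntervalMs) {
        Properties props = new Properties();
        // "sync" is assumed here as the counterpart of producer.type=async.
        props.setProperty("producer.type", "sync");
        props.setProperty("reconnect.time.interval.ms",
                Integer.toString(reconnectTimeIntervalMs));
        return props;
    }

    public static void main(String[] args) {
        // Print the settings; connection properties would still have to be
        // added before handing this to a producer.
        syncProducerProps(1000).list(System.out);
    }
}
```

Keeping the interval as a parameter makes it easy to try other values and compare how quickly the producer notices the outage against the throughput cost mentioned above.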