The goal is to use the sync producer and find out that the network is down as soon as possible.
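Here is a minimal sketch of the sync producer settings described in the original message quoted below. It assumes the Kafka 0.7 property names quoted in this thread (`producer.type`, `reconnect.time.interval.ms`). `FastFailProducerConfig` and `build` are hypothetical helpers, not part of Kafka. The broker/ZooKeeper connection settings a real producer also needs are deliberately left out.

```java
import java.util.Properties;

// Hypothetical helper: collects the producer settings discussed in this thread
// for making a Kafka 0.7 sync producer notice a dead connection quickly.
// Connection settings (broker/ZooKeeper addresses, serializer) are omitted.
final class FastFailProducerConfig {
    static Properties build(long reconnectTimeIntervalMs) {
        Properties props = new Properties();
        // Sync mode: per the tests in this thread, errors reach the calling code,
        // whereas the async producer never surfaced them.
        props.setProperty("producer.type", "sync");
        // Force a reconnect after this many milliseconds; per the thread, the
        // reconnect is what exposes a dead network within roughly this interval.
        props.setProperty("reconnect.time.interval.ms", Long.toString(reconnectTimeIntervalMs));
        return props;
    }

    public static void main(String[] args) {
        // 1000 ms is the value tried in the thread.
        Properties props = build(1000L);
        props.list(System.out);
    }
}
```

The thread notes the trade-off. A shorter interval detects an outage sooner but reconnects more often, which may cost throughput. `reconnect.interval=1` is the per-message variant of the same idea.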
--
Viktor

2013/8/13 Viktor Kolodrevskiy <viktor.kolodrevs...@gmail.com>:
> Felix,
> the thing is that I was using the sync producer.
>
> --
> Viktor
>
> 2013/8/13 Felix GV <fe...@mate1inc.com>:
>> Async production is meant to work this way. You have no delivery guarantee
>> nor any exception because the producer sends the message independently of
>> the code that called the async production function.
>>
>> It is meant to be faster than sync production, but it is obviously intended
>> for non-critical messages.
>>
>> --
>> Felix
>>
>>
>> On Fri, Jul 26, 2013 at 12:27 PM, Viktor Kolodrevskiy <
>> viktor.kolodrevs...@gmail.com> wrote:
>>
>>> Hey guys,
>>>
>>> We decided to use Kafka in our new project, and I have spent some time
>>> researching how the Kafka producer behaves during network connectivity
>>> problems.
>>>
>>> I had 3 virtual machines (Ubuntu 13.04, running on VirtualBox) on one
>>> network:
>>>
>>> 1. Kafka server (0.7.2) + ZooKeeper.
>>> 2. Producer app with default settings.
>>> 3. Consumer app.
>>>
>>> Results of the following tests with the default sync producer settings:
>>>
>>> 1. Condition: take the network down on machine (1) for 20 min.
>>> Result: the producer keeps working for ~16 min. The consumer does not
>>> receive anything. After ~16 min the producer app fails (with
>>> java.io.IOException: Connection timed out). The consumer app does not
>>> fail. Messages generated during those 16 min are lost!
>>>
>>> 2. Condition: take the network down on machine (1) for 5 min, then bring
>>> it back up.
>>> Result: the producer app keeps working, with no exception or any other
>>> notification that the network was down. The consumer does not receive
>>> messages for 5 min, but once the network on (1) is up again it receives
>>> all of them. No messages are lost.
>>>
>>> 3. Condition: take the network down on machine (2) for 20 min.
>>> Result: the producer keeps working for ~16 min. The consumer does not
>>> receive anything.
>>> After ~16 min the producer app fails (with java.io.IOException:
>>> Connection timed out). The consumer app does not fail. Messages
>>> generated during those 16 min are lost! (Same result as in test #1.)
>>> Kafka and ZooKeeper log that the producer is disconnected.
>>>
>>> 4. Condition: take the network down on machine (2) for 5 min, then bring
>>> it back up.
>>> Result: the producer app keeps working, with no exception or any other
>>> notification that the network was down. The consumer does not receive
>>> messages for 5 min, but once the network on (2) is up again it receives
>>> all of them. (Same result as in test #2.)
>>> Kafka and ZooKeeper log that the producer is disconnected.
>>>
>>> 5. Condition: kill the Kafka server (0.7.2) + ZooKeeper (kill the
>>> applications, do not shut down the network).
>>> Result: the producer fails within a few seconds with
>>> "kafka.common.NoBrokersForPartitionException: Partition = null".
>>> The consumer is still running even after 25 minutes.
>>>
>>> One more interesting thing: changing the producer's connect.timeout.ms
>>> value did not change the 16 min I observed.
>>>
>>> I played with the settings and found that the only way to reduce the
>>> time it takes the producer to notice that the network is down is to
>>> change one of two parameters: reconnect.interval or
>>> reconnect.time.interval.ms.
>>>
>>> So let's say we set reconnect.time.interval.ms=1000. This means the
>>> producer reconnects to Kafka every second, so it finds out within about
>>> 1 second that the network is down: it stops sending messages and throws
>>> "java.net.ConnectException: Connection timed out". This is the only
>>> approach I have found so far. In this case we do not lose many messages,
>>> but performance may suffer. Alternatively, we can set
>>> reconnect.interval=1 so that a reconnect happens after every message
>>> sent, and then we do not lose any messages at all.
>>>
>>> Then I tested the async producer (producer.type=async).
>>> The results are dramatic for me, because the producer does not throw
>>> any exception.
>>> It keeps sending messages and does not fail. I left it running
>>> overnight and it did not fail, even though the network between the
>>> Kafka server and the producer app was down. Playing with the async
>>> producer config parameters did not help either.
>>>
>>> My questions are:
>>>
>>> 1. Where might these 16 minutes come from?
>>> 2. Are there any best practices for handling network outages?
>>> 3. Why does the async producer never throw exceptions when the network
>>> is down?
>>> 4. How can we check, from the sync or async producer, that messages
>>> were really sent?
>>>
>
> --
> Thanks,
> Viktor

--
Thanks,
Viktor
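On question 1, one plausible but unverified explanation is the operating system's TCP retransmission timeout, not anything in Kafka. "Connection timed out" is the error a Linux socket reports once retransmissions on an established connection are exhausted. The sketch below assumes Linux defaults: an RTO floor of 200 ms, doubling per retransmission, capped at 120 s, with net.ipv4.tcp_retries2 = 15. The kernel documentation describes these defaults as yielding a hypothetical timeout of 924.6 seconds. Real RTOs depend on measured round-trip times, so treat the result as a ballpark. `TcpRetransmitEstimate` is a hypothetical name.

```java
import java.util.Locale;

// Hypothetical estimate of how long Linux keeps retransmitting on an established
// TCP connection before reporting "Connection timed out", assuming the defaults
// named in the lead-in (200 ms initial RTO, doubling, 120 s cap, tcp_retries2 = 15).
final class TcpRetransmitEstimate {
    static double totalTimeoutSeconds(double initialRtoSec, double maxRtoSec, int retries) {
        double total = 0.0;
        double rto = initialRtoSec;
        // The original transmission plus each of the `retries` retransmissions
        // waits one (capped) RTO before the next step.
        for (int i = 0; i <= retries; i++) {
            total += Math.min(rto, maxRtoSec);
            rto *= 2; // exponential backoff
        }
        return total;
    }

    public static void main(String[] args) {
        double secs = totalTimeoutSeconds(0.2, 120.0, 15);
        // prints: 924.6 s (about 15.4 min)
        System.out.printf(Locale.ROOT, "%.1f s (about %.1f min)%n", secs, secs / 60.0);
    }
}
```

About 15.4 minutes is in the same range as the ~16 minutes observed. If connect.timeout.ms only governs establishing a new connection, this hypothesis would also fit the observation that changing it had no effect.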