Felix,
the thing is that I was using the sync producer.

--
Viktor

2013/8/13 Felix GV <fe...@mate1inc.com>:
> Async production is meant to work this way. You get neither a delivery
> guarantee nor an exception, because the producer sends the message
> independently of the code that called the async production function.
>
> It is meant to be faster than sync production, but it is obviously intended
> for non-critical messages.
>
> --
> Felix
>
>
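
Felix's description of async production can be illustrated with a minimal fire-and-forget sketch in plain Java. This is a generic pattern, not Kafka's AsyncProducer code: send() only hands the message to a background thread, and the hypothetical writeToBroker() stands in for a write to a broker whose network is down. The IOException happens on the sender thread, so the calling code never sees it; the sketch only counts it there.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Generic fire-and-forget sender (NOT Kafka's AsyncProducer): network I/O runs
// on a background thread, so failures there are never reported to the caller.
public class FireAndForgetDemo {
    static final ExecutorService sender = Executors.newSingleThreadExecutor();
    static final AtomicInteger failures = new AtomicInteger();
    static int callerExceptions = 0;

    // Hypothetical stand-in for writing to a broker whose network is down.
    static void writeToBroker(String msg) throws IOException {
        throw new IOException("Connection timed out");
    }

    // Queues the message and returns immediately; nothing here can observe
    // whether the write later succeeds.
    static void send(String msg) {
        sender.execute(() -> {
            try {
                writeToBroker(msg);
            } catch (IOException e) {
                failures.incrementAndGet(); // swallowed on the sender thread
            }
        });
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            try {
                send("message-" + i);
            } catch (RuntimeException e) {
                callerExceptions++; // never reached in this sketch
            }
        }
        sender.shutdown();
        try {
            sender.awaitTermination(5, TimeUnit.SECONDS); // let the queued writes run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        System.out.println("exceptions seen by caller: " + callerExceptions
                + ", background failures: " + failures.get());
    }
}
```

Running it prints "exceptions seen by caller: 0, background failures: 5": every write fails, yet the caller carries on, much like the async test below where the producer ran overnight without failing.
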
> On Fri, Jul 26, 2013 at 12:27 PM, Viktor Kolodrevskiy <
> viktor.kolodrevs...@gmail.com> wrote:
>
>> Hey guys,
>>
>> We decided to use Kafka in our new project, and I have spent some time
>> researching how the Kafka producer behaves when there are network
>> connectivity problems.
>>
>> I had 3 virtual machines (Ubuntu 13.04, running on VirtualBox) on one
>> network:
>>
>> 1. Kafka server (0.7.2) + ZooKeeper.
>> 2. Producer app with default settings.
>> 3. Consumer app.
>>
>> Results of the following tests with the default sync producer settings:
>>
>> 1. Condition: Take the network down on machine (1) for 20 mins.
>> Result: The producer keeps working for ~16 mins. The consumer does not
>> receive anything.
>> After ~16 mins the producer app fails (with java.io.IOException: Connection
>> timed out). The consumer app does not fail.
>> Messages that were generated during those 16 mins are lost!
>>
>> 2. Condition: Take the network down on machine (1) for 5 mins, then
>> bring the network on (1) back up.
>> Result: The producer app keeps working, with no exception or notification
>> that the network was down.
>> The consumer does not receive messages for 5 mins, but once the network
>> on (1) is up again it receives all of them.
>> No messages are lost.
>>
>> 3. Condition: Take the network down on machine (2) for 20 mins.
>> Result: The producer keeps working for ~16 mins. The consumer does not
>> receive anything.
>> After ~16 mins the producer app fails (with java.io.IOException: Connection
>> timed out). The consumer app does not fail.
>> Messages that were generated during those 16 mins are lost! (Same result as
>> in test #1.)
>> Kafka and ZooKeeper log that the producer is disconnected.
>>
>> 4. Condition: Take the network down on machine (2) for 5 mins, then
>> bring the network on (2) back up.
>> Result: The producer app keeps working, with no exception or notification
>> that the network was down.
>> The consumer does not receive messages for 5 mins, but once the network
>> on (2) is up again it receives all of them. (Same result as in test #2.)
>> Kafka and ZooKeeper log that the producer is disconnected.
>>
>> 5. Condition: Kill the Kafka server (0.7.2) + ZooKeeper (kill the
>> application; do not shut down the network).
>> Result: The producer fails within a few seconds with
>> "kafka.common.NoBrokersForPartitionException: Partition = null".
>> The consumer is still running even after 25 minutes.
>>
>> One more interesting thing: changing the producer's connect.timeout.ms
>> value did not change the ~16 mins I observed.
>>
>> I played with the settings and found that the only way to shorten the time
>> it takes the producer to notice that the network is down is to change one
>> of two parameters: reconnect.interval or reconnect.time.interval.ms.
>>
>> So let's say we set reconnect.time.interval.ms=1000.
>> This means the producer reconnects to Kafka every second, so it finds out
>> within about 1 second that the network is down.
>> The producer then stops sending messages and throws
>> "java.net.ConnectException: Connection timed out". This is the only way I
>> have found so far.
>> In this case we do not lose many messages, but performance may suffer.
>> Alternatively, we can set reconnect.interval=1 so that a reconnect happens
>> after each message sent, and not lose any messages at all.
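
The reconnect behaviour described above can be modelled as a small policy object. This is a minimal sketch under an assumption, not Kafka 0.7's SyncProducer source: before a send, the producer re-opens its connection once either reconnect.interval messages have gone out on the current connection or reconnect.time.interval.ms has elapsed since it connected. Only the two property names come from the tests above; the class and method names are hypothetical, and time is passed in explicitly so the simulation is deterministic.

```java
import java.util.Properties;

// Hypothetical model of the reconnect rule (NOT Kafka 0.7 source). Assumption:
// before a send, reconnect if reconnect.interval messages were already sent on
// this connection OR reconnect.time.interval.ms has elapsed since connecting.
public class ReconnectPolicy {
    private final int reconnectInterval;        // messages per connection
    private final long reconnectTimeIntervalMs; // max connection age in ms
    private int sentOnConnection = 0;
    private long lastConnectMs;

    ReconnectPolicy(Properties props, long nowMs) {
        reconnectInterval = Integer.parseInt(props.getProperty("reconnect.interval"));
        reconnectTimeIntervalMs = Long.parseLong(props.getProperty("reconnect.time.interval.ms"));
        lastConnectMs = nowMs;
    }

    // Records one send at nowMs; returns true if it reconnected first.
    boolean onSend(long nowMs) {
        boolean reconnect = sentOnConnection >= reconnectInterval
                || nowMs - lastConnectMs >= reconnectTimeIntervalMs;
        if (reconnect) {
            // A real client would close and re-open the socket here; in the tests
            // above, this reconnect is where the outage was first reported.
            sentOnConnection = 0;
            lastConnectMs = nowMs;
        }
        sentOnConnection++;
        return reconnect;
    }

    static Properties config(String reconnectInterval, String reconnectTimeIntervalMs) {
        Properties p = new Properties();
        p.setProperty("reconnect.interval", reconnectInterval);
        p.setProperty("reconnect.time.interval.ms", reconnectTimeIntervalMs);
        return p;
    }

    // Simulates `messages` sends spaced gapMs apart and counts the reconnects.
    static int countReconnects(Properties props, int messages, long gapMs) {
        ReconnectPolicy policy = new ReconnectPolicy(props, 0L);
        int reconnects = 0;
        for (int i = 0; i < messages; i++) {
            if (policy.onSend(i * gapMs)) {
                reconnects++;
            }
        }
        return reconnects;
    }

    public static void main(String[] args) {
        // 30 messages, one every 100 ms; 1000000 is just "large enough to never trigger".
        int byTime = countReconnects(config("1000000", "1000"), 30, 100);
        int byCount = countReconnects(config("1", "1000000"), 30, 100);
        System.out.println("reconnect.time.interval.ms=1000: " + byTime + " reconnects");
        System.out.println("reconnect.interval=1: " + byCount + " reconnects");
    }
}
```

With one message every 100 ms for 3 seconds, the time-based setting reconnects 2 times, while reconnect.interval=1 reconnects before every send except the first (29 times). Each reconnect is an extra TCP connection setup, so the per-message setting is the more expensive of the two.
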
>>
>> Then I tested the async producer (producer.type=async).
>> The results are dramatic for me, because the producer does not throw any
>> exception. It keeps sending messages and does not fail.
>> I left it running overnight and it did not fail, even though the network
>> between the Kafka server and the producer app was down.
>> Playing with the async producer config parameters did not help either.
>>
>> My questions are:
>>
>> 1. Where might these 16 mins come from?
>> 2. Are there any best practices for handling network outages?
>> 3. Why does the async producer never throw exceptions when the network is
>> down?
>> 4. How can I check from a sync/async producer that messages were really
>> sent?
>>



-- 
Thanks,
Viktor
