The goal is to use the sync producer and find out that the network is
down as soon as possible.
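
One way to notice an outage quickly, independent of any producer setting, might be a short TCP connect probe run before a batch of sends. The following is only a minimal sketch of that idea using the plain JDK socket API. The class name BrokerProbe, the method isReachable, the default host and port, and the 1000 ms timeout are all hypothetical choices for illustration and are not part of Kafka. A successful probe only shows that the port accepted a connection at that moment; it does not prove that a later send was delivered.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class BrokerProbe {
    /**
     * Tries to open a TCP connection to host:port within timeoutMs.
     * Returns false on refusal, timeout, or an unresolvable host.
     * (Hypothetical helper, not a Kafka API.)
     */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers ConnectException, SocketTimeoutException, UnknownHostException.
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port are assumptions; pass the real broker address as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9092;
        boolean up = isReachable(host, port, 1000);
        System.out.println(host + ":" + port + (up ? " reachable" : " unreachable"));
    }
}
```

The probe costs one extra connection attempt per check, so it would probably be run per batch or on a timer rather than per message.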

--
Viktor

2013/8/13 Viktor Kolodrevskiy <viktor.kolodrevs...@gmail.com>:
> Felix,
> the thing is that I was using the sync producer.
>
> --
> Viktor
>
> 2013/8/13 Felix GV <fe...@mate1inc.com>:
>> Async production is meant to work this way. You have no delivery guarantee
>> nor any exception because the producer sends the message independently of
>> the code that called the async production function.
>>
>> It is meant to be faster than sync production, but it is obviously intended
>> for non-critical messages.
>>
>> --
>> Felix
>>
>>
>> On Fri, Jul 26, 2013 at 12:27 PM, Viktor Kolodrevskiy <
>> viktor.kolodrevs...@gmail.com> wrote:
>>
>>> Hey guys,
>>>
>>> We decided to use Kafka in our new project, and I have spent some time
>>> researching how the Kafka producer behaves during network connectivity
>>> problems.
>>>
>>> I had 3 virtual machines (Ubuntu 13.04, running on VirtualBox) in one
>>> network:
>>>
>>> 1. Kafka server (0.7.2) + ZooKeeper.
>>> 2. Producer app with default settings.
>>> 3. Consumer app.
>>>
>>> Results of the following tests with default sync producer settings:
>>>
>>> 1. Condition: Take the network down on machine (1) for 20 mins.
>>> Result: Producer keeps working for ~16 mins. Consumer does not receive
>>> anything.
>>> After ~16 mins Producer app fails (with java.io.IOException: Connection
>>> timed out). Consumer app does not fail.
>>> Messages that were generated during those 16 mins are lost!
>>>
>>> 2. Condition: Take the network down on machine (1) for 5 mins, then
>>> bring the network on (1) back up.
>>> Result: Producer app keeps working, with no exceptions or notification
>>> that the network was down.
>>> Consumer does not receive messages for 5 mins. But when the network on
>>> (1) is back up it receives all messages.
>>> No messages are lost.
>>>
>>> 3. Condition: Take the network down on machine (2) for 20 mins.
>>> Result: Producer keeps working for ~16 mins. Consumer does not receive
>>> anything.
>>> After ~16 mins Producer app fails (with java.io.IOException: Connection
>>> timed out). Consumer app does not fail.
>>> Messages that were generated during those 16 mins are lost! (Same
>>> result as in test #1)
>>> Kafka and ZooKeeper log that the producer is disconnected.
>>>
>>> 4. Condition: Take the network down on machine (2) for 5 mins, then
>>> bring the network on (2) back up.
>>> Result: Producer app keeps working, with no exceptions or notification
>>> that the network was down.
>>> Consumer does not receive messages for 5 mins. But when the network on
>>> (2) is back up it receives all messages. (Same result as in test #2)
>>> Kafka and ZooKeeper log that the producer is disconnected.
>>>
>>> 5. Condition: Kill Kafka server (0.7.2) + ZooKeeper (kill the
>>> applications, do not shut down the network).
>>> Result: Producer fails in a few seconds with
>>> "kafka.common.NoBrokersForPartitionException: Partition = null"
>>> Consumer is still working even after 25 minutes.
>>>
>>> One more interesting thing: changing the connect.timeout.ms parameter
>>> value for the producer
>>> did not change the 16 mins that I observed.
>>>
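
One possible source of the ~16 mins reported above, which nothing in this thread confirms, is the operating system's TCP retransmission timeout rather than any Kafka setting. On Linux an established connection retransmits unacknowledged data with exponential backoff before giving up, which would also fit the observation that connect.timeout.ms made no difference, since that setting would only matter while connecting. The sketch below just does the arithmetic under assumed Linux defaults (tcp_retries2 = 15, minimum retransmission timeout 200 ms, maximum 120 s). These values are assumptions about the test machines and were not measured there; the class and method names are hypothetical.

```java
public class RetransmitTimeout {
    // Assumed Linux defaults; not taken from the thread.
    static final int TCP_RETRIES2 = 15;
    static final double RTO_MIN_SEC = 0.2;
    static final double RTO_MAX_SEC = 120.0;

    /**
     * Sums the wait before each of `retries` retransmissions plus the final
     * wait after the last one, doubling the timeout each time and capping
     * it at rtoMax.
     */
    static double totalTimeoutSeconds(int retries, double rtoMin, double rtoMax) {
        double total = 0.0;
        double rto = rtoMin;
        for (int i = 0; i <= retries; i++) {
            total += Math.min(rto, rtoMax);
            rto *= 2;
        }
        return total;
    }

    public static void main(String[] args) {
        double sec = totalTimeoutSeconds(TCP_RETRIES2, RTO_MIN_SEC, RTO_MAX_SEC);
        System.out.println("seconds: " + sec + ", minutes: " + (sec / 60.0));
    }
}
```

Under these assumptions the total is about 924.6 seconds, a little over 15 minutes, which is in the same range as the observed ~16 mins. If this is the cause, the value would be governed by the kernel setting net.ipv4.tcp_retries2 on the producer machine.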
>>> I played with the settings and found that the only way to reduce the
>>> time it takes the producer to notice that the network is down is to
>>> change one of two parameters: reconnect.interval,
>>> reconnect.time.interval.ms
>>>
>>> So let's say we set reconnect.time.interval.ms=1000.
>>> This means that the producer will reconnect to Kafka every 1 second.
>>> In this case the producer finds out that the network is down within
>>> 1 second. It stops sending messages and throws
>>> "java.net.ConnectException: Connection timed out". This is the only
>>> way that I have found so far.
>>> In this case we do not lose too many messages, but performance may
>>> suffer.
>>> Or we can set reconnect.interval=1 so that a reconnect happens after
>>> each message sent, and we do not lose messages at all.
>>>
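
The reconnect settings described above could be collected as follows. This is only a sketch that builds a java.util.Properties object from the property names mentioned in this thread (producer.type, reconnect.time.interval.ms). The class and method names are hypothetical, and how the properties are passed to the 0.7.2 producer is deliberately left out.

```java
import java.util.Properties;

public class SyncProducerSettings {
    /**
     * Builds settings for a sync producer that reconnects every
     * reconnectTimeIntervalMs milliseconds. (Hypothetical helper; only the
     * property names come from the thread.)
     */
    static Properties fastFailSettings(long reconnectTimeIntervalMs) {
        Properties props = new Properties();
        props.setProperty("producer.type", "sync");
        props.setProperty("reconnect.time.interval.ms",
                Long.toString(reconnectTimeIntervalMs));
        return props;
    }

    public static void main(String[] args) {
        // 1000 ms is the value tried in the test described above.
        Properties props = fastFailSettings(1000);
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + "=" + props.getProperty(name));
        }
    }
}
```

A shorter interval should surface a dead connection sooner but forces more frequent reconnects, which is the performance trade-off mentioned above.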
>>> Then I tested the async producer (producer.type=async).
>>> The results were dramatic for me, because the producer does not throw
>>> any exception.
>>> It keeps sending messages and does not fail.
>>> I left it running overnight and it did not fail even though the network
>>> between the Kafka server and the producer app was down.
>>> Playing with the async producer config parameters did not help either.
>>>
>>> My questions are:
>>>
>>> 1. Where might these 16 mins come from?
>>> 2. Are there any best practices for handling network outages?
>>> 3. Why does the async producer never throw exceptions when the network
>>> is down?
>>> 4. How can a sync/async producer check that messages were really sent?
>>>
>
>
>
> --
> Thanks,
> Viktor



-- 
Thanks,
Viktor
