It would also be interesting to see a stack trace from the
"kafka-producer-network-thread" (the thread that should be sending the
batches but may have gotten stuck). If the issue is reproducible for you
in a test environment, consider generating logs at TRACE level.
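
If attaching a tool to the process is awkward, the stack trace can also be
captured from inside the application. Below is a minimal sketch using only
the JDK. The class name NetworkThreadDump and the method dumpThreads are
made up for illustration; the only thing taken from this thread is the
thread-name prefix, as it appears in the quoted log output.

```java
import java.util.Map;

class NetworkThreadDump {
    // Collects the stack traces of all live threads in this JVM whose name
    // starts with the given prefix (hypothetical helper, not a Kafka API).
    static String dumpThreads(String namePrefix) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            if (!thread.getName().startsWith(namePrefix)) {
                continue;
            }
            // Header line: thread name and current state (e.g. RUNNABLE, WAITING).
            out.append('"').append(thread.getName()).append("\" state=")
               .append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints nothing unless a producer is running in this same JVM.
        System.out.print(dumpThreads("kafka-producer-network-thread"));
    }
}
```

This only sees threads in the JVM it runs in, so dumpThreads would have to be
called from the hung application itself (for example from a watchdog thread).
Running "jstack <pid>" against the producer process from outside should give
the same information without any code changes.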

On Thu, Aug 20, 2015 at 5:35 PM, Gwen Shapira <g...@confluent.io> wrote:

> Hi,
>
> I didn't see this issue during our network hiccups. You wrote you saw:
>
> Got error produce response with correlation id 17717 on topic-partition
> event.beacon-38, retrying (8 attempts left). Error: NETWORK_EXCEPTION
>
> What did you see after that, especially once the network issue was resolved?
> More retries? Were there any successful sends?
> Producers blocking for a while is expected, but once the issue is resolved
> we expect the retries to succeed and unblock your producers. Is that what
> you saw?
>
> Gwen
>
>
> On Thu, Aug 20, 2015 at 4:56 PM, Drew Goya <d...@videoamp.com> wrote:
>
>> I've been running into an issue with the 0.8.2.1 new producer for a few
>> weeks now and I haven't been able to figure it out.  Hopefully someone on
>> the list can help!
>>
>> First off my producer config looks like this:
>>
>>     props.put(ProducerConfig.ACKS_CONFIG, "1")
>>     props.put(ProducerConfig.RETRIES_CONFIG, "10")
>>     props.put(ProducerConfig.BLOCK_ON_BUFFER_FULL_CONFIG, "true")
>>     props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArraySerializer")
>>     props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")
>>     props.put(ProducerConfig.TIMEOUT_CONFIG, "5000")
>>     props.put(ProducerConfig.METADATA_FETCH_TIMEOUT_CONFIG, "5000")
>>
>> During network hiccups between my senders and the brokers, I start seeing
>> these log messages, as expected:
>>
>> 2015-08-20 20:30:12,231 [kafka-producer-network-thread | producer-1] WARN
>>  org.apache.kafka.common.network.Selector - Error in I/O with
>> <host>/<ip-address>
>> java.io.IOException: Connection timed out
>>         at sun.nio.ch.FileDispatcherImpl.$$YJP$$read0(Native Method)
>>
>> followed by:
>>
>> Got error produce response with correlation id 17717 on topic-partition
>> event.beacon-38, retrying (8 attempts left). Error: NETWORK_EXCEPTION
>>
>> The problem is that even when network connectivity is restored, the whole
>> app hangs. Gathering a heap dump and looking through the RecordAccumulator,
>> I can see that the buffer is full and my producers are blocked indefinitely.
>>
>> Any ideas?
>>
>
>
