For future reference: this bug no longer appears in 1.1.0.

On Fri, Dec 15, 2017 at 3:25 PM, Rob Verkuylen <r...@verkuylen.net> wrote:

> Hi,
>
> After upgrading to 1.0 we're getting strange producer/broker behaviour not
> experienced on <1.0.
>
> As a test we run a single-threaded producer that just sends "TEST" against
> our cluster, on a topic with replicas=3 and min.isr=2, with the following
> producer settings:
> linger.ms=10
> acks=all
> retries=1000
> batch=16k
> retry.backoff.ms=1000
>
> Using the callback on send, we immediately see a huge lag in the number of
> acks coming back (600k+, whereas on 0.11 this hovers around 4k-20k at most).
> At the same time the producer's send rate in msg/s drops, reaching 0 in
> about 90 seconds. After 10 minutes of silence, all we see is a list of
> network exceptions like this on all partitions: "Got error produce
> response with correlation id X on topic-partition test-topic, retrying (999
> attempts left). Error: NETWORK_EXCEPTION". Sending then resumes briefly,
> but quickly falls back into the same behaviour.
>
> Now for the kicker: starting another thread after the first one experiences
> this, producing on the same topic with the same groupid, will 'release' the
> first thread: all acks are returned and behaviour returns to normal.
> No issues are experienced when acks=1. Kafka logs show no issues at default
> log levels; we haven't had the opportunity to test further with more
> fine-grained log levels. The brokers run default settings, with perhaps one
> special case: the inter-broker protocol is 1.0, but the client protocol is
> still set to 0.9.0. The testing above was done with clients ranging from
> 0.9 up to 1.0, all showing the same behaviour.
>
> After downgrading the entire cluster back to 0.11.0.2, with the same
> settings, same clients and same tests, all is well. Could this be a bug?
>
> Thanks,
>   Rob
>
>
>
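
For anyone trying to reproduce the report above, here is a rough sketch of
the quoted producer settings as Java Properties, plus a small counter for the
ack lag Rob describes. It is not his actual test code. The class and method
names are made up for illustration, and mapping "batch=16k" to
batch.size=16384 is an assumption. Only the JDK is used, so a real test would
still need bootstrap.servers and the serializers, and would pass these
Properties to a KafkaProducer.

```java
import java.util.Properties;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: settings from the report plus ack-lag bookkeeping. */
final class ProducerTestSketch {

    /** The quoted settings, mapped onto Kafka producer config keys. */
    static Properties settings() {
        Properties p = new Properties();
        p.setProperty("linger.ms", "10");
        p.setProperty("acks", "all");
        p.setProperty("retries", "1000");
        // Assumption: "batch=16k" in the report means batch.size=16384 bytes.
        p.setProperty("batch.size", "16384");
        p.setProperty("retry.backoff.ms", "1000");
        return p;
    }

    /** Counts records that were sent but whose callback has not fired yet. */
    static final class AckLag {
        private final AtomicLong sent = new AtomicLong();
        private final AtomicLong acked = new AtomicLong();

        void onSend() { sent.incrementAndGet(); }   // call right after send()
        void onAck()  { acked.incrementAndGet(); }  // call from the send callback

        long outstanding() { return sent.get() - acked.get(); }
    }

    public static void main(String[] args) {
        Properties p = settings();
        AckLag lag = new AckLag();
        // Simulate five sends of which three have been acknowledged.
        for (int i = 0; i < 5; i++) lag.onSend();
        for (int i = 0; i < 3; i++) lag.onAck();
        System.out.println("acks=" + p.getProperty("acks")
                + " outstanding=" + lag.outstanding());
    }
}
```

In a real run, onAck() would be called from the callback passed to send(), and
outstanding() sampled periodically to see whether the lag stays in the
4k-20k range or climbs towards 600k+.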
