Hi,

Clogging can happen if, as it seems to be in your case, the requests are network-bound. Just to confirm your configuration, does your broker configuration look like this?

  "num.replica.fetchers": 4,
  "replica.fetch.wait.max.ms": 500,
  "num.recovery.threads.per.data.dir": 4,
  "num.network.threads": 8,
  "socket.request.max.bytes": 104857600,
  "socket.receive.buffer.bytes": 10485760,
  "socket.send.buffer.bytes": 10485760
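If it does, the producer side is the other half I would want to compare against. The knobs I have in mind are roughly these (only a sketch of the standard new-producer settings with placeholder hosts and values, not a recommendation):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;

    public class ProducerConfigSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // placeholder hosts
            props.put("acks", "all");                   // or "1"/"0" - large effect on latency
            props.put("buffer.memory", "33554432");     // total memory for buffering unsent records
            props.put("batch.size", "16384");           // per-partition batch size in bytes
            props.put("linger.ms", "0");                // how long to wait to fill a batch
            props.put("block.on.buffer.full", "false"); // false -> BufferExhaustedException instead of blocking
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.ByteArraySerializer");
            Producer<byte[], byte[]> producer = new KafkaProducer<>(props);
            producer.close();
        }
    }

buffer.memory, batch.size and linger.ms are the settings that interact most directly with the BufferExhaustedExceptions you describe below, and block.on.buffer.full decides whether send() blocks or throws once that buffer fills up.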
"num.replica.fetchers": 4, "replica.fetch.wait.max.ms": 500, "num.recovery.threads.per.data.dir": 4, "num.network.threads": 8, "socket.request.max.bytes": 104857600, "socket.receive.buffer.bytes": 10485760, "socket.send.buffer.bytes": 10485760, Similarly, please share your producer config as well. I'm thinking may be it is related to tuning your cluster. Thanks, Prabhjot On Sat, Nov 28, 2015 at 3:54 PM, Andreas Flinck < andreas.fli...@digitalroute.com> wrote: > Great, thanks for the information! So it is definitely acks=all we want to > go for. Unfortunately we run into an blocking issue in our production like > test environment which we have not been able to find a solution for. So > here it is, ANY idea on how we could possibly find a solution is very much > appreciated! > > Environment: > Kafka version: kafka_2.11-0.8.2.1 > 5 kafka brokers and 5 ZK on spread out on 5 hosts > Using new producer (async) > > Topic: > partitions=10 > replication-factor=4 > min.insync.replicas=2 > > Default property values used for broker configs and producer. > > Scenario and problem: > Incoming diameter data (10k TPS) is sent to 5 topics via 5 producers which > is working great until we start another 5 producers sending to another 5 > topics with the same rate (10k). What happens then is that the producers > sending to 2 of the topics fills up the buffer and the throughput becomes > very low, with BufferExhaustedExceptions for most of the messages. When > checking the latency for the problematic topics it becomes really high > (around 150ms). Stopping the 5 producers that were started in the second > round, the latency goes down to about 1 ms again and the buffer will go > back to normal. The load is not that high, about 10MB/s, it is not even > near disk bound. > So the questions right now are, why do we get such high latency to > specifically two topics when starting more producers, even though cpu and > disk load looks unproblematic? And why two topics specifically, is there an > order of what topics to prfioritize when things get clogged for some reason? > > Sorry for the quite messy description, we are all kind of new at kafka > here! > > BR > Andreas > > > On 28 Nov 2015, at 09:26, Prabhjot Bharaj <prabhbha...@gmail.com> wrote: > > > > Hi, > > > > This should help :) > > > > During my benchmarks, I noticed that if 5 node kafka cluster running 1 > > topic is given a continuous injection of 50GB in one shot (using a > modified > > producer performance script, which writes my custom data to kafka), the > > last replica can sometimes lag and it used to catch up at a speed of 1GB > in > > 20-25 seconds. This lag increases if producer performance injects 200GB > in > > one shot. > > > > I'm not sure how it will behave with multiple topics. it could have an > > impact on the overall throughput (because more partitions will be alive > on > > the same broker thereby dividing the network usage), but I have to test > it > > in staging environment > > > > Regards, > > Prabhjot > > > > On Sat, Nov 28, 2015 at 12:10 PM, Gwen Shapira <g...@confluent.io> > wrote: > > > >> Hi, > >> > >> min.insync.replica is alive and well in 0.9 :) > >> > >> Normally, you will have 4 our of 4 replicas in sync. However if one of > the > >> replicas will fall behind, you will have 3 out of 4 in sync. > >> If you set min.insync.replica = 3, produce requests will fail if the > number > >> on in-sync replicas fall below 3. > >> > >> I hope this helps. 
> >>
> >> Gwen
> >>
> >> On Fri, Nov 27, 2015 at 9:43 PM, Prabhjot Bharaj <prabhbha...@gmail.com>
> >> wrote:
> >>
> >>> Hi Gwen,
> >>>
> >>> How about the min.insync.replicas property?
> >>> Is it still valid in the new version 0.9?
> >>>
> >>> We could get 3 out of 4 replicas in sync if we set its value to 3.
> >>> Correct?
> >>>
> >>> Thanks,
> >>> Prabhjot
> >>> On Nov 28, 2015 10:20 AM, "Gwen Shapira" <g...@confluent.io> wrote:
> >>>
> >>>> In your scenario, you are receiving acks from 3 replicas while it is
> >>>> possible to have 4 in the ISR. This means that one replica can be up to
> >>>> 4000 messages (by default) behind the others. If the leader crashes, there is
> >>>> a 33% chance this replica will become the new leader, thereby losing up to
> >>>> 4000 messages.
> >>>>
> >>>> acks=all requires every replica in the ISR to ack, protecting you from this
> >>>> scenario (but leading to high latency if a replica is hanging and is just
> >>>> about to drop out of the ISR).
> >>>>
> >>>> Also, note that in future versions acks > 1 is deprecated, to protect
> >>>> against such subtle mistakes.
> >>>>
> >>>> Gwen
> >>>>
> >>>> On Fri, Nov 27, 2015 at 12:28 AM, Andreas Flinck <
> >>>> andreas.fli...@digitalroute.com> wrote:
> >>>>
> >>>>> Hi all
> >>>>>
> >>>>> The reason why I need to know is that we have seen an issue when using
> >>>>> acks=all, forcing us to quickly find an alternative. I will leave the issue
> >>>>> out of this post, but will probably come back to that!
> >>>>>
> >>>>> My question is about acks=all and the min.insync.replicas property. Since we
> >>>>> have found a workaround for an issue by using acks>1 instead of all
> >>>>> (absolutely no clue why at this moment), I would like to know what benefit
> >>>>> you get from e.g. acks=all and min.insync.replicas=3 instead of using
> >>>>> acks=3 in a 5-broker cluster with a replication factor of 4. To my
> >>>>> understanding you would get the exact same level of durability and security
> >>>>> from using either of those settings. However, I suspect this is not quite the
> >>>>> case, having found hints, without proper explanation, that acks=all is
> >>>>> preferred.
> >>>>>
> >>>>>
> >>>>> Regards
> >>>>> Andreas
> >>>>
> >>>
> >>
> >
> >
> >
> > --
> > ---------------------------------------------------------
> > "There are only 10 types of people in the world: Those who understand
> > binary, and those who don't"
>

--
---------------------------------------------------------
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"