Madhukar,

To me, the broker config looks good. The issue I see is that there is a large number of synchronous producers spamming the kafka brokers with many singular appends. I think the suggested approach now is to use the new producer API:
http://kafka.apache.org/082/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html
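A minimal sketch of that pattern, assuming kafka-clients 0.8.2.x on the classpath (the broker addresses, topic, key, and value here are placeholders, not from the thread):

```java
import java.util.Properties;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
props.put("acks", "1");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

// One thread-safe producer shared by all tomcat worker threads.
KafkaProducer<String, String> producer = new KafkaProducer<>(props);

// send() is asynchronous and returns immediately; batching happens internally.
Future<RecordMetadata> ack = producer.send(new ProducerRecord<>("UserEvents", key, value));

// If delivery must be confirmed, inspect the future somewhere other than the
// request/connection thread, e.g. ack.get() in a background checker.
```

Note that this sketch cannot run without a live broker, so treat it as a shape for the change rather than a drop-in.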
Since the new producer only operates in asynchronous mode and is fully thread safe, you would only need a single producer per instance. It batches messages together and sends them to kafka very efficiently. This avoids the nasty timeout exceptions and hanging (unless the buffer is full), and means a lot less network overhead for kafka due to larger batches. If there is a worry about a message getting lost in the ether, you can use the future returned by KafkaProducer.send(message) to do some sanity checks. Just try to avoid waiting on it in the connection thread if the requirements allow.

Also, how many partitions is the topic set up to use, and what is the replication factor? If you only have one partition, or all the messages happen to get routed to the same partition (all having the same key), making better use of partitions to distribute the load across the kafka cluster can result in faster response times and more throughput. It would also explain why broker 2 is consistently the problem: it is the only one receiving messages.

As for the numbers you listed, I am not sure what fetch time refers to. Help from someone who knows more than I do about request-logs would be required.

-Erik

On 9/14/15, 12:25 AM, "Madhukar Bharti" <bhartimadhu...@gmail.com> wrote:

>Hi Erik & Prabhjot
>
>We are using Kafka-0.8.2.1 and the old producer API with the below config:
>
>request.required.acks=1
>request.timeout.ms=2000
>producer.type=sync
>
>On the Kafka brokers we have:
>
>num.network.threads=8
>num.io.threads=10
>num.replica.fetchers=4
>replica.fetch.max.bytes=2097154
>replica.fetch.wait.max.ms=500
>replica.socket.timeout.ms=60000
>replica.socket.receive.buffer.bytes=65536
>replica.lag.time.max.ms=10000
>replica.high.watermark.checkpoint.interval.ms=5000
>replica.lag.max.messages=100
>
>If you are asking about a Singleton in terms of the Producer, then we have
>created a pool of producers equal to the number of connections
>that can be made in tomcat.
>
>Thanks and Regards,
>Madhukar
>
>On Fri, Sep 11, 2015 at 8:27 PM, Prabhjot Bharaj <prabhbha...@gmail.com> wrote:
>
>> Hi,
>>
>> In addition to the parameters asked by Erik, it would be great if you could
>> share your broker's server.properties as well.
>>
>> Regards,
>> Prabhjot
>>
>> On Fri, Sep 11, 2015 at 8:10 PM, Helleren, Erik <erik.helle...@cmegroup.com> wrote:
>>
>> > Hi Madhukar,
>> > Some questions that can help understand what's going on: Which kafka
>> > version is used? Which Producer API is being used
>> > (http://kafka.apache.org/documentation.html#producerapi)? And what are
>> > the configs for this producer?
>> >
>> > Also, because I know little about tomcat, is there a semantic for a
>> > singleton, or a server singleton?
>> > -Erik
>> >
>> > On 9/11/15, 8:48 AM, "Madhukar Bharti" <bhartimadhu...@gmail.com> wrote:
>> >
>> > >Hi,
>> > >
>> > >We are having 3 brokers in a cluster. Producer requests are failing
>> > >for broker 2. We are frequently getting the below exception:
>> > >
>> > >15/09/09 22:09:06 WARN async.DefaultEventHandler: Failed to send
>> > >producer request with *correlation id 1455 to broker 2* with data for
>> > >partitions [UserEvents,0]
>> > >java.net.SocketTimeoutException
>> > >    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229)
>> > >    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
>> > >    at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
>> > >    at kafka.utils.Utils$.read(Utils.scala:375)
>> > >    at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>> > >    at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>> > >    at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
>> > >    at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
>> > >
>> > >After looking into the request-logs on all machines, we found that there is some
>> > >slowness in broker 2. I am listing the top 20 request processing times from all
>> > >the brokers:
>> > >
>> > >Broker 1                Broker 2                Broker 3
>> > >Prod+Fetch  Producer    Prod+Fetch  Producer    Prod+Fetch  Producer
>> > > 493          77         1033         85          489        19
>> > > 494          91         1034         86          490        20
>> > > 495          94         1035        114          491        21
>> > > 496          96         1036        121          492        22
>> > > 497         104         1037        123          493        23
>> > > 498         111         1038        136          494        24
>> > > 499         112         1039        153          495        27
>> > > 500         153         1040        201          496        28
>> > > 501         167         1042        225          497        31
>> > > 502         184         1043        226          498        32
>> > > 503         248         1044        240          499        60
>> > > 504         249         1049        299          500        89
>> > > 519         254         1051        405          501        98
>> > > 520         284         1057        406          502       104
>> > > 541         395         1064        448          503       110
>> > > 542         443         1087        449          506       114
>> > > 545         470         1145        455          510       259
>> > > 551         551         1146        464          514       288
>> > > 577         577         1466        505          515       337
>> > > 633         633         1467        658          516       385
>> > >
>> > >What can be the reason that the fetcher thread is taking more time to process?
>> > >
>> > >What do we need to do to get better performance? Are there any properties we
>> > >need to tweak?
>> > >
>> > >Any suggestions are welcome.
>> > >
>> > >Note: We are pushing data to Kafka in the user thread (tomcat) and set the
>> > >producer request timeout to 2 sec. We don't want to increase the timeout
>> > >beyond 2 sec, because if too many threads hang, the application will hang.
>> > >
>> > >Thanks and Regards,
>> > >Madhukar
>> >
>>
>> --
>> ---------------------------------------------------------
>> "There are only 10 types of people in the world: Those who understand
>> binary, and those who don't"
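To illustrate the partitioning point Erik raises: keyed messages are routed as hash(key) mod numPartitions, so one shared key funnels everything onto a single broker while distinct keys spread the load. A self-contained sketch of that consequence (the class is hypothetical, and `String.hashCode()` stands in for the real default partitioner's hash of the serialized key; only the same-key-same-partition behavior matters here):

```java
import java.util.HashSet;
import java.util.Set;

public class PartitionSpread {
    // Simplified stand-in for the default partitioner: identical keys
    // always map to the same partition index.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 3;

        // 1000 messages that all share one key land on a single partition...
        Set<Integer> oneKey = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            oneKey.add(partitionFor("user-events", numPartitions));
        }
        System.out.println("one key   -> partitions used: " + oneKey.size());

        // ...while distinct keys distribute the load across the cluster.
        Set<Integer> manyKeys = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            manyKeys.add(partitionFor("user-" + i, numPartitions));
        }
        System.out.println("many keys -> partitions used: " + manyKeys.size());
    }
}
```

With one key only one partition (and hence one broker) ever sees traffic, matching the observation that broker 2 alone was slow; varying the key (or sending with a null key) lets the producer spread requests.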