Hi Dinesh,

Maybe you can check "kafka.network:type=RequestMetrics,name=RemoteTimeMs,request=FetchFollower" on all brokers to see whether some brokers report lower values than others?
I think that if some followers are busy replicating, that metric will be lower for them: many records are waiting to be replicated, so the follower's fetch will not wait until "replica.fetch.wait.max.ms" is reached. I'm not sure this assumption is correct. What do you think?

Best,
Lisheng

Dinesh Kumar <devdinu...@gmail.com> wrote on Thu, Aug 22, 2019 at 2:00 PM:

> Hi Lisheng,
>
> Yes, it's RemoteTimeMs,
> "kafka.network:type=RequestMetrics,name=RemoteTimeMs,request=Produce"
>
> Sure, I'll try increasing the number of replica fetchers. The other
> configurations are as suggested by the paper.
>
> I was also wondering whether I can track which topic or which specific
> follower is causing the issue (if it's the network), since we have brokers
> across different regions.
>
> Thanks,
> Dinesh Kumar
>
> On Thu, Aug 22, 2019 at 11:14 AM Lisheng Wang <wanglishen...@gmail.com>
> wrote:
>
> > Hi Dinesh
> >
> > Just want to check whether the metric you mentioned is "RemoteTimeMs"?
> >
> > If so, the meaning of "RemoteTimeMs" is the time the request is waiting
> > on a remote client for produce. A high value can imply a slow network
> > connection.
> >
> > That explanation comes from "Optimizing Your Apache Kafka Deployment",
> > which can be downloaded at
> >
> > https://www.confluent.io/white-paper/optimizing-your-apache-kafka-deployment/
> >
> > So I think you need to focus on your network to see if it's a bottleneck.
> >
> > Hope that helps.
> >
> > Best,
> > Lisheng
> >
> > Dinesh Kumar <devdinu...@gmail.com> wrote on Thu, Aug 22, 2019 at 1:18 PM:
> >
> > > Hi,
> > >
> > > We have a Kafka (version 2.0.0) cluster with multiple brokers, and many
> > > producers with acks=all, or possibly acks=1 (which we don't control).
> > > There's an increase in produce time from 10ms to ~150ms.
> > >
> > > With JMX metrics we are able to see that "remote" is taking more time,
> > > which I figured is the followers.
> > >
> > > 1. Is there any configuration we could tweak to reduce the produce time?
> > > 2. What's the next step to debug why the remote produce time is high?
> > >
> > > Thanks,
> > > Dinesh Kumar
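
For the per-broker check of the FetchFollower RemoteTimeMs metric mentioned at the top of this message, something like the rough Java sketch below might help. It reads the "Mean" attribute of that MBean over remote JMX from each broker and lists brokers whose value is well below the others. Please treat the details as assumptions rather than a tested tool: the class and method names are made up for illustration, the host:port arguments assume each broker exposes JMX remotely without authentication, and the 0.5 ratio against the median is an arbitrary threshold.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper, not part of Kafka.
public class FetchFollowerRemoteTime {
    static final String MBEAN =
        "kafka.network:type=RequestMetrics,name=RemoteTimeMs,request=FetchFollower";

    // Reads the "Mean" attribute of the MBean from one broker's JMX endpoint.
    // Assumes unauthenticated remote JMX is enabled on the given port.
    static double readMean(String host, int port) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            Object mean = conn.getAttribute(new ObjectName(MBEAN), "Mean");
            return ((Number) mean).doubleValue();
        }
    }

    // Returns the brokers whose mean is below ratio * median of all brokers.
    // For an even number of brokers the upper of the two middle values is used.
    static List<String> lowerThanPeers(Map<String, Double> meanByBroker, double ratio) {
        List<String> result = new ArrayList<>();
        if (meanByBroker.isEmpty()) {
            return result;
        }
        List<Double> sorted = new ArrayList<>(meanByBroker.values());
        Collections.sort(sorted);
        double median = sorted.get(sorted.size() / 2);
        for (Map.Entry<String, Double> e : meanByBroker.entrySet()) {
            if (e.getValue() < ratio * median) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: FetchFollowerRemoteTime host:jmxPort [host:jmxPort ...]");
            return;
        }
        Map<String, Double> means = new LinkedHashMap<>();
        for (String hostPort : args) {
            String[] parts = hostPort.split(":");
            means.put(hostPort, readMean(parts[0], Integer.parseInt(parts[1])));
        }
        System.out.println("Mean RemoteTimeMs (FetchFollower) per broker: " + means);
        // 0.5 is an arbitrary illustrative threshold.
        System.out.println("Lower than peers: " + lowerThanPeers(means, 0.5));
    }
}
```

It could be run with each broker's JMX host and port as arguments (placeholders here), e.g. `java FetchFollowerRemoteTime broker1:9999 broker2:9999 broker3:9999`.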