Thanks, David. That is very useful information. Our data does arrive in 1-minute batches; I'll double-check the history of the metric over time. If that is the case, I would expect to see it fluctuate if I poll the metric at a sub-minute interval.
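
For reference, this is roughly the poll loop I have in mind: a minimal
sketch over JMX, where the localhost:9999 port, the FetchConsumer request
type, and the five-second interval are placeholders rather than our real
setup.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RemoteTimePoller {
    public static void main(String[] args) throws Exception {
        // Assumes the broker was started with remote JMX enabled,
        // e.g. JMX_PORT=9999 (placeholder port).
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ObjectName mbean = new ObjectName(
                    "kafka.network:type=RequestMetrics,name=RemoteTimeMs,request=FetchConsumer");
            while (true) {
                // Kafka's histogram metrics expose percentile attributes
                // such as 99thPercentile alongside Mean, Max, Count, etc.
                Object p99 = conn.getAttribute(mbean, "99thPercentile");
                System.out.printf("%d RemoteTimeMs p99=%s%n",
                        System.currentTimeMillis(), p99);
                Thread.sleep(5_000); // sub-minute polling, every 5 seconds
            }
        }
    }
}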
Many thanks,

Marcus

On Wed, 16 Jun 2021, 17:39 David Ballano Fernandez, <dfernan...@demonware.net>
wrote:

> Hi Marcus,
>
> For fetch requests, if the remote time is high, it could be that there is
> not enough data to give in a fetch response. This can happen when the
> consumer or replica is caught up and there is no new incoming data. If
> this is the case, remote time will be close to the max wait time, which is
> normal.
>
> I have seen this when my clusters are idle and I am not sending data to
> them. Hope this helps.
>
> On Wed, Jun 16, 2021 at 6:53 AM Marcus Horsley-Rai <marc...@gmail.com>
> wrote:
>
> > I've recently implemented further monitoring of our Kafka cluster to
> > home in on where I think we have bottlenecks.
> > I'm interested in one metric in particular:
> >
> > *kafka.network:type=RequestMetrics,name=RemoteTimeMs,request={Produce|FetchConsumer|FetchFollower}*
> >
> > All the docs I've seen accompanying the metric state "non-zero for
> > produce requests when ack=-1".
> > What does it mean, however, in relation to consume requests
> > (FetchConsumer) or follower requests (FetchFollower)?
> >
> > On my cluster, TotalTimeMs is nice and low for produce requests, which
> > I would expect since we don't set a high acks value.
> > For follower and consume requests, however, TotalTimeMs is nearly 500 ms
> > at the 99th percentile, of which RemoteTimeMs is the vast proportion.
> >
> > My gut is telling me that followers are struggling to replicate from
> > leaders fast enough, and that RemoteTimeMs for FetchConsumer is
> > therefore telling me there is a high commit lag (waiting for all
> > replicas in the ISR to be updated)?
> >
> > Many thanks in advance,
> >
> > Marcus
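
P.S. For anyone reading this in the archive: the max wait David mentions
is bounded by two consumer settings, fetch.min.bytes and fetch.max.wait.ms.
The broker parks a fetch request until it has fetch.min.bytes of data or
fetch.max.wait.ms has elapsed, and that parked time shows up as
RemoteTimeMs. The default fetch.max.wait.ms is 500 ms, which would line up
with a ~500 ms p99 on an idle cluster. (The follower side has the analogous
broker settings replica.fetch.wait.max.ms and replica.fetch.min.bytes.)
A minimal consumer sketch with both settings spelled out; the broker
address, group id, and topic are placeholders:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FetchWaitExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder
        props.put("group.id", "example-group");         // placeholder
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // The broker holds this consumer's fetches until fetch.min.bytes
        // of data is available or fetch.max.wait.ms elapses; on an idle
        // partition the full wait is reported as RemoteTimeMs.
        props.put("fetch.min.bytes", "1");      // default
        props.put("fetch.max.wait.ms", "500");  // default
        try (KafkaConsumer<String, String> consumer =
                     new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("example-topic")); // placeholder
            ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofSeconds(1));
            System.out.println("fetched " + records.count() + " records");
        }
    }
}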