Thanks, David. That is very useful information. Our data does arrive in
1-minute batches - I'll double-check the history of the metric over time. If
that is the case, I would expect to see it fluctuate if I poll the metric
at a sub-minute interval.
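
Roughly the kind of sub-minute poll I have in mind - a minimal sketch only,
assuming the broker exposes JMX on localhost:9999 (host, port, and the
FetchConsumer variant of the metric are placeholders):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RemoteTimePoller {
    public static void main(String[] args) throws Exception {
        // Placeholder JMX endpoint; in practice whatever JMX_PORT the broker exposes.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            ObjectName fetchConsumer = new ObjectName(
                    "kafka.network:type=RequestMetrics,name=RemoteTimeMs,request=FetchConsumer");
            // Sample every 10 seconds to see whether the value moves between 1-minute batches.
            for (int i = 0; i < 30; i++) {
                Object p99 = mbsc.getAttribute(fetchConsumer, "99thPercentile");
                Object mean = mbsc.getAttribute(fetchConsumer, "Mean");
                System.out.printf("p99=%s mean=%s%n", p99, mean);
                Thread.sleep(10_000);
            }
        }
    }
}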

Many thanks,

Marcus

On Wed, 16 Jun 2021, 17:39 David Ballano Fernandez, <
dfernan...@demonware.net> wrote:

> Hi Marcus,
>
> For fetch requests, if the remote time is high, it could be that there is
> not enough data to give in a fetch response. This can happen when the
> consumer or replica is caught up and there is no new incoming data. If this
> is the case, remote time will be close to the max wait time, which is
> normal.
>
> I have seen this when my clusters are idle and I am not sending data to
> them. Hope this helps.
>
>
>
> On Wed, Jun 16, 2021 at 6:53 AM Marcus Horsley-Rai <marc...@gmail.com>
> wrote:
>
> > I've recently implemented further monitoring of our Kafka cluster to home
> > in on where I think we have bottlenecks.
> > I'm interested in one metric in particular:
> >
> >
> > *kafka.network:type=RequestMetrics,name=RemoteTimeMs,request={Produce|FetchConsumer|FetchFollower}*
> >
> > All the docs I've seen accompanying the metric state "non-zero for
> > produce requests when ack=-1".
> > What does it mean, however, in relation to consume requests
> > (FetchConsumer) or follower requests (FetchFollower)?
> >
> > On my cluster, TotalTimeMs is nice and low for produce requests, which
> > I would expect as we don't set a high acks value.
> > For follower and consume requests, however, TotalTimeMs is nearly 500ms at
> > the 99th percentile, of which RemoteTimeMs is the vast proportion.
> >
> > My gut is telling me that followers are struggling to replicate from
> > leaders fast enough, and therefore RemoteTimeMs for FetchConsumer is
> > telling me there is a high commit lag (waiting for all replicas in the
> > ISR to be updated)?
> >
> > Many thanks in advance,
> >
> > Marcus
> >
>
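
As an aside on David's point above about remote time approaching the max
wait: for consumer fetches that wait is bounded by fetch.max.wait.ms
(default 500 ms) whenever fetch.min.bytes (default 1) of new data isn't
available yet, and followers have the analogous replica.fetch.wait.max.ms
broker setting, so a caught-up fetcher hovering around 500 ms at the 99th
percentile would be consistent with that. A minimal consumer sketch just to
show where those settings live (broker address and group id are
placeholders):

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class FetchWaitExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "remote-time-test");        // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // fetch.max.wait.ms bounds how long a fetch can park on the broker when
        // fetch.min.bytes of new data is not yet available; a caught-up consumer
        // therefore drives RemoteTimeMs for FetchConsumer toward this value.
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // subscribe and poll as usual
        }
    }
}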
