Hi Marcus,

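For fetch requests, a high RemoteTimeMs often just means there was not
enough data to fill a fetch response. This happens when the consumer or
replica is caught up and no new data is arriving, so the broker holds the
request until enough data accumulates or the max wait time expires
(fetch.max.wait.ms for consumers, replica.fetch.wait.max.ms for followers).
Both default to 500 ms, which lines up with the ~500 ms you are seeing. In
that case the remote time will be close to the max wait time, which is
normal.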
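
As a rough sanity check, you can compare the p99 RemoteTimeMs you scrape
against the configured wait timeout. The sketch below is only illustrative:
the function name, the 10% tolerance, and the 500 ms default argument are my
own assumptions, not something Kafka provides, so adjust them to your
configuration.

```python
def looks_like_idle_wait(remote_time_ms, max_wait_ms=500.0, tolerance=0.1):
    """Return True when remote time sits within `tolerance` of the max wait.

    remote_time_ms: observed RemoteTimeMs percentile (e.g. p99) for a
                    FetchConsumer or FetchFollower request.
    max_wait_ms:    the configured fetch wait timeout (assumed 500 ms here).
    tolerance:      hypothetical allowed relative deviation (10% by default).
    """
    return abs(remote_time_ms - max_wait_ms) <= tolerance * max_wait_ms


# A p99 near the timeout suggests fetches are simply waiting for data.
idle = looks_like_idle_wait(495.0)
# A p99 well below the timeout means fetches are returning with data.
busy = looks_like_idle_wait(120.0)
```

If the check comes back true while the cluster is busy, that would be worth
a closer look; on an idle cluster it is expected.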

I have seen this when my clusters are idle and I am not sending data to
them.

Hope this helps.



On Wed, Jun 16, 2021 at 6:53 AM Marcus Horsley-Rai <marc...@gmail.com>
wrote:

> I've recently implemented further monitoring of our Kafka cluster to hone
> in on where I think we have bottlenecks.
> I'm interested in one metric in particular:
>
> *kafka.network:type=RequestMetrics,name=RemoteTimeMs,request={Produce|FetchConsumer|FetchFollower}*
>
> All the docs I've seen accompanying the metric state "non-zero for produce
> requests when ack=-1".
> What does it mean however in relation to consume requests (FetchConsumer),
> or follower requests (FetchFollower)?
>
> On my cluster - the TotalTimeMs is nice and low for produce requests, which
> I would expect as we don't set a high acks value.
> For follower and consume requests however, TotalTimeMs is nearly 500ms in
> the 99th percentile, of which RemoteTimeMs is the vast majority.
>
> My gut is telling me that followers are struggling to replicate from
> leaders fast enough, and therefore RemoteTimeMs for FetchConsumer is
> telling me there is a high commit lag (waiting for all replicas in the ISR
> to be updated)?
>
> Many thanks in advance,
>
> Marcus
>
