I've recently implemented further monitoring of our Kafka cluster to home
in on where I think we have bottlenecks.
I'm interested in one metric in particular:
*kafka.network:type=RequestMetrics,name=RemoteTimeMs,request={Produce|FetchConsumer|FetchFollower}*

All the docs I've seen accompanying the metric state "non-zero for produce
requests when ack=-1".
What does it mean, however, for consumer fetch requests (FetchConsumer)
or follower fetch requests (FetchFollower)?

On my cluster, TotalTimeMs is nice and low for produce requests, which
I would expect as we don't set a high acks value.
For follower and consumer fetch requests, however, TotalTimeMs is nearly
500ms at the 99th percentile, and RemoteTimeMs accounts for the vast
majority of that.
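
To make the comparison concrete, here is a minimal sketch of the arithmetic, assuming the 99th-percentile values have already been read from JMX. The helper names (mbean_name, remote_share) and the sample numbers are hypothetical placeholders, not real readings from my cluster. Percentiles are not additive, so the ratio is only a rough indication of where the time goes.

```python
REQUEST_TYPES = ("Produce", "FetchConsumer", "FetchFollower")


def mbean_name(metric, request):
    """Build the JMX object name for a per-request-type broker metric."""
    return f"kafka.network:type=RequestMetrics,name={metric},request={request}"


def remote_share(total_ms, remote_ms):
    """Fraction of the total request time attributed to RemoteTimeMs."""
    if total_ms <= 0:
        return 0.0
    return remote_ms / total_ms


# Hypothetical p99 readings, for illustration only (not measured values).
sample_p99 = {
    "FetchConsumer": {"TotalTimeMs": 500.0, "RemoteTimeMs": 480.0},
}

for request, values in sample_p99.items():
    share = remote_share(values["TotalTimeMs"], values["RemoteTimeMs"])
    print(mbean_name("RemoteTimeMs", request), f"{share:.0%}")
```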

My gut is telling me that followers are struggling to replicate from
leaders fast enough. Does the high RemoteTimeMs for FetchConsumer
therefore indicate a high commit lag (waiting for all replicas in the ISR
to be updated)?

Many thanks in advance,

Marcus
