Hi Ming,

I am interested in the proposed capability for diagnosing Kafka latency issues and would like to continue the discussion. Do you mind if I take over this thread and follow up with the community?
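To check my understanding of the three steps quoted below, here is a minimal Java sketch of the adjustment in step 3. The names FetchLatencySketch, Timings and excludePurgatoryWait are hypothetical and do not come from the Kafka codebase. The clamping is my own assumption, added so that the adjusted values can never go negative; it is not something the proposal specifies.

```java
// Hypothetical sketch only: not Kafka code, names are illustrative.
final class FetchLatencySketch {

    /** Adjusted timings after the purgatory wait has been removed. */
    static final class Timings {
        final long remoteTimeMs;
        final long totalTimeMs;

        Timings(long remoteTimeMs, long totalTimeMs) {
            this.remoteTimeMs = remoteTimeMs;
            this.totalTimeMs = totalTimeMs;
        }
    }

    /**
     * Removes the reported purgatory wait from RemoteTime and TotalTime.
     * Assumption: waitTimeMs is the value carried in the FetchResponse,
     * and it is clamped to the range [0, remoteTimeMs] before subtracting.
     */
    static Timings excludePurgatoryWait(long remoteTimeMs, long totalTimeMs, long waitTimeMs) {
        long wait = Math.max(0L, Math.min(waitTimeMs, remoteTimeMs));
        return new Timings(remoteTimeMs - wait, totalTimeMs - wait);
    }
}
```

For example, a fetch with RemoteTime 505 ms and TotalTime 510 ms that waited 500 ms in purgatory would be reported as 5 ms and 10 ms under this sketch.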

On 2021/04/25 17:33:10, Ming Liu <minga...@gmail.com> wrote:
> The idea I am trying right now is:
> 1. Add waitTimeMS in FetchResponse.
> 2. If the fetch has to wait in purgatory due to either
> replica.fetch.wait.max.ms or fetch.min.bytes, then it will fill the
> waitTimeMS in FetchResponse.
> 3. In updateRequestMetrics() function, we will special-process the Fetch
> response, and remove the waitTimeMS out of RemoteTime and TotalTime.
> Let me know for any suggestion/feedback. I like to propose a KIP on that
> change.
>
> On Sat, Apr 24, 2021 at 6:09 PM Israel Ekpo <israele...@gmail.com> wrote:
>
> > Hi Ming
> >
> > This would be a useful metric from a monitoring perspective especially
> > when troubleshooting or diagnosing issues.
> >
> > Are you looking to modify the Admin API for this capability to be added?
> > The metrics for quorum controllers, brokers, replicas and consumers may
> > need to be reported differently
> >
> > I am interested in this capability as well.
> >
> > Maybe there is something in the current Admin API that is not obvious yet
> > so I will need to investigate first and will get back to you with my
> > thoughts/suggestions.
> >
> > Thanks for bringing this up
> >
> > Cheers
> >
> > On Sat, Apr 24, 2021 at 1:21 PM Ming Liu <minga...@gmail.com> wrote:
> >
> >> Hi All,
> >> I am thinking about to start a KIP to report "REAL" broker/consumer
> >> fetch latency. Before that, I like to collect any idea or suggestions. I
> >> created https://issues.apache.org/jira/browse/KAFKA-12713.
> >> The fetch latency is an important metric to monitor for the cluster
> >> performance. With ACK=ALL, the produce latency is affected primarily by
> >> broker fetch latency. However, currently the reported fetch latency didn't
> >> reflect the true fetch latency because it sometimes needs to stay in
> >> purgatory and wait for replica.fetch.wait.max.ms when data is not
> >> available. This greatly affects the real P50, P99 etc.
> >>
> >> I like to propose a KIP to be able track the real fetch latency for both
> >> broker follower and consumer.
> >>
> >> Ming