Hi All, Please let me know if there is any other feedback or suggestions. Regards, Apoorv Mittal
On Fri, Nov 1, 2024 at 5:30 PM Apoorv Mittal <apoorvmitta...@gmail.com> wrote: > Thanks Jun and Andrew for reviewing and feedback. > > J1: I was thinking of having metrics as per the APIs in > SharePartitionManager but I think you are right. This is something not > beneficial to track as the release should only happen when a client sends a > leave group request. I have removed this metric. > > J2: Moved to Histogram. > > J3: Moved to Meter. I have removed RequestTopicPartitionsFetchEmptyCount > from KIP (as mentioned by Andrew - AS8) as RequestTopicPartitionsFetchRatio > moved to Histogram. Also I have marked InFlightMessageCount as Meter, so > shall capture the rate better. > > AS0: Done, specified specifically in the Motivation section. > > AS1: Hmmm, I thought about that earlier and was thinking about keeping the > abstracted share fetch APIs metrics in ShareGroupMetrics itself. However I > think the Share Fetch and Share Acknowledge can be part of BrokerTopicMetrics > as these operations happen by Share Consumer. Do you think we should also > move the RequestTopicPartitionsFetchRatio metric? I think we shouldn't and > it should be in ShareGroupMetrics itself, please let me know what you think. > > AS2: Yes you are right, it can support all topic rate as well. I have > added the suggestion. > > AS3: I have removed this metric as part of J1 comment. > > AS4,AS6: We already have a share-acknowledgement Meter metric defined in > KIP-932. I am of the opinion to move the KIP-932 share-acknowledgement > metric to TotalShareAcknowledgementRequestsPerSec. Hence I have added > corresponding FailedShareAcknowledgementRequestsPerSec in this KIP. > > AS5: It counts the failure of a share acknowledge request. Similar to > multi topic partition fetch, acknowledge can also be multi topic partition > but a single acknowledge request might either fail completely or at topic > partition level. Hence this metric tracks the failure as similar to > FailedFetchRequestsPerSec or FailedShareFetchRequestsPerSec. > > AS7: I have added some examples and another metric FetchLockTimeMs, which > should help in optimizing behaviour. > > AS8: Yes, we might be able to work with low percentiles to investigate > further hence I have removed RequestTopicPartitionsFetchEmptyCount. > However Histogram will give us percentiles but not the raw bucketed > value, in case we think we need such data explicitly then might consider > adding it but for now it's removed. > > > Regards, > Apoorv Mittal > > > On Thu, Oct 31, 2024 at 1:03 PM Andrew Schofield < > andrew_schofield_j...@outlook.com> wrote: > >> Hi Apoorv, >> Thanks for the KIP. I have some comments. >> >> AS0: These are broker metrics and not client metrics. I don't think the >> KIP states that and it probably should. >> >> AS1: TotalShareFetchRequestsPerSec is inspired by the existing metric >> TotalFetchRequestsPerSec. That one is in the BrokerTopicMetrics group >> so I would expect the new metric to be in the same group. This comment >> applies to all of the ShareGroupMetrics broker metrics scoped to a topic. >> >> AS2: Similarly to TotalFetchRequestPerSec, does the new metric support >> omitting the topic tag in order to obtain the all-topic rate? I believe so >> because it follows the existing metrics. I guess the tag is actually >> `topic=([-.\w]+)` to copy the existing notation. >> >> AS3: What's a share release? Does it count releases which are made >> by ShareFetch/Acknowledge requests as well as implicit ones as a >> result of share session expiration or closure? >> >> AS4: I see there is FailedShareAcknowledgementRequestsPerSec. >> If this corresponds specifically to the ShareAcknowledge API, then >> the name should be FailedShareAcknowledgeRequestsPerSec. >> >> AS5: What does FailedShareAcknowledgementRequestsPerSec >> actually count? This is a per-topic metric and a ShareAcknowledge >> can mention several topics, but actually it can also operate on many >> acquired records specifying any of the different acknowledgement >> types that KIP-932 defines. >> >> AS6: Why no TotalShareAcknowledgementRequestsPerSec? >> >> AS7: I think I like the concept behind RequestTopicPartitionsFetchRatio, >> but it would really help to include some examples of how it works. You >> mention 4 potential causes for low ratios, so it would be nice to see how >> each of them would be evident from the metrics and configurations. >> This is going to be valuable information for operating this in production. >> >> AS8: As Jun mentioned, maybe a histogram would be appropriate >> for RequestTopicPartitionsFetchRatio. Then I wonder whether it's >> worth having a separate RequestTopicPartitionsFetchEmptyCount since >> those are requests for which the fetch ratio was 0. >> >> Thanks, >> Andrew >> ________________________________________ >> From: Jun Rao <j...@confluent.io.INVALID> >> Sent: 30 October 2024 23:07 >> To: dev@kafka.apache.org <dev@kafka.apache.org> >> Subject: Re: [DISCUSS] KIP-1103: Additional metrics for cooperative >> consumption >> >> Hi, Apoorv, >> >> Thanks for the KIP. A few comments. >> >> 1. TotalShareReleaseRequestsPerSec: This is a bit weird since there is >> no ShareReleaseRequest. >> >> 2. RequestTopicPartitionsFetchRatio and InFlightBatchMessageCount: It >> seems >> Histogram is more appropriate for them. >> >> 3. RequestTopicPartitionsFetchEmptyCount and AcquisitionLockTimeoutCount: >> The problem with using gauge is that if the value changes quickly, we may >> not be able to capture the occurrence of empty fetch or lock timeout. >> Meter >> captures those rare events better. >> >> Jun >> >> >> On Tue, Oct 29, 2024 at 8:48 AM Apoorv Mittal <apoorvmitta...@gmail.com> >> wrote: >> >> > Hi Everyone, >> > I would like to start a discussion on KIP-1103: >> > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1103%3A+Additional+metrics+for+cooperative+consumption >> > >> > This KIP extends KIP-932 to provide additional metrics for >> > Queues/Cooperative consumption. >> > >> > Regards, >> > Apoorv Mittal >> > >> >