hi Apoorv sorry for late response:
AS0: How can users distinguish between the group and topic in the format share-partition={group-topic-partition} since `-` is a valid character in both group and topic? AS1: Why do we only have metrics for failed AcknowledgementRequests? Aren't successful AcknowledgementRequests also important for metrics? Thanks, Chia-Ping On 2024/11/06 13:58:08 Apoorv Mittal wrote: > Hi All, > Please let me know if there is any other feedback. Else will start a voting > thread tomorrow. > > Regards, > Apoorv Mittal > > > On Tue, Nov 5, 2024 at 12:21 PM Apoorv Mittal <apoorvmitta...@gmail.com> > wrote: > > > Thanks for the feedback Andrew. > > > > AS9: Yes, you are correct it's topic name and not topic id. > > > > AS10: Thanks, I have updated the description. > > > > Regards, > > Apoorv Mittal > > > > > > On Tue, Nov 5, 2024 at 10:56 AM Andrew Schofield < > > andrew_schofield_j...@outlook.com> wrote: > > > >> Hi Apoorv, > >> A couple of additional comments. > >> > >> AS9: The tag share-partition : {group-topic-partition} confirm the > >> format. I suppose it's > >> one tag `group-id:topic-name:partition`, and not topic ID. Topic ID is > >> clearly more > >> authoritative, but I don't think it's usable in metrics. > >> > >> AS10: The description of the AcquisitionLockTimeoutMs would be better as > >> "Tracks the number of acquisition locks for records which are not > >> acknowledged within the timeout." > >> > >> Thanks, > >> Andrew > >> > >> ________________________________________ > >> From: Apoorv Mittal <apoorvmitta...@gmail.com> > >> Sent: 05 November 2024 08:59 > >> To: dev@kafka.apache.org <dev@kafka.apache.org> > >> Subject: Re: [DISCUSS] KIP-1103: Additional metrics for cooperative > >> consumption > >> > >> Hi All, > >> Please let me know if there is any other feedback or suggestions. > >> > >> Regards, > >> Apoorv Mittal > >> > >> > >> On Fri, Nov 1, 2024 at 5:30 PM Apoorv Mittal <apoorvmitta...@gmail.com> > >> wrote: > >> > >> > Thanks Jun and Andrew for reviewing and feedback. > >> > > >> > J1: I was thinking of having metrics as per the APIs in > >> > SharePartitionManager but I think you are right. This is something not > >> > beneficial to track as the release should only happen when a client > >> sends a > >> > leave group request. I have removed this metric. > >> > > >> > J2: Moved to Histogram. > >> > > >> > J3: Moved to Meter. I have removed RequestTopicPartitionsFetchEmptyCount > >> > from KIP (as mentioned by Andrew - AS8) as > >> RequestTopicPartitionsFetchRatio > >> > moved to Histogram. Also I have marked InFlightMessageCount as Meter, so > >> > shall capture the rate better. > >> > > >> > AS0: Done, specified specifically in the Motivation section. > >> > > >> > AS1: Hmmm, I thought about that earlier and was thinking about keeping > >> the > >> > abstracted share fetch APIs metrics in ShareGroupMetrics itself. > >> However I > >> > think the Share Fetch and Share Acknowledge can be part of > >> BrokerTopicMetrics > >> > as these operations happen by Share Consumer. Do you think we should > >> also > >> > move the RequestTopicPartitionsFetchRatio metric? I think we shouldn't > >> and > >> > it should be in ShareGroupMetrics itself, please let me know what you > >> think. > >> > > >> > AS2: Yes you are right, it can support all topic rate as well. I have > >> > added the suggestion. > >> > > >> > AS3: I have removed this metric as part of J1 comment. > >> > > >> > AS4,AS6: We already have a share-acknowledgement Meter metric defined in > >> > KIP-932. I am of the opinion to move the KIP-932 share-acknowledgement > >> > metric to TotalShareAcknowledgementRequestsPerSec. Hence I have added > >> > corresponding FailedShareAcknowledgementRequestsPerSec in this KIP. > >> > > >> > AS5: It counts the failure of a share acknowledge request. Similar to > >> > multi topic partition fetch, acknowledge can also be multi topic > >> partition > >> > but a single acknowledge request might either fail completely or at > >> topic > >> > partition level. Hence this metric tracks the failure as similar to > >> > FailedFetchRequestsPerSec or FailedShareFetchRequestsPerSec. > >> > > >> > AS7: I have added some examples and another metric FetchLockTimeMs, > >> which > >> > should help in optimizing behaviour. > >> > > >> > AS8: Yes, we might be able to work with low percentiles to investigate > >> > further hence I have removed RequestTopicPartitionsFetchEmptyCount. > >> > However Histogram will give us percentiles but not the raw bucketed > >> > value, in case we think we need such data explicitly then might consider > >> > adding it but for now it's removed. > >> > > >> > > >> > Regards, > >> > Apoorv Mittal > >> > > >> > > >> > On Thu, Oct 31, 2024 at 1:03 PM Andrew Schofield < > >> > andrew_schofield_j...@outlook.com> wrote: > >> > > >> >> Hi Apoorv, > >> >> Thanks for the KIP. I have some comments. > >> >> > >> >> AS0: These are broker metrics and not client metrics. I don't think the > >> >> KIP states that and it probably should. > >> >> > >> >> AS1: TotalShareFetchRequestsPerSec is inspired by the existing metric > >> >> TotalFetchRequestsPerSec. That one is in the BrokerTopicMetrics group > >> >> so I would expect the new metric to be in the same group. This comment > >> >> applies to all of the ShareGroupMetrics broker metrics scoped to a > >> topic. > >> >> > >> >> AS2: Similarly to TotalFetchRequestPerSec, does the new metric support > >> >> omitting the topic tag in order to obtain the all-topic rate? I > >> believe so > >> >> because it follows the existing metrics. I guess the tag is actually > >> >> `topic=([-.\w]+)` to copy the existing notation. > >> >> > >> >> AS3: What's a share release? Does it count releases which are made > >> >> by ShareFetch/Acknowledge requests as well as implicit ones as a > >> >> result of share session expiration or closure? > >> >> > >> >> AS4: I see there is FailedShareAcknowledgementRequestsPerSec. > >> >> If this corresponds specifically to the ShareAcknowledge API, then > >> >> the name should be FailedShareAcknowledgeRequestsPerSec. > >> >> > >> >> AS5: What does FailedShareAcknowledgementRequestsPerSec > >> >> actually count? This is a per-topic metric and a ShareAcknowledge > >> >> can mention several topics, but actually it can also operate on many > >> >> acquired records specifying any of the different acknowledgement > >> >> types that KIP-932 defines. > >> >> > >> >> AS6: Why no TotalShareAcknowledgementRequestsPerSec? > >> >> > >> >> AS7: I think I like the concept behind > >> RequestTopicPartitionsFetchRatio, > >> >> but it would really help to include some examples of how it works. You > >> >> mention 4 potential causes for low ratios, so it would be nice to see > >> how > >> >> each of them would be evident from the metrics and configurations. > >> >> This is going to be valuable information for operating this in > >> production. > >> >> > >> >> AS8: As Jun mentioned, maybe a histogram would be appropriate > >> >> for RequestTopicPartitionsFetchRatio. Then I wonder whether it's > >> >> worth having a separate RequestTopicPartitionsFetchEmptyCount since > >> >> those are requests for which the fetch ratio was 0. > >> >> > >> >> Thanks, > >> >> Andrew > >> >> ________________________________________ > >> >> From: Jun Rao <j...@confluent.io.INVALID> > >> >> Sent: 30 October 2024 23:07 > >> >> To: dev@kafka.apache.org <dev@kafka.apache.org> > >> >> Subject: Re: [DISCUSS] KIP-1103: Additional metrics for cooperative > >> >> consumption > >> >> > >> >> Hi, Apoorv, > >> >> > >> >> Thanks for the KIP. A few comments. > >> >> > >> >> 1. TotalShareReleaseRequestsPerSec: This is a bit weird since there is > >> >> no ShareReleaseRequest. > >> >> > >> >> 2. RequestTopicPartitionsFetchRatio and InFlightBatchMessageCount: It > >> >> seems > >> >> Histogram is more appropriate for them. > >> >> > >> >> 3. RequestTopicPartitionsFetchEmptyCount and > >> AcquisitionLockTimeoutCount: > >> >> The problem with using gauge is that if the value changes quickly, we > >> may > >> >> not be able to capture the occurrence of empty fetch or lock timeout. > >> >> Meter > >> >> captures those rare events better. > >> >> > >> >> Jun > >> >> > >> >> > >> >> On Tue, Oct 29, 2024 at 8:48 AM Apoorv Mittal < > >> apoorvmitta...@gmail.com> > >> >> wrote: > >> >> > >> >> > Hi Everyone, > >> >> > I would like to start a discussion on KIP-1103: > >> >> > > >> >> > > >> >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1103%3A+Additional+metrics+for+cooperative+consumption > >> >> > > >> >> > This KIP extends KIP-932 to provide additional metrics for > >> >> > Queues/Cooperative consumption. > >> >> > > >> >> > Regards, > >> >> > Apoorv Mittal > >> >> > > >> >> > >> > > >> > > >