Hi Jun and Chia, Thanks for the feedback. 10. Done.
11. I have moved FetchLockTimeMs to Histogram. I didn't understand which ratio shall FetchLockRatio Metric should hold i.e. against what value we should compare the partitions locked. Though I have added Metric RequestTopicPartitionsFetchRatio which provides a ratio of partition locked vs requested. Shall we have something else as well? 12. Yeah, true. Done. AS0: Make sense, I have moved to group, topic, partition tags as like Kafka.cluster Partition metrics. AS1: Successful AcknowledgementRequests are also important and they are already defined in KIP-932 broker metrics. We shall change the metric name for successful AcknowledgementRequests in KIP-932 itself while implementing these metrics. Regards, Apoorv Mittal On Thu, Nov 7, 2024 at 7:36 PM Chia-Ping Tsai <chia7...@apache.org> wrote: > hi Apoorv > > sorry for late response: > > AS0: How can users distinguish between the group and topic in the format > share-partition={group-topic-partition} since `-` is a valid character in > both group and topic? > > AS1: Why do we only have metrics for failed AcknowledgementRequests? > Aren't successful AcknowledgementRequests also important for metrics? > > Thanks, > Chia-Ping > > On 2024/11/06 13:58:08 Apoorv Mittal wrote: > > Hi All, > > Please let me know if there is any other feedback. Else will start a > voting > > thread tomorrow. > > > > Regards, > > Apoorv Mittal > > > > > > On Tue, Nov 5, 2024 at 12:21 PM Apoorv Mittal <apoorvmitta...@gmail.com> > > wrote: > > > > > Thanks for the feedback Andrew. > > > > > > AS9: Yes, you are correct it's topic name and not topic id. > > > > > > AS10: Thanks, I have updated the description. > > > > > > Regards, > > > Apoorv Mittal > > > > > > > > > On Tue, Nov 5, 2024 at 10:56 AM Andrew Schofield < > > > andrew_schofield_j...@outlook.com> wrote: > > > > > >> Hi Apoorv, > > >> A couple of additional comments. > > >> > > >> AS9: The tag share-partition : {group-topic-partition} confirm the > > >> format. I suppose it's > > >> one tag `group-id:topic-name:partition`, and not topic ID. Topic ID is > > >> clearly more > > >> authoritative, but I don't think it's usable in metrics. > > >> > > >> AS10: The description of the AcquisitionLockTimeoutMs would be better > as > > >> "Tracks the number of acquisition locks for records which are not > > >> acknowledged within the timeout." > > >> > > >> Thanks, > > >> Andrew > > >> > > >> ________________________________________ > > >> From: Apoorv Mittal <apoorvmitta...@gmail.com> > > >> Sent: 05 November 2024 08:59 > > >> To: dev@kafka.apache.org <dev@kafka.apache.org> > > >> Subject: Re: [DISCUSS] KIP-1103: Additional metrics for cooperative > > >> consumption > > >> > > >> Hi All, > > >> Please let me know if there is any other feedback or suggestions. > > >> > > >> Regards, > > >> Apoorv Mittal > > >> > > >> > > >> On Fri, Nov 1, 2024 at 5:30 PM Apoorv Mittal < > apoorvmitta...@gmail.com> > > >> wrote: > > >> > > >> > Thanks Jun and Andrew for reviewing and feedback. > > >> > > > >> > J1: I was thinking of having metrics as per the APIs in > > >> > SharePartitionManager but I think you are right. This is something > not > > >> > beneficial to track as the release should only happen when a client > > >> sends a > > >> > leave group request. I have removed this metric. > > >> > > > >> > J2: Moved to Histogram. > > >> > > > >> > J3: Moved to Meter. I have removed > RequestTopicPartitionsFetchEmptyCount > > >> > from KIP (as mentioned by Andrew - AS8) as > > >> RequestTopicPartitionsFetchRatio > > >> > moved to Histogram. Also I have marked InFlightMessageCount as > Meter, so > > >> > shall capture the rate better. > > >> > > > >> > AS0: Done, specified specifically in the Motivation section. > > >> > > > >> > AS1: Hmmm, I thought about that earlier and was thinking about > keeping > > >> the > > >> > abstracted share fetch APIs metrics in ShareGroupMetrics itself. > > >> However I > > >> > think the Share Fetch and Share Acknowledge can be part of > > >> BrokerTopicMetrics > > >> > as these operations happen by Share Consumer. Do you think we should > > >> also > > >> > move the RequestTopicPartitionsFetchRatio metric? I think we > shouldn't > > >> and > > >> > it should be in ShareGroupMetrics itself, please let me know what > you > > >> think. > > >> > > > >> > AS2: Yes you are right, it can support all topic rate as well. I > have > > >> > added the suggestion. > > >> > > > >> > AS3: I have removed this metric as part of J1 comment. > > >> > > > >> > AS4,AS6: We already have a share-acknowledgement Meter metric > defined in > > >> > KIP-932. I am of the opinion to move the KIP-932 > share-acknowledgement > > >> > metric to TotalShareAcknowledgementRequestsPerSec. Hence I have > added > > >> > corresponding FailedShareAcknowledgementRequestsPerSec in this KIP. > > >> > > > >> > AS5: It counts the failure of a share acknowledge request. Similar > to > > >> > multi topic partition fetch, acknowledge can also be multi topic > > >> partition > > >> > but a single acknowledge request might either fail completely or at > > >> topic > > >> > partition level. Hence this metric tracks the failure as similar to > > >> > FailedFetchRequestsPerSec or FailedShareFetchRequestsPerSec. > > >> > > > >> > AS7: I have added some examples and another metric FetchLockTimeMs, > > >> which > > >> > should help in optimizing behaviour. > > >> > > > >> > AS8: Yes, we might be able to work with low percentiles to > investigate > > >> > further hence I have removed RequestTopicPartitionsFetchEmptyCount. > > >> > However Histogram will give us percentiles but not the raw bucketed > > >> > value, in case we think we need such data explicitly then might > consider > > >> > adding it but for now it's removed. > > >> > > > >> > > > >> > Regards, > > >> > Apoorv Mittal > > >> > > > >> > > > >> > On Thu, Oct 31, 2024 at 1:03 PM Andrew Schofield < > > >> > andrew_schofield_j...@outlook.com> wrote: > > >> > > > >> >> Hi Apoorv, > > >> >> Thanks for the KIP. I have some comments. > > >> >> > > >> >> AS0: These are broker metrics and not client metrics. I don't > think the > > >> >> KIP states that and it probably should. > > >> >> > > >> >> AS1: TotalShareFetchRequestsPerSec is inspired by the existing > metric > > >> >> TotalFetchRequestsPerSec. That one is in the BrokerTopicMetrics > group > > >> >> so I would expect the new metric to be in the same group. This > comment > > >> >> applies to all of the ShareGroupMetrics broker metrics scoped to a > > >> topic. > > >> >> > > >> >> AS2: Similarly to TotalFetchRequestPerSec, does the new metric > support > > >> >> omitting the topic tag in order to obtain the all-topic rate? I > > >> believe so > > >> >> because it follows the existing metrics. I guess the tag is > actually > > >> >> `topic=([-.\w]+)` to copy the existing notation. > > >> >> > > >> >> AS3: What's a share release? Does it count releases which are made > > >> >> by ShareFetch/Acknowledge requests as well as implicit ones as a > > >> >> result of share session expiration or closure? > > >> >> > > >> >> AS4: I see there is FailedShareAcknowledgementRequestsPerSec. > > >> >> If this corresponds specifically to the ShareAcknowledge API, then > > >> >> the name should be FailedShareAcknowledgeRequestsPerSec. > > >> >> > > >> >> AS5: What does FailedShareAcknowledgementRequestsPerSec > > >> >> actually count? This is a per-topic metric and a ShareAcknowledge > > >> >> can mention several topics, but actually it can also operate on > many > > >> >> acquired records specifying any of the different acknowledgement > > >> >> types that KIP-932 defines. > > >> >> > > >> >> AS6: Why no TotalShareAcknowledgementRequestsPerSec? > > >> >> > > >> >> AS7: I think I like the concept behind > > >> RequestTopicPartitionsFetchRatio, > > >> >> but it would really help to include some examples of how it works. > You > > >> >> mention 4 potential causes for low ratios, so it would be nice to > see > > >> how > > >> >> each of them would be evident from the metrics and configurations. > > >> >> This is going to be valuable information for operating this in > > >> production. > > >> >> > > >> >> AS8: As Jun mentioned, maybe a histogram would be appropriate > > >> >> for RequestTopicPartitionsFetchRatio. Then I wonder whether it's > > >> >> worth having a separate RequestTopicPartitionsFetchEmptyCount since > > >> >> those are requests for which the fetch ratio was 0. > > >> >> > > >> >> Thanks, > > >> >> Andrew > > >> >> ________________________________________ > > >> >> From: Jun Rao <j...@confluent.io.INVALID> > > >> >> Sent: 30 October 2024 23:07 > > >> >> To: dev@kafka.apache.org <dev@kafka.apache.org> > > >> >> Subject: Re: [DISCUSS] KIP-1103: Additional metrics for cooperative > > >> >> consumption > > >> >> > > >> >> Hi, Apoorv, > > >> >> > > >> >> Thanks for the KIP. A few comments. > > >> >> > > >> >> 1. TotalShareReleaseRequestsPerSec: This is a bit weird since > there is > > >> >> no ShareReleaseRequest. > > >> >> > > >> >> 2. RequestTopicPartitionsFetchRatio and InFlightBatchMessageCount: > It > > >> >> seems > > >> >> Histogram is more appropriate for them. > > >> >> > > >> >> 3. RequestTopicPartitionsFetchEmptyCount and > > >> AcquisitionLockTimeoutCount: > > >> >> The problem with using gauge is that if the value changes quickly, > we > > >> may > > >> >> not be able to capture the occurrence of empty fetch or lock > timeout. > > >> >> Meter > > >> >> captures those rare events better. > > >> >> > > >> >> Jun > > >> >> > > >> >> > > >> >> On Tue, Oct 29, 2024 at 8:48 AM Apoorv Mittal < > > >> apoorvmitta...@gmail.com> > > >> >> wrote: > > >> >> > > >> >> > Hi Everyone, > > >> >> > I would like to start a discussion on KIP-1103: > > >> >> > > > >> >> > > > >> >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1103%3A+Additional+metrics+for+cooperative+consumption > > >> >> > > > >> >> > This KIP extends KIP-932 to provide additional metrics for > > >> >> > Queues/Cooperative consumption. > > >> >> > > > >> >> > Regards, > > >> >> > Apoorv Mittal > > >> >> > > > >> >> > > >> > > > >> > > > > > >