Thanks for the feedback Andrew. AS9: Yes, you are correct it's topic name and not topic id.
AS10: Thanks, I have updated the description. Regards, Apoorv Mittal On Tue, Nov 5, 2024 at 10:56 AM Andrew Schofield < andrew_schofield_j...@outlook.com> wrote: > Hi Apoorv, > A couple of additional comments. > > AS9: The tag share-partition : {group-topic-partition} confirm the format. > I suppose it's > one tag `group-id:topic-name:partition`, and not topic ID. Topic ID is > clearly more > authoritative, but I don't think it's usable in metrics. > > AS10: The description of the AcquisitionLockTimeoutMs would be better as > "Tracks the number of acquisition locks for records which are not > acknowledged within the timeout." > > Thanks, > Andrew > > ________________________________________ > From: Apoorv Mittal <apoorvmitta...@gmail.com> > Sent: 05 November 2024 08:59 > To: dev@kafka.apache.org <dev@kafka.apache.org> > Subject: Re: [DISCUSS] KIP-1103: Additional metrics for cooperative > consumption > > Hi All, > Please let me know if there is any other feedback or suggestions. > > Regards, > Apoorv Mittal > > > On Fri, Nov 1, 2024 at 5:30 PM Apoorv Mittal <apoorvmitta...@gmail.com> > wrote: > > > Thanks Jun and Andrew for reviewing and feedback. > > > > J1: I was thinking of having metrics as per the APIs in > > SharePartitionManager but I think you are right. This is something not > > beneficial to track as the release should only happen when a client > sends a > > leave group request. I have removed this metric. > > > > J2: Moved to Histogram. > > > > J3: Moved to Meter. I have removed RequestTopicPartitionsFetchEmptyCount > > from KIP (as mentioned by Andrew - AS8) as > RequestTopicPartitionsFetchRatio > > moved to Histogram. Also I have marked InFlightMessageCount as Meter, so > > shall capture the rate better. > > > > AS0: Done, specified specifically in the Motivation section. > > > > AS1: Hmmm, I thought about that earlier and was thinking about keeping > the > > abstracted share fetch APIs metrics in ShareGroupMetrics itself. However > I > > think the Share Fetch and Share Acknowledge can be part of > BrokerTopicMetrics > > as these operations happen by Share Consumer. Do you think we should also > > move the RequestTopicPartitionsFetchRatio metric? I think we shouldn't > and > > it should be in ShareGroupMetrics itself, please let me know what you > think. > > > > AS2: Yes you are right, it can support all topic rate as well. I have > > added the suggestion. > > > > AS3: I have removed this metric as part of J1 comment. > > > > AS4,AS6: We already have a share-acknowledgement Meter metric defined in > > KIP-932. I am of the opinion to move the KIP-932 share-acknowledgement > > metric to TotalShareAcknowledgementRequestsPerSec. Hence I have added > > corresponding FailedShareAcknowledgementRequestsPerSec in this KIP. > > > > AS5: It counts the failure of a share acknowledge request. Similar to > > multi topic partition fetch, acknowledge can also be multi topic > partition > > but a single acknowledge request might either fail completely or at topic > > partition level. Hence this metric tracks the failure as similar to > > FailedFetchRequestsPerSec or FailedShareFetchRequestsPerSec. > > > > AS7: I have added some examples and another metric FetchLockTimeMs, which > > should help in optimizing behaviour. > > > > AS8: Yes, we might be able to work with low percentiles to investigate > > further hence I have removed RequestTopicPartitionsFetchEmptyCount. > > However Histogram will give us percentiles but not the raw bucketed > > value, in case we think we need such data explicitly then might consider > > adding it but for now it's removed. > > > > > > Regards, > > Apoorv Mittal > > > > > > On Thu, Oct 31, 2024 at 1:03 PM Andrew Schofield < > > andrew_schofield_j...@outlook.com> wrote: > > > >> Hi Apoorv, > >> Thanks for the KIP. I have some comments. > >> > >> AS0: These are broker metrics and not client metrics. I don't think the > >> KIP states that and it probably should. > >> > >> AS1: TotalShareFetchRequestsPerSec is inspired by the existing metric > >> TotalFetchRequestsPerSec. That one is in the BrokerTopicMetrics group > >> so I would expect the new metric to be in the same group. This comment > >> applies to all of the ShareGroupMetrics broker metrics scoped to a > topic. > >> > >> AS2: Similarly to TotalFetchRequestPerSec, does the new metric support > >> omitting the topic tag in order to obtain the all-topic rate? I believe > so > >> because it follows the existing metrics. I guess the tag is actually > >> `topic=([-.\w]+)` to copy the existing notation. > >> > >> AS3: What's a share release? Does it count releases which are made > >> by ShareFetch/Acknowledge requests as well as implicit ones as a > >> result of share session expiration or closure? > >> > >> AS4: I see there is FailedShareAcknowledgementRequestsPerSec. > >> If this corresponds specifically to the ShareAcknowledge API, then > >> the name should be FailedShareAcknowledgeRequestsPerSec. > >> > >> AS5: What does FailedShareAcknowledgementRequestsPerSec > >> actually count? This is a per-topic metric and a ShareAcknowledge > >> can mention several topics, but actually it can also operate on many > >> acquired records specifying any of the different acknowledgement > >> types that KIP-932 defines. > >> > >> AS6: Why no TotalShareAcknowledgementRequestsPerSec? > >> > >> AS7: I think I like the concept behind RequestTopicPartitionsFetchRatio, > >> but it would really help to include some examples of how it works. You > >> mention 4 potential causes for low ratios, so it would be nice to see > how > >> each of them would be evident from the metrics and configurations. > >> This is going to be valuable information for operating this in > production. > >> > >> AS8: As Jun mentioned, maybe a histogram would be appropriate > >> for RequestTopicPartitionsFetchRatio. Then I wonder whether it's > >> worth having a separate RequestTopicPartitionsFetchEmptyCount since > >> those are requests for which the fetch ratio was 0. > >> > >> Thanks, > >> Andrew > >> ________________________________________ > >> From: Jun Rao <j...@confluent.io.INVALID> > >> Sent: 30 October 2024 23:07 > >> To: dev@kafka.apache.org <dev@kafka.apache.org> > >> Subject: Re: [DISCUSS] KIP-1103: Additional metrics for cooperative > >> consumption > >> > >> Hi, Apoorv, > >> > >> Thanks for the KIP. A few comments. > >> > >> 1. TotalShareReleaseRequestsPerSec: This is a bit weird since there is > >> no ShareReleaseRequest. > >> > >> 2. RequestTopicPartitionsFetchRatio and InFlightBatchMessageCount: It > >> seems > >> Histogram is more appropriate for them. > >> > >> 3. RequestTopicPartitionsFetchEmptyCount and > AcquisitionLockTimeoutCount: > >> The problem with using gauge is that if the value changes quickly, we > may > >> not be able to capture the occurrence of empty fetch or lock timeout. > >> Meter > >> captures those rare events better. > >> > >> Jun > >> > >> > >> On Tue, Oct 29, 2024 at 8:48 AM Apoorv Mittal <apoorvmitta...@gmail.com > > > >> wrote: > >> > >> > Hi Everyone, > >> > I would like to start a discussion on KIP-1103: > >> > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1103%3A+Additional+metrics+for+cooperative+consumption > >> > > >> > This KIP extends KIP-932 to provide additional metrics for > >> > Queues/Cooperative consumption. > >> > > >> > Regards, > >> > Apoorv Mittal > >> > > >> > > >