Re: [DISCUSS] KIP-1103: Additional metrics for cooperative consumption

Apoorv Mittal Wed, 06 Nov 2024 05:58:55 -0800

Hi All,
Please let me know if there is any other feedback. Else will start a voting
thread tomorrow.


Regards,
Apoorv Mittal


On Tue, Nov 5, 2024 at 12:21 PM Apoorv Mittal <apoorvmitta...@gmail.com>
wrote:

> Thanks for the feedback Andrew.
>
> AS9: Yes, you are correct it's topic name and not topic id.
>
> AS10: Thanks, I have updated the description.
>
> Regards,
> Apoorv Mittal
>
>
> On Tue, Nov 5, 2024 at 10:56 AM Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
>
>> Hi Apoorv,
>> A couple of additional comments.
>>
>> AS9: The tag share-partition : {group-topic-partition} confirm the
>> format. I suppose it's
>> one tag `group-id:topic-name:partition`, and not topic ID. Topic ID is
>> clearly more
>> authoritative, but I don't think it's usable in metrics.
>>
>> AS10: The description of the AcquisitionLockTimeoutMs would be better as
>> "Tracks the number of acquisition locks for records which are not
>> acknowledged within the timeout."
>>
>> Thanks,
>> Andrew
>>
>> ________________________________________
>> From: Apoorv Mittal <apoorvmitta...@gmail.com>
>> Sent: 05 November 2024 08:59
>> To: dev@kafka.apache.org <dev@kafka.apache.org>
>> Subject: Re: [DISCUSS] KIP-1103: Additional metrics for cooperative
>> consumption
>>
>> Hi All,
>> Please let me know if there is any other feedback or suggestions.
>>
>> Regards,
>> Apoorv Mittal
>>
>>
>> On Fri, Nov 1, 2024 at 5:30 PM Apoorv Mittal <apoorvmitta...@gmail.com>
>> wrote:
>>
>> > Thanks Jun and Andrew for reviewing and feedback.
>> >
>> > J1: I was thinking of having metrics as per the APIs in
>> > SharePartitionManager but I think you are right. This is something not
>> > beneficial to track as the release should only happen when a client
>> sends a
>> > leave group request. I have removed this metric.
>> >
>> > J2: Moved to Histogram.
>> >
>> > J3: Moved to Meter. I have removed RequestTopicPartitionsFetchEmptyCount
>> > from KIP (as mentioned by Andrew - AS8) as
>> RequestTopicPartitionsFetchRatio
>> > moved to Histogram. Also I have marked InFlightMessageCount as Meter, so
>> > shall capture the rate better.
>> >
>> > AS0: Done, specified specifically in the Motivation section.
>> >
>> > AS1: Hmmm, I thought about that earlier and was thinking about keeping
>> the
>> > abstracted share fetch APIs metrics in ShareGroupMetrics itself.
>> However I
>> > think the Share Fetch and Share Acknowledge can be part of
>> BrokerTopicMetrics
>> > as these operations happen by Share Consumer. Do you think we should
>> also
>> > move the RequestTopicPartitionsFetchRatio metric? I think we shouldn't
>> and
>> > it should be in ShareGroupMetrics itself, please let me know what you
>> think.
>> >
>> > AS2: Yes you are right, it can support all topic rate as well. I have
>> > added the suggestion.
>> >
>> > AS3: I have removed this metric as part of J1 comment.
>> >
>> > AS4,AS6: We already have a share-acknowledgement Meter metric defined in
>> > KIP-932. I am of the opinion to move the KIP-932 share-acknowledgement
>> > metric to TotalShareAcknowledgementRequestsPerSec. Hence I have added
>> > corresponding FailedShareAcknowledgementRequestsPerSec in this KIP.
>> >
>> > AS5: It counts the failure of a share acknowledge request. Similar to
>> > multi topic partition fetch, acknowledge can also be multi topic
>> partition
>> > but a single acknowledge request might either fail completely or at
>> topic
>> > partition level. Hence this metric tracks the failure as similar to
>> > FailedFetchRequestsPerSec or FailedShareFetchRequestsPerSec.
>> >
>> > AS7: I have added some examples and another metric FetchLockTimeMs,
>> which
>> > should help in optimizing behaviour.
>> >
>> > AS8: Yes, we might be able to work with low percentiles to investigate
>> > further hence I have removed RequestTopicPartitionsFetchEmptyCount.
>> > However Histogram will give us percentiles but not the raw bucketed
>> > value, in case we think we need such data explicitly then might consider
>> > adding it but for now it's removed.
>> >
>> >
>> > Regards,
>> > Apoorv Mittal
>> >
>> >
>> > On Thu, Oct 31, 2024 at 1:03 PM Andrew Schofield <
>> > andrew_schofield_j...@outlook.com> wrote:
>> >
>> >> Hi Apoorv,
>> >> Thanks for the KIP. I have some comments.
>> >>
>> >> AS0: These are broker metrics and not client metrics. I don't think the
>> >> KIP states that and it probably should.
>> >>
>> >> AS1: TotalShareFetchRequestsPerSec is inspired by the existing metric
>> >> TotalFetchRequestsPerSec. That one is in the BrokerTopicMetrics group
>> >> so I would expect the new metric to be in the same group. This comment
>> >> applies to all of the ShareGroupMetrics broker metrics scoped to a
>> topic.
>> >>
>> >> AS2: Similarly to TotalFetchRequestPerSec, does the new metric support
>> >> omitting the topic tag in order to obtain the all-topic rate? I
>> believe so
>> >> because it follows the existing metrics. I guess the tag is actually
>> >> `topic=([-.\w]+)` to copy the existing notation.
>> >>
>> >> AS3: What's a share release? Does it count releases which are made
>> >> by ShareFetch/Acknowledge requests as well as implicit ones as a
>> >> result of share session expiration or closure?
>> >>
>> >> AS4: I see there is FailedShareAcknowledgementRequestsPerSec.
>> >> If this corresponds specifically to the ShareAcknowledge API, then
>> >> the name should be FailedShareAcknowledgeRequestsPerSec.
>> >>
>> >> AS5: What does FailedShareAcknowledgementRequestsPerSec
>> >> actually count? This is a per-topic metric and a ShareAcknowledge
>> >> can mention several topics, but actually it can also operate on many
>> >> acquired records specifying any of the different acknowledgement
>> >> types that KIP-932 defines.
>> >>
>> >> AS6: Why no TotalShareAcknowledgementRequestsPerSec?
>> >>
>> >> AS7: I think I like the concept behind
>> RequestTopicPartitionsFetchRatio,
>> >> but it would really help to include some examples of how it works. You
>> >> mention 4 potential causes for low ratios, so it would be nice to see
>> how
>> >> each of them would be evident from the metrics and configurations.
>> >> This is going to be valuable information for operating this in
>> production.
>> >>
>> >> AS8: As Jun mentioned, maybe a histogram would be appropriate
>> >> for RequestTopicPartitionsFetchRatio. Then I wonder whether it's
>> >> worth having a separate RequestTopicPartitionsFetchEmptyCount since
>> >> those are requests for which the fetch ratio was 0.
>> >>
>> >> Thanks,
>> >> Andrew
>> >> ________________________________________
>> >> From: Jun Rao <j...@confluent.io.INVALID>
>> >> Sent: 30 October 2024 23:07
>> >> To: dev@kafka.apache.org <dev@kafka.apache.org>
>> >> Subject: Re: [DISCUSS] KIP-1103: Additional metrics for cooperative
>> >> consumption
>> >>
>> >> Hi, Apoorv,
>> >>
>> >> Thanks for the KIP. A few comments.
>> >>
>> >> 1. TotalShareReleaseRequestsPerSec: This is a bit weird since there is
>> >> no ShareReleaseRequest.
>> >>
>> >> 2. RequestTopicPartitionsFetchRatio and InFlightBatchMessageCount: It
>> >> seems
>> >> Histogram is more appropriate for them.
>> >>
>> >> 3. RequestTopicPartitionsFetchEmptyCount and
>> AcquisitionLockTimeoutCount:
>> >> The problem with using gauge is that if the value changes quickly, we
>> may
>> >> not be able to capture the occurrence of empty fetch or lock timeout.
>> >> Meter
>> >> captures those rare events better.
>> >>
>> >> Jun
>> >>
>> >>
>> >> On Tue, Oct 29, 2024 at 8:48 AM Apoorv Mittal <
>> apoorvmitta...@gmail.com>
>> >> wrote:
>> >>
>> >> > Hi Everyone,
>> >> > I would like to start a discussion on KIP-1103:
>> >> >
>> >> >
>> >>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1103%3A+Additional+metrics+for+cooperative+consumption
>> >> >
>> >> > This KIP extends KIP-932 to provide additional metrics for
>> >> > Queues/Cooperative consumption.
>> >> >
>> >> > Regards,
>> >> > Apoorv Mittal
>> >> >
>> >>
>> >
>>
>

Re: [DISCUSS] KIP-1103: Additional metrics for cooperative consumption

Reply via email to