Re: [DISCUSS] KIP-1103: Additional metrics for cooperative consumption

Apoorv Mittal Tue, 05 Nov 2024 01:00:58 -0800

Hi All,
Please let me know if there is any other feedback or suggestions.

Regards,
Apoorv Mittal



On Fri, Nov 1, 2024 at 5:30 PM Apoorv Mittal <apoorvmitta...@gmail.com>
wrote:

> Thanks Jun and Andrew for reviewing and feedback.
>
> J1: I was thinking of having metrics as per the APIs in
> SharePartitionManager but I think you are right. This is something not
> beneficial to track as the release should only happen when a client sends a
> leave group request. I have removed this metric.
>
> J2: Moved to Histogram.
>
> J3: Moved to Meter. I have removed RequestTopicPartitionsFetchEmptyCount
> from KIP (as mentioned by Andrew - AS8) as RequestTopicPartitionsFetchRatio
> moved to Histogram. Also I have marked InFlightMessageCount as Meter, so
> shall capture the rate better.
>
> AS0: Done, specified specifically in the Motivation section.
>
> AS1: Hmmm, I thought about that earlier and was thinking about keeping the
> abstracted share fetch APIs metrics in ShareGroupMetrics itself. However I
> think the Share Fetch and Share Acknowledge can be part of BrokerTopicMetrics
> as these operations happen by Share Consumer. Do you think we should also
> move the RequestTopicPartitionsFetchRatio metric? I think we shouldn't and
> it should be in ShareGroupMetrics itself, please let me know what you think.
>
> AS2: Yes you are right, it can support all topic rate as well. I have
> added the suggestion.
>
> AS3: I have removed this metric as part of J1 comment.
>
> AS4,AS6: We already have a share-acknowledgement Meter metric defined in
> KIP-932. I am of the opinion to move the KIP-932 share-acknowledgement
> metric to TotalShareAcknowledgementRequestsPerSec. Hence I have added
> corresponding FailedShareAcknowledgementRequestsPerSec in this KIP.
>
> AS5: It counts the failure of a share acknowledge request. Similar to
> multi topic partition fetch, acknowledge can also be multi topic partition
> but a single acknowledge request might either fail completely or at topic
> partition level. Hence this metric tracks the failure as similar to
> FailedFetchRequestsPerSec or FailedShareFetchRequestsPerSec.
>
> AS7: I have added some examples and another metric FetchLockTimeMs, which
> should help in optimizing behaviour.
>
> AS8: Yes, we might be able to work with low percentiles to investigate
> further hence I have removed RequestTopicPartitionsFetchEmptyCount.
> However Histogram will give us percentiles but not the raw bucketed
> value, in case we think we need such data explicitly then might consider
> adding it but for now it's removed.
>
>
> Regards,
> Apoorv Mittal
>
>
> On Thu, Oct 31, 2024 at 1:03 PM Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
>
>> Hi Apoorv,
>> Thanks for the KIP. I have some comments.
>>
>> AS0: These are broker metrics and not client metrics. I don't think the
>> KIP states that and it probably should.
>>
>> AS1: TotalShareFetchRequestsPerSec is inspired by the existing metric
>> TotalFetchRequestsPerSec. That one is in the BrokerTopicMetrics group
>> so I would expect the new metric to be in the same group. This comment
>> applies to all of the ShareGroupMetrics broker metrics scoped to a topic.
>>
>> AS2: Similarly to TotalFetchRequestPerSec, does the new metric support
>> omitting the topic tag in order to obtain the all-topic rate? I believe so
>> because it follows the existing metrics. I guess the tag is actually
>> `topic=([-.\w]+)` to copy the existing notation.
>>
>> AS3: What's a share release? Does it count releases which are made
>> by ShareFetch/Acknowledge requests as well as implicit ones as a
>> result of share session expiration or closure?
>>
>> AS4: I see there is FailedShareAcknowledgementRequestsPerSec.
>> If this corresponds specifically to the ShareAcknowledge API, then
>> the name should be FailedShareAcknowledgeRequestsPerSec.
>>
>> AS5: What does FailedShareAcknowledgementRequestsPerSec
>> actually count? This is a per-topic metric and a ShareAcknowledge
>> can mention several topics, but actually it can also operate on many
>> acquired records specifying any of the different acknowledgement
>> types that KIP-932 defines.
>>
>> AS6: Why no TotalShareAcknowledgementRequestsPerSec?
>>
>> AS7: I think I like the concept behind RequestTopicPartitionsFetchRatio,
>> but it would really help to include some examples of how it works. You
>> mention 4 potential causes for low ratios, so it would be nice to see how
>> each of them would be evident from the metrics and configurations.
>> This is going to be valuable information for operating this in production.
>>
>> AS8: As Jun mentioned, maybe a histogram would be appropriate
>> for RequestTopicPartitionsFetchRatio. Then I wonder whether it's
>> worth having a separate RequestTopicPartitionsFetchEmptyCount since
>> those are requests for which the fetch ratio was 0.
>>
>> Thanks,
>> Andrew
>> ________________________________________
>> From: Jun Rao <j...@confluent.io.INVALID>
>> Sent: 30 October 2024 23:07
>> To: dev@kafka.apache.org <dev@kafka.apache.org>
>> Subject: Re: [DISCUSS] KIP-1103: Additional metrics for cooperative
>> consumption
>>
>> Hi, Apoorv,
>>
>> Thanks for the KIP. A few comments.
>>
>> 1. TotalShareReleaseRequestsPerSec: This is a bit weird since there is
>> no ShareReleaseRequest.
>>
>> 2. RequestTopicPartitionsFetchRatio and InFlightBatchMessageCount: It
>> seems
>> Histogram is more appropriate for them.
>>
>> 3. RequestTopicPartitionsFetchEmptyCount and AcquisitionLockTimeoutCount:
>> The problem with using gauge is that if the value changes quickly, we may
>> not be able to capture the occurrence of empty fetch or lock timeout.
>> Meter
>> captures those rare events better.
>>
>> Jun
>>
>>
>> On Tue, Oct 29, 2024 at 8:48 AM Apoorv Mittal <apoorvmitta...@gmail.com>
>> wrote:
>>
>> > Hi Everyone,
>> > I would like to start a discussion on KIP-1103:
>> >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1103%3A+Additional+metrics+for+cooperative+consumption
>> >
>> > This KIP extends KIP-932 to provide additional metrics for
>> > Queues/Cooperative consumption.
>> >
>> > Regards,
>> > Apoorv Mittal
>> >
>>
>

Re: [DISCUSS] KIP-1103: Additional metrics for cooperative consumption

Reply via email to