Hi Apoorv,
Thanks for the KIP. I have some comments.

AS0: These are broker metrics and not client metrics. I don't think the
KIP states that and it probably should.

AS1: TotalShareFetchRequestsPerSec is inspired by the existing metric
TotalFetchRequestsPerSec. That one is in the BrokerTopicMetrics group
so I would expect the new metric to be in the same group. This comment
applies to all of the ShareGroupMetrics broker metrics scoped to a topic.

AS2: Similarly to TotalFetchRequestPerSec, does the new metric support
omitting the topic tag in order to obtain the all-topic rate? I believe so
because it follows the existing metrics. I guess the tag is actually
`topic=([-.\w]+)` to copy the existing notation.

AS3: What's a share release? Does it count releases which are made
by ShareFetch/Acknowledge requests as well as implicit ones as a
result of share session expiration or closure?

AS4: I see there is FailedShareAcknowledgementRequestsPerSec.
If this corresponds specifically to the ShareAcknowledge API, then
the name should be FailedShareAcknowledgeRequestsPerSec.

AS5: What does FailedShareAcknowledgementRequestsPerSec
actually count? This is a per-topic metric and a ShareAcknowledge
can mention several topics, but actually it can also operate on many
acquired records specifying any of the different acknowledgement
types that KIP-932 defines.

AS6: Why no TotalShareAcknowledgementRequestsPerSec?

AS7: I think I like the concept behind RequestTopicPartitionsFetchRatio,
but it would really help to include some examples of how it works. You
mention 4 potential causes for low ratios, so it would be nice to see how
each of them would be evident from the metrics and configurations.
This is going to be valuable information for operating this in production.

AS8: As Jun mentioned, maybe a histogram would be appropriate
for RequestTopicPartitionsFetchRatio. Then I wonder whether it's
worth having a separate RequestTopicPartitionsFetchEmptyCount since
those are requests for which the fetch ratio was 0.

Thanks,
Andrew
________________________________________
From: Jun Rao <j...@confluent.io.INVALID>
Sent: 30 October 2024 23:07
To: dev@kafka.apache.org <dev@kafka.apache.org>
Subject: Re: [DISCUSS] KIP-1103: Additional metrics for cooperative consumption

Hi, Apoorv,

Thanks for the KIP. A few comments.

1. TotalShareReleaseRequestsPerSec: This is a bit weird since there is
no ShareReleaseRequest.

2. RequestTopicPartitionsFetchRatio and InFlightBatchMessageCount: It seems
Histogram is more appropriate for them.

3. RequestTopicPartitionsFetchEmptyCount and AcquisitionLockTimeoutCount:
The problem with using gauge is that if the value changes quickly, we may
not be able to capture the occurrence of empty fetch or lock timeout. Meter
captures those rare events better.

Jun


On Tue, Oct 29, 2024 at 8:48 AM Apoorv Mittal <apoorvmitta...@gmail.com>
wrote:

> Hi Everyone,
> I would like to start a discussion on KIP-1103:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1103%3A+Additional+metrics+for+cooperative+consumption
>
> This KIP extends KIP-932 to provide additional metrics for
> Queues/Cooperative consumption.
>
> Regards,
> Apoorv Mittal
>

Reply via email to