Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics

Qichao Chu Tue, 28 Nov 2023 19:22:54 -0800

Hi Vahid,

Thank you for taking the time to review the KIP and asking great questions.


The execution path mentioned is exactly how this KIP is going to function.
We believe this config, compared with many other configurations like quota,
will not pose significant alteration to the broker's functionality. Thus
overwriting the permanent config dynamically is acceptable. After thinking
about this topic again, I think your question makes a lot of sense and we
shouldn't override the permanent config. Permanent config usually
represents the normal operation condition, so the temporal config should be
removed after the cluster is restarted/re-provisioned to be 'clean'. This
also helps to prevent undesirable behavior and to reduce the risk involved
in operation. I have updated the KIP to mention this behavior.

`Medium` level is not mentioned as we don't have any medium-level metrics
for now. This is for future extension. I have also updated the KIP to
reflect this information.

Best,
Qichao



On Tue, Nov 28, 2023 at 9:22 AM Vahid Hashemian <[email protected]> wrote:

> Hi Qichao,
>
> Thanks for proposing this KIP. It'd be super valuable to have the ability
> to have those partition level metrics for Kafka topics.
>
> Sorry I'm late to the discussion. I just wanted to bring up a point
> for clarification and one question:
>
> Let's assume that a production cluster cannot afford to enable high
> verbosity on a permanent basis (at least not for all topics) due to
> performance concerns.
>
> Since this new config can be set dynamically, in case of an issue or
> investigation that warrants obtaining partition level metrics, one can
> simply enable high verbosity for select topic(s), temporarily collect
> metrics at partition level, and then change the config back to the previous
> setting. Since the config values are not set incrementally, the operator
> would need to run a `describe` to get the existing config first, and then
> amend it to enable high verbosity for the topic(s) of interest. Finally,
> when the investigation concludes, the config has to be reverted to its
> permanent setting.
>
> If the above execution path makes sense, in case the operator forgets to
> take an inventory of the existing (permanent) config and simply overwrites
> it, then that permanent config will be gone and not retrievable. Is this
> correct?
>
> We usually don't need to temporarily change broker configs and I see this
> config as one that can be temporarily changed. So keeping track of what the
> value was before the change is rather important.
>
> Aside from this point, my question is: What's the impact of `medium`
> setting for `level`? I couldn't find it described in the KIP.
>
> Thanks!
> --Vahid
>
>
>
> On Mon, Nov 13, 2023 at 5:34 AM Divij Vaidya <[email protected]>
> wrote:
>
> > Thank you for updating the KIP Qichao.
> >
> > I don't have any more questions or suggestions. Looks good to move
> forward
> > from my perspective.
> >
> >
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Fri, Nov 10, 2023 at 2:25 PM Qichao Chu <[email protected]>
> > wrote:
> >
> > > Thank you again for the nice suggestions, Jorge!
> > > I will wait for Divij's response and move it to the vote stage once the
> > > generic filter part reached concensus.
> > >
> > > Qichao Chu
> > > Software Engineer | Data - Kafka
> > > [image: Uber] <https://uber.com/>
> > >
> > >
> > > On Fri, Nov 10, 2023 at 6:49 AM Jorge Esteban Quilcate Otoya <
> > > [email protected]> wrote:
> > >
> > > > Hi Qichao,
> > > >
> > > > Thanks for updating the KIP, all updates look good to me.
> > > >
> > > > Looking forward to see this KIP moving forward!
> > > >
> > > > Cheers,
> > > > Jorge.
> > > >
> > > >
> > > >
> > > > On Wed, 8 Nov 2023 at 08:55, Qichao Chu <[email protected]>
> > wrote:
> > > >
> > > > > Hi Divij,
> > > > >
> > > > > Thank you for the feedback. I updated the KIP to make it a little
> bit
> > > > more
> > > > > generic: filters will stay in an array instead of different
> top-level
> > > > > objects. In this way, if we need language filters in the future.
> The
> > > > logic
> > > > > relationship of filters is also added.
> > > > >
> > > > > Hi Jorge,
> > > > >
> > > > > Thank you for the review and great comments. Here is the reply for
> > each
> > > > of
> > > > > the suggestions:
> > > > >
> > > > > 1) The words describing the property are now updated to include
> more
> > > > > details of the keys in the JSON. It also explicitly mentions the
> JSON
> > > > > nature of the config now.
> > > > > 2) The JSON entries should be non-conflict so the order is not
> > > relevant.
> > > > If
> > > > > there's conflict, the conflict resolution rules are stated in the
> > KIP.
> > > To
> > > > > make it more clear, ordering and duplication rules are updated in
> the
> > > > > Restrictions section of the *level* property.
> > > > > 3) Yeah we did take a look at the RecordingLevel config and it does
> > not
> > > > > work for this case. The RecodingLevel config does not offer the
> > > > capability
> > > > > of filtering and it has a drawback of needing to be added to all
> the
> > > > future
> > > > > sensors. To reduce the duplication, I propose we merge the
> > > RecordingLevel
> > > > > to this more generic config in the future. Please take a look into
> > the
> > > > > *Using
> > > > > the Existing RecordingLevel Config* section under *Rejected
> > > Alternatives*
> > > > > for more details.
> > > > > 4) This suggestion makes a lot of sense. My idea is to create a
> > > > > table/form/doc in the documentation for the verbosity levels of all
> > > > metric
> > > > > series. If it's too verbose to be in the docs, I will update the
> KIP
> > to
> > > > > include this info. I will create a JIRA for this effort once the
> KIP
> > is
> > > > > approved.
> > > > > 5) Sure we can expand to all other series, added to the KIP.
> > > > > 6) Added a new section(*Working with the Configuration via CLI)*
> with
> > > the
> > > > > user experience details
> > > > > 7) Links are updated.
> > > > >
> > > > > Please take another look and let me know if you have any more
> > concerns.
> > > > >
> > > > > Best,
> > > > > Qichao Chu
> > > > > Software Engineer | Data - Kafka
> > > > > [image: Uber] <https://uber.com/>
> > > > >
> > > > >
> > > > > On Wed, Nov 8, 2023 at 6:29 AM Jorge Esteban Quilcate Otoya <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > Hi Qichao,
> > > > > >
> > > > > > Thanks for the KIP! This will be a valuable contribution and
> > improve
> > > > the
> > > > > > tooling for troubleshooting.
> > > > > >
> > > > > > I have a couple of comments:
> > > > > >
> > > > > > 1. It's unclear from the `metrics.verbosity` description what the
> > > > > supported
> > > > > > values are. In the description mentions "If the value is high ...
> > In
> > > > the
> > > > > > low settings" but I think it's referring to the `level` property
> > > > > > specifically instead of the whole value that is now JSON. Could
> you
> > > > > clarify
> > > > > > this?
> > > > > >
> > > > > > 2. Could we state in which order the JSON entries are going to be
> > > > > > evaluated? I guess the last entry wins if it overlaps previous
> > > values,
> > > > > but
> > > > > > better to make this explicit.
> > > > > >
> > > > > > 3. Kafka metrics library has a `RecordingLevel` configuration --
> > have
> > > > we
> > > > > > considered aligning these concepts and maybe reuse it instead of
> > > > > > `verbosityLevel`? Then we can reuse the levels: INFO, DEBUG,
> TRACE.
> > > > > >
> > > > > > 4. Not sure if within the scope of the KIP, but would be helpful
> to
> > > > > > document the metrics with the verbosity level attached to the
> > > metrics.
> > > > > > Maybe creating a JIRA ticket to track this would be enough if we
> > > can't
> > > > > > cover it as part of the KIP.
> > > > > >
> > > > > > 5. Could we consider the following client-related metrics as
> well:
> > > > > >   - BytesRejectedPerSec
> > > > > >   - TotalProduceRequestsPerSec
> > > > > >   - TotalFetchRequestsPerSec
> > > > > >   - FailedProduceRequestsPerSec
> > > > > >   - FailedFetchRequestsPerSec
> > > > > >   - FetchMessageConversionsPerSec
> > > > > >   - ProduceMessageConversionsPerSec
> > > > > > Would be great to have these from day 1 instead of requiring a
> > > > following
> > > > > > KIP to extend this. Could be implemented in separate PRs if
> needed.
> > > > > >
> > > > > > 6. To make it clearer how the user experience would be, could we
> > > > provide
> > > > > an
> > > > > > example of:
> > > > > > - how the broker configuration will be provided by default, and
> > > > > > - how the CLI tooling would be used to change the configuration?
> > > > > > - Maybe a couple of scenarios: adding a new metric config, a
> second
> > > one
> > > > > > with overlapping values, and
> > > > > > - describing the expected metrics to be mapped
> > > > > >
> > > > > > A couple of nits:
> > > > > > - The first link "MessagesInPerSec metrics" is pointing to
> > > > > > https://kafka.apache.org/documentation/#uses_metrics -- is this
> > the
> > > > > > correct
> > > > > > reference? It doesn't seem too relevant.
> > > > > > - Also, the link to ReplicaManager points to a line that has
> change
> > > > > > already; better to have a permalink to a specific commit: e.g.
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/edc7e10a745c350ad1efa9e4866370dc8ea0e034/core/src/main/scala/kafka/server/ReplicaManager.scala#L1218
> > > > > >
> > > > > > Cheers,
> > > > > > Jorge.
> > > > > >
> > > > > > On Tue, 7 Nov 2023 at 17:06, Qichao Chu <[email protected]
> >
> > > > wrote:
> > > > > >
> > > > > > > Hi Divij,
> > > > > > >
> > > > > > > It would be very nice if you could take a look at the recent
> > > changes,
> > > > > > thank
> > > > > > > you!
> > > > > > > If there's no more required changes, shall we move to vote
> stage?
> > > > > > >
> > > > > > > Best,
> > > > > > > Qichao Chu
> > > > > > > Software Engineer | Data - Kafka
> > > > > > > [image: Uber] <https://uber.com/>
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Nov 2, 2023 at 12:06 AM Qichao Chu <[email protected]>
> > > wrote:
> > > > > > >
> > > > > > > > Hi Divij,
> > > > > > > >
> > > > > > > > Thank you for the very quick response and the nice
> > suggestions. I
> > > > > have
> > > > > > > > updated the KIP with the following thoughts.
> > > > > > > >
> > > > > > > > 1. I checked the Java documentation and it seems the regex
> > engine
> > > > in
> > > > > > > utils
> > > > > > > > <
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html
> > > > > > >
> > > > > > > is
> > > > > > > > not 100% compatible with PCRE, though it is very close. I
> > stated
> > > > > > > > the Java implementation as the requirement since we are most
> > > likely
> > > > > to
> > > > > > > > target a JVM language.
> > > > > > > > 2. Agreed with the filter limitation. For now, let's keep it
> > > topic
> > > > > > only.
> > > > > > > > With that in mind, I feel we do have cases where a user wants
> > to
> > > > list
> > > > > > > many
> > > > > > > > topics. Although regex is also possible, an array will make
> > > things
> > > > > > > faster.
> > > > > > > > This makes me add two options for the topic filter.
> > > > > > > > 3. It seems not many configs are using JSON, this was the
> > > intention
> > > > > for
> > > > > > > me
> > > > > > > > to use a compound string. However since JSON is used widely
> in
> > > the
> > > > > > > project,
> > > > > > > > and given the benefits you mentioned earlier, I tried to make
> > the
> > > > > > config
> > > > > > > a
> > > > > > > > JSON array. The change is to make it compatible with
> > multi-level
> > > > > > > settings.
> > > > > > > >
> > > > > > > > Let me know what you think. Many thanks!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Qichao Chu
> > > > > > > > Software Engineer | Data - Kafka
> > > > > > > > [image: Uber] <https://uber.com/>
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Nov 1, 2023 at 9:43 PM Divij Vaidya <
> > > > [email protected]
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Thank you for making the changes Qichao.
> > > > > > > >>
> > > > > > > >> We are now entering in the territory of defining a
> declarative
> > > > > schema
> > > > > > > for
> > > > > > > >> filters. In the new input format, the type is string but we
> > are
> > > > > > > imposing a
> > > > > > > >> schema for the string and we should clearly call out the
> > schema.
> > > > You
> > > > > > can
> > > > > > > >> perhaps choose to adopt a schema such as below:
> > > > > > > >>
> > > > > > > >> metricLevel = High | Low (default: Low)
> > > > > > > >> metricNameRegEx = regEx (default: .*)
> > > > > > > >> nameOfDimension = string
> > > > > > > >> dimensionRegEx = regEx
> > > > > > > >> dimensionFilter = [<nameOfDimension>=<dimensionRegEx>]
> > (default:
> > > > [])
> > > > > > > >>
> > > > > > > >> Final Value schema = "level"=$metricLevel,
> > > > "name"=$metricNameRegEx,
> > > > > > > >> $dimensionFilter
> > > > > > > >>
> > > > > > > >> Further we need to answer questions such as :
> > > > > > > >> 1. which regEx format do we support (it should probably be
> > > > > > > Perl-compatible
> > > > > > > >> regular expressions (PCRE) because Java's regEx is
> compatible
> > > with
> > > > > it)
> > > > > > > >> 2. should we restrict the dimensionFilter to at max length 1
> > and
> > > > > value
> > > > > > > >> "topic" only for now. Later when we want to expand, we can
> > > expand
> > > > > > > filters
> > > > > > > >> for other dimensions as well such as partitions.
> > > > > > > >> 3. if we are coming up with our stringified-schema, why not
> > use
> > > > > json?
> > > > > > It
> > > > > > > >> would save us from building a parsing utility for the
> schema.
> > (I
> > > > > like
> > > > > > it
> > > > > > > >> in
> > > > > > > >> its current format but there is a case to be made for json
> as
> > > > well)
> > > > > > > >> 4. what happens when there are contradictory regEx rules,
> > e.g. a
> > > > > topic
> > > > > > > >> defined in high as well as low. It is generally solved by
> > > defining
> > > > > > > >> precedence. In our case, we can choose that high has more
> > > > precedence
> > > > > > > than
> > > > > > > >> low.
> > > > > > > >>
> > > > > > > >> What do you think?
> > > > > > > >>
> > > > > > > >> --
> > > > > > > >> Divij Vaidya
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On Wed, Nov 1, 2023 at 2:07 PM Qichao Chu
> > > <[email protected]
> > > > >
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > Hi Divij,
> > > > > > > >> >
> > > > > > > >> > Thank you for the review and the great suggestions,
> again. I
> > > > have
> > > > > > > >> updated
> > > > > > > >> > the corresponding content, can you take another look?
> > > > > > > >> > Regarding the KIP-544 style regex, I have added it to the
> > new
> > > > > > property
> > > > > > > >> too.
> > > > > > > >> > It's expanded to include multiple sections for better
> future
> > > > > > > extension.
> > > > > > > >> >
> > > > > > > >> > Best,
> > > > > > > >> > Qichao Chu
> > > > > > > >> > Software Engineer | Data - Kafka
> > > > > > > >> > [image: Uber] <https://uber.com/>
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > On Mon, Oct 30, 2023 at 6:26 PM Divij Vaidya <
> > > > > > [email protected]
> > > > > > > >
> > > > > > > >> > wrote:
> > > > > > > >> >
> > > > > > > >> > > Hey *Qichao*
> > > > > > > >> > >
> > > > > > > >> > > Thank you for the update on the KIP. I like the idea of
> > > > > > incremental
> > > > > > > >> > > delivery and adding which metrics support this verbosity
> > as
> > > a
> > > > > > later
> > > > > > > >> KIP.
> > > > > > > >> > > But I also want to ensure that we wouldn't have to
> change
> > > the
> > > > > > > current
> > > > > > > >> > > config when adding that in future. Hence, we need some
> > > > > discussion
> > > > > > on
> > > > > > > >> it
> > > > > > > >> > in
> > > > > > > >> > > the scope of the KIP.
> > > > > > > >> > >
> > > > > > > >> > > About the dynamic configuration:
> > > > > > > >> > > Do we need to add the "default" mode? I am asking
> because
> > it
> > > > may
> > > > > > > >> inhibit
> > > > > > > >> > us
> > > > > > > >> > > from adding the allowList option in future. Instead if
> we
> > > > could
> > > > > > > >> rephrase
> > > > > > > >> > > the config as: "metric.verbosity.high" which takes
> values
> > > as a
> > > > > > regEx
> > > > > > > >> > > (default will be empty), then we wouldn't have to worry
> > > about
> > > > > > > >> > > future-proofness of this KIP. Notably this is an
> existing
> > > > > pattern
> > > > > > > >> used by
> > > > > > > >> > > KIP-544.
> > > > > > > >> > > Alternatively, if you choose to stick to the current
> > > > > configuration
> > > > > > > >> > pattern,
> > > > > > > >> > > please provide information on how this config will look
> > like
> > > > > when
> > > > > > we
> > > > > > > >> add
> > > > > > > >> > > allow listing in future.
> > > > > > > >> > >
> > > > > > > >> > > About the perf test:
> > > > > > > >> > > Motivation - The motivation of perf test is to provide
> > users
> > > > > with
> > > > > > a
> > > > > > > >> hint
> > > > > > > >> > on
> > > > > > > >> > > what perf penalty they can expect and whether default
> has
> > > > > degraded
> > > > > > > >> perf
> > > > > > > >> > > (due to additional "empty" labels).
> > > > > > > >> > > Dimensions of the test could be - scrape interval,
> > > utilization
> > > > > of
> > > > > > > >> broker
> > > > > > > >> > > (no traffic vs. heavy traffic), number of partitions
> > > > (small/200
> > > > > to
> > > > > > > >> > > large/2k).
> > > > > > > >> > > Things to collect during perf test - number of mbeans
> > > > registered
> > > > > > > with
> > > > > > > >> > JMX,
> > > > > > > >> > > CPU, heap utilization
> > > > > > > >> > > Expected results - As long as we can prove that there is
> > no
> > > > > > > additional
> > > > > > > >> > > usage (significant) of CPU or heap after this change for
> > the
> > > > > > > "default
> > > > > > > >> > > mode", we should be good. For the "high" mode, we should
> > > > > document
> > > > > > > the
> > > > > > > >> > > expected increase for users but it is not a blocker to
> > > > implement
> > > > > > > this
> > > > > > > >> > KIP.
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > > *Kirk*, I have tried to clarify the expectation on
> > > > performance,
> > > > > > does
> > > > > > > >> that
> > > > > > > >> > > address your question earlier? Also, I am happy with
> > having
> > > a
> > > > > > Kafka
> > > > > > > >> level
> > > > > > > >> > > dynamic config that we can use to filter our
> > > > > metric/dimensionality
> > > > > > > >> since
> > > > > > > >> > we
> > > > > > > >> > > have a precedence at KIP-544. Hence, my suggestion to
> push
> > > > this
> > > > > > > >> filtering
> > > > > > > >> > > to metric library can be ignored.
> > > > > > > >> > >
> > > > > > > >> > > --
> > > > > > > >> > > Divij Vaidya
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > > On Sat, Oct 28, 2023 at 11:37 AM Qichao Chu
> > > > > > <[email protected]
> > > > > > > >
> > > > > > > >> > > wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Hello Everyone,
> > > > > > > >> > > >
> > > > > > > >> > > > Can I ask for some feedback regarding KIP-977
> > > > > > > >> > > > <
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-977%3A+Partition-Level+Throughput+Metrics
> > > > > > > >> > > > >
> > > > > > > >> > > > ?
> > > > > > > >> > > >
> > > > > > > >> > > > Best,
> > > > > > > >> > > > Qichao Chu
> > > > > > > >> > > > Software Engineer | Data - Kafka
> > > > > > > >> > > > [image: Uber] <https://uber.com/>
> > > > > > > >> > > >
> > > > > > > >> > > >
> > > > > > > >> > > > On Mon, Oct 16, 2023 at 7:34 PM Qichao Chu <
> > > [email protected]
> > > > >
> > > > > > > wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > > Hi Divij and Kirk,
> > > > > > > >> > > > >
> > > > > > > >> > > > > Thank you both for providing the valuable feedback
> and
> > > > sorry
> > > > > > for
> > > > > > > >> the
> > > > > > > >> > > > > delay. I have just updated the KIP to address the
> > > > comments.
> > > > > > > >> > > > >
> > > > > > > >> > > > >    1. Instead of using a topic-level control, global
> > > > > verbosity
> > > > > > > >> > control
> > > > > > > >> > > > >    makes more sense if we want to extend it in the
> > > future.
> > > > > It
> > > > > > > >> would
> > > > > > > >> > be
> > > > > > > >> > > > very
> > > > > > > >> > > > >    difficult if we want to apply the topic allowlist
> > > > > > everywhere
> > > > > > > >> > > > >    2. Also, the topic allowlist was not dynamic
> which
> > > > makes
> > > > > > > >> > everything
> > > > > > > >> > > > >    quite complex, especially for the topic lifecycle
> > > > > > management.
> > > > > > > >> By
> > > > > > > >> > > > using the
> > > > > > > >> > > > >    dynamic global config, debugging could be easier,
> > and
> > > > > > > >> management
> > > > > > > >> > of
> > > > > > > >> > > > the
> > > > > > > >> > > > >    config is also made easier.
> > > > > > > >> > > > >    3. More details are included in the test section.
> > > > > > > >> > > > >
> > > > > > > >> > > > > One thing that still misses is the performance
> > numbers.
> > > I
> > > > > will
> > > > > > > >> get it
> > > > > > > >> > > > > ready with our internal clusters and share out soon.
> > > > > > > >> > > > >
> > > > > > > >> > > > > Many thanks for the review!
> > > > > > > >> > > > > Qichao
> > > > > > > >> > > > >
> > > > > > > >> > > > > On Tue, Sep 12, 2023 at 8:31 AM Kirk True <
> > > > > [email protected]>
> > > > > > > >> wrote:
> > > > > > > >> > > > >
> > > > > > > >> > > > >> Oh, and does
> metrics.partition.level.reporting.topics
> > > > allow
> > > > > > for
> > > > > > > >> > regex?
> > > > > > > >> > > > >>
> > > > > > > >> > > > >> > On Sep 12, 2023, at 8:26 AM, Kirk True <
> > > > > [email protected]>
> > > > > > > >> wrote:
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> > Hi Qichao,
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> > Thanks for the KIP!
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> > Divij—questions/comments inline...
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> >> On Sep 11, 2023, at 4:32 AM, Divij Vaidya <
> > > > > > > >> > [email protected]
> > > > > > > >> > > >
> > > > > > > >> > > > >> wrote:
> > > > > > > >> > > > >> >>
> > > > > > > >> > > > >> >> Thank you for the proposal Qichao.
> > > > > > > >> > > > >> >>
> > > > > > > >> > > > >> >> I agree with the motivation here and understand
> > the
> > > > > > tradeoff
> > > > > > > >> here
> > > > > > > >> > > > >> >> between observability vs. increased metric
> > > dimensions
> > > > > > > (metric
> > > > > > > >> > > fan-out
> > > > > > > >> > > > >> >> as you say in the KIP).
> > > > > > > >> > > > >> >>
> > > > > > > >> > > > >> >> High level comments:
> > > > > > > >> > > > >> >>
> > > > > > > >> > > > >> >> 1. I would urge you to consider the
> extensibility
> > of
> > > > the
> > > > > > > >> proposal
> > > > > > > >> > > for
> > > > > > > >> > > > >> >> other types of metrics. Tomorrow, if we want to
> > > > > > selectively
> > > > > > > >> add
> > > > > > > >> > > > >> >> "partition" dimension to another metric, would
> we
> > > have
> > > > > to
> > > > > > > >> modify
> > > > > > > >> > > the
> > > > > > > >> > > > >> >> code where each metric is emitted?
> Alternatively,
> > > > could
> > > > > we
> > > > > > > >> > abstract
> > > > > > > >> > > > >> >> out this config in a "Kafka Metrics" library.
> The
> > > code
> > > > > > > >> provides
> > > > > > > >> > all
> > > > > > > >> > > > >> >> information about this library and this library
> > can
> > > > > choose
> > > > > > > >> which
> > > > > > > >> > > > >> >> dimensions it wants to add to the final metrics
> > that
> > > > are
> > > > > > > >> emitted
> > > > > > > >> > > > based
> > > > > > > >> > > > >> >> on declarative configuration.
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> > I’d agree with this if it doesn’t place a burden
> on
> > > the
> > > > > > > >> callers.
> > > > > > > >> > Are
> > > > > > > >> > > > >> there any potential call sites that don’t have the
> > > > > partition
> > > > > > > >> > > information
> > > > > > > >> > > > >> readily available?
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> >> 2. Can we offload the handling of this dimension
> > > > > filtering
> > > > > > > to
> > > > > > > >> the
> > > > > > > >> > > > >> >> metric framework? Have you explored whether
> > > prometheus
> > > > > or
> > > > > > > >> other
> > > > > > > >> > > > >> >> libraries provide the ability to dynamically
> > change
> > > > > > > dimensions
> > > > > > > >> > > > >> >> associated with metrics?
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> > I’m not familiar with the downstream metrics
> > > providers’
> > > > > > > >> > > capabilities.
> > > > > > > >> > > > >> This is a greatest common denominator scenario,
> > right?
> > > > We’d
> > > > > > > have
> > > > > > > >> to
> > > > > > > >> > be
> > > > > > > >> > > > >> reasonable sure that the heavily used providers
> *all*
> > > > > support
> > > > > > > >> such
> > > > > > > >> > > > dynamic
> > > > > > > >> > > > >> filtering.
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> > Also—and correct me as needed as I’m not familiar
> > > with
> > > > > the
> > > > > > > >> area—if
> > > > > > > >> > > we
> > > > > > > >> > > > >> relegate partition filtering to a lower layer, we’d
> > > still
> > > > > > need
> > > > > > > to
> > > > > > > >> > > store
> > > > > > > >> > > > the
> > > > > > > >> > > > >> metric data in memory until it’s flushed, yes? If
> so,
> > > is
> > > > > that
> > > > > > > >> > overhead
> > > > > > > >> > > > of
> > > > > > > >> > > > >> any concern?
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> >> Implementation level comments:
> > > > > > > >> > > > >> >>
> > > > > > > >> > > > >> >> 1. In the test plan section, please mention what
> > > kind
> > > > of
> > > > > > > integ
> > > > > > > >> > > and/or
> > > > > > > >> > > > >> >> unit tests will be added and what they will
> > assert.
> > > As
> > > > > an
> > > > > > > >> > example,
> > > > > > > >> > > > you
> > > > > > > >> > > > >> >> can add a section, "functionality tests", which
> > > would
> > > > > > assert
> > > > > > > >> that
> > > > > > > >> > > new
> > > > > > > >> > > > >> >> metric config is being respected and another
> > > section,
> > > > > > > >> > "performance
> > > > > > > >> > > > >> >> tests", which could be a system test and assert
> > that
> > > > no
> > > > > > > >> > regression
> > > > > > > >> > > > >> >> caused wrt resources occupied by metrics from
> one
> > > > > version
> > > > > > to
> > > > > > > >> > > another.
> > > > > > > >> > > > >> >> 2. Please mention why or why not are we
> > considering
> > > > > > > >> dynamically
> > > > > > > >> > > > >> >> setting the configuration (i.e. without a broker
> > > > > > restart)? I
> > > > > > > >> > would
> > > > > > > >> > > > >> >> imagine that the ability to dynamically
> configure
> > > for
> > > > a
> > > > > > > >> specific
> > > > > > > >> > > > topic
> > > > > > > >> > > > >> >> will be very useful especially to debug
> production
> > > > > > > situations
> > > > > > > >> > that
> > > > > > > >> > > > you
> > > > > > > >> > > > >> >> mention in the motivation.
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> > +1
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> >> 3. You mention that we want to start with
> metrics
> > > > > closely
> > > > > > > >> related
> > > > > > > >> > > to
> > > > > > > >> > > > >> >> producer & consumers first, which is fair. Could
> > you
> > > > > > please
> > > > > > > >> add a
> > > > > > > >> > > > >> >> statement on the work required to extend this to
> > > other
> > > > > > > >> metrics in
> > > > > > > >> > > > >> >> future?
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> > +1
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> >> 4. In the compatibility section, you mention
> that
> > > this
> > > > > > > change
> > > > > > > >> is
> > > > > > > >> > > > >> >> backward compatible. I don't fully understand
> > that.
> > > > > > During a
> > > > > > > >> > > version
> > > > > > > >> > > > >> >> upgrade, we will start with an empty list of
> > topics
> > > to
> > > > > > > >> maintain
> > > > > > > >> > > > >> >> backward compatibility. I assume after the
> > upgrade,
> > > we
> > > > > > will
> > > > > > > >> > update
> > > > > > > >> > > > the
> > > > > > > >> > > > >> >> new config with topic names that we desire to
> > > monitor.
> > > > > But
> > > > > > > >> > updating
> > > > > > > >> > > > >> >> the config will require a broker restart (a
> > rolling
> > > > > > restart
> > > > > > > >> since
> > > > > > > >> > > > >> >> config is read-only). We will be in a situation
> > > where
> > > > > some
> > > > > > > >> > brokers
> > > > > > > >> > > > are
> > > > > > > >> > > > >> >> sending metrics with a new "partition" dimension
> > and
> > > > > some
> > > > > > > >> brokers
> > > > > > > >> > > are
> > > > > > > >> > > > >> >> sending metrics with no partition dimension. Is
> > that
> > > > > > > >> acceptable
> > > > > > > >> > to
> > > > > > > >> > > > JMX
> > > > > > > >> > > > >> >> / prometheus collectors? Would it break them?
> > Please
> > > > > > clarify
> > > > > > > >> how
> > > > > > > >> > > > >> >> upgrades will work in the compatibility section.
> > > > > > > >> > > > >> >> 5. Could you please quantify (with an
> experiment)
> > > the
> > > > > > > expected
> > > > > > > >> > perf
> > > > > > > >> > > > >> >> impact of adding the partition dimension? This
> > could
> > > > be
> > > > > > done
> > > > > > > >> as
> > > > > > > >> > > part
> > > > > > > >> > > > >> >> of "test plan" section and would serve as a data
> > > point
> > > > > for
> > > > > > > >> users
> > > > > > > >> > to
> > > > > > > >> > > > >> >> understand the potential impact if they decide
> to
> > > turn
> > > > > it
> > > > > > > on.
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> > Is there some guidance on the level of precision
> > and
> > > > > detail
> > > > > > > >> > expected
> > > > > > > >> > > > >> when providing the performance numbers in the KIP?
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> > This notion of proving out the performance impact
> > is
> > > > > > > >> important, I
> > > > > > > >> > > > >> agree. Anecdotally, there was another KIP I was
> > > following
> > > > > for
> > > > > > > >> which
> > > > > > > >> > > > >> performance numbers were requested, as is
> reasonable.
> > > But
> > > > > > that
> > > > > > > >> > caused
> > > > > > > >> > > > the
> > > > > > > >> > > > >> KIP to go a bit sideways as a result because it
> > wasn’t
> > > > able
> > > > > > to
> > > > > > > >> get
> > > > > > > >> > > > >> consensus on a) the different scenarios to test,
> and
> > b)
> > > > the
> > > > > > > >> > > quantitative
> > > > > > > >> > > > >> goal for each. I’m not really sure the rigo(u)r
> > that’s
> > > > > > expected
> > > > > > > >> at
> > > > > > > >> > > this
> > > > > > > >> > > > >> stage in the development of a new feature.
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> > Thanks,
> > > > > > > >> > > > >> > Kirk
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> >>
> > > > > > > >> > > > >> >> --
> > > > > > > >> > > > >> >> Divij Vaidya
> > > > > > > >> > > > >> >>
> > > > > > > >> > > > >> >>
> > > > > > > >> > > > >> >> On Sat, Sep 9, 2023 at 8:18 PM Qichao Chu
> > > > > > > >> > <[email protected]
> > > > > > > >> > > >
> > > > > > > >> > > > >> wrote:
> > > > > > > >> > > > >> >>>
> > > > > > > >> > > > >> >>> Hi All,
> > > > > > > >> > > > >> >>>
> > > > > > > >> > > > >> >>> Although this has been discussed many times, I
> > > would
> > > > > like
> > > > > > > to
> > > > > > > >> > > start a
> > > > > > > >> > > > >> new
> > > > > > > >> > > > >> >>> discussion regarding the introduction of
> > > > > partition-level
> > > > > > > >> > > throughput
> > > > > > > >> > > > >> >>> metrics. Please review the KIP and I'm eager to
> > > know
> > > > > > > >> everyone's
> > > > > > > >> > > > >> thoughts:
> > > > > > > >> > > > >> >>>
> > > > > > > >> > > > >>
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-977%3A+Partition-Level+Throughput+Metrics
> > > > > > > >> > > > >> >>>
> > > > > > > >> > > > >> >>> TL;DR: The KIP proposes to add partition-level
> > > > > throughput
> > > > > > > >> > metrics
> > > > > > > >> > > > and
> > > > > > > >> > > > >> a new
> > > > > > > >> > > > >> >>> configuration to control the fan-out rate.
> > > > > > > >> > > > >> >>>
> > > > > > > >> > > > >> >>> Thank you all for the review and have a nice
> > > weekend!
> > > > > > > >> > > > >> >>>
> > > > > > > >> > > > >> >>> Best,
> > > > > > > >> > > > >> >>> Qichao
> > > > > > > >> > > > >>
> > > > > > > >> > > > >>
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics

Reply via email to