Re: [DISCUSS] KIP-1068: KIP-1068: New JMX Metrics for AsyncKafkaConsumer

Brenden Deluna Wed, 24 Jul 2024 09:02:05 -0700

Hi Mickael,
Thank you for your feedback.

1. I can see that, I will get that changed.
2. Done


Thanks,
Brenden

On Wed, Jul 24, 2024 at 5:06 AM Mickael Maison <mickael.mai...@gmail.com>
wrote:

> Hi,
>
> 1. The title is a bit misleading. It's proposing to add new metrics,
> JMX is just one of the mechanisms to export them.
> 2. +1 to not register the new metrics when using the classic consumer,
> instead of setting them to 0. Similarly I assume existing metrics that
> don't apply to the new consumers are not registered?
> 3. At the moment this KIP is not changing any public APIs. What's the
> plan to make AsyncKafkaConsumer public?
>
> Thanks,
> Mickael
>
>
>
> On Tue, Jul 23, 2024 at 6:03 PM Bruno Cadonna <cado...@apache.org> wrote:
> >
> > Hi Brenden,
> >
> > BC1. In his first e-mail Andrew wrote "I would expect that the metrics
> > do not exist at all". I agree with him. I think it would be better to
> > not add those metrics at all if the CLASSIC protocol is used rather than
> > the metrics exist and are all constant 0. This should be possible by not
> > adding the metrics to the metrics registry if the CONSUMER protocol is
> > not used.
> >
> > BC2. Is there a specific reason you do not propose
> > background-event-queue-time-max and background-event-queue-time-avg? If
> > folk think those are not useful we do not need to add them. However, if
> > those are not useful, why is background-event-queue-size useful. I was
> > just wondering about the asymmetry between background-event-queue and
> > application-event-queue.
> >
> > Best,
> > Bruno
> >
> >
> >
> > On 7/19/24 9:14 PM, Brenden Deluna wrote:
> > > Hi Apoorv,
> > > Thank you for your comments, I will address each.
> > >
> > > AM1. I can see the usefulness in also having an
> > > 'application-event-queue-age-max' to get an idea of outliers and how
> they
> > > may be affecting the average metric. I will add that.
> > >
> > > AM2. I agree with you there, I think 'time' is a better descriptor here
> > > than 'age'. I will update those metric names as well.
> > >
> > > AM3. Similar to above comments, I will change the name of that metric
> to be
> > > more consistent. And I think a max metric would also be useful here,
> adding
> > > that.
> > >
> > > AM4. Yes, good catch there. Will update that as well.
> > >
> > > Thank you,
> > > Brenden
> > >
> > > On Fri, Jul 19, 2024 at 8:14 AM Apoorv Mittal <
> apoorvmitta...@gmail.com>
> > > wrote:
> > >
> > >> Hi Brendan,
> > >> Thanks for the KIP. The metrics are always helpful.
> > >>
> > >> AM1: Is `application-event-queue-age-avg` enough or do we require `
> > >> application-event-queue-age-max` as well to differentiate with
> outliers?
> > >>
> > >> AM2: The kafka producer defines metric `record-queue-time-avg` which
> > >> captures the time spent in the buffer. Do you think we should have a
> > >> similar name for `application-event-queue-age-avg` i.e. change to `
> > >> application-event-queue-time-avg`? Moreover other than similar naming,
> > >> `time` anyways seems more suitable than `age`, though minor. The
> `time`
> > >> usage is also aligned with the description of this metric.
> > >>
> > >> AM3: Metric `application-event-processing-time` says "the average
> time,
> > >> that the consumer network.....". Shall we have the `-avg` suffix in
> the
> > >> metric as we have defined for other metrics? Also do we require the
> max
> > >> metric as well for the same?
> > >>
> > >> AM4: Is the telemetry name for `unsent-requests-queue-size` intended
> > >> as `org.apache.kafka.consumer.unsent.requests.size`,
> > >> or it should be corrected to `
> > >> org.apache.kafka.consumer.unsent.requests.queue.size`?
> > >>
> > >> AM2:
> > >> Regards,
> > >> Apoorv Mittal
> > >> +44 7721681581
> > >>
> > >>
> > >> On Mon, Jul 15, 2024 at 2:45 PM Andrew Schofield <
> > >> andrew_schofi...@live.com>
> > >> wrote:
> > >>
> > >>> Hi Brenden,
> > >>> Thanks for the updates.
> > >>>
> > >>> AS4. I see that you’ve added `.ms` to a bunch of the metrics
> reflecting
> > >> the
> > >>> fact that they’re measured in milliseconds. However, I observe that
> most
> > >>> metrics
> > >>> in Kafka that are measured in milliseconds, with some exceptions in
> Kafka
> > >>> Connect
> > >>> and MirrorMaker do not follow this convention. I would tend to err
> on the
> > >>> side of
> > >>> consistency with the existing metrics and not use `.ms`. However,
> that’s
> > >>> just my
> > >>> opinion, so I’d be interested to know what other reviewers of the KIP
> > >>> think.
> > >>>
> > >>> Thanks,
> > >>> Andrew
> > >>>
> > >>>> On 12 Jul 2024, at 20:11, Brenden Deluna
> <bdel...@confluent.io.INVALID
> > >>>
> > >>> wrote:
> > >>>>
> > >>>> Hey Lianet,
> > >>>>
> > >>>> Thank you for your suggestions and feedback!
> > >>>>
> > >>>>
> > >>>> LM1. This has now been addressed.
> > >>>>
> > >>>>
> > >>>> LM2. I think that would be a valuable addition to the current set of
> > >>>> metrics, I will get that added.
> > >>>>
> > >>>>
> > >>>> LM3. Again great idea, that would certainly be helpful. Will add
> that
> > >> as
> > >>>> well.
> > >>>>
> > >>>>
> > >>>> Let me know if you have any more suggestions!
> > >>>>
> > >>>>
> > >>>> Thanks,
> > >>>>
> > >>>> Brenden
> > >>>>
> > >>>> On Fri, Jul 12, 2024 at 2:11 PM Brenden Deluna <
> bdel...@confluent.io>
> > >>> wrote:
> > >>>>
> > >>>>> Hi Lucas,
> > >>>>>
> > >>>>> Thank you for the feedback! I have addressed your comments:
> > >>>>>
> > >>>>>
> > >>>>> LB1. Good catch there, I will update the names as needed.
> > >>>>>
> > >>>>>
> > >>>>> LB2. Good catch again! I will update the name to be more
> consistent.
> > >>>>>
> > >>>>>
> > >>>>> LB3. Thank you for pointing this out, I realized that all metric
> > >> values
> > >>>>> will actually be set to 0. I will specifiy this and explain why
> they
> > >>> will
> > >>>>> be 0.
> > >>>>>
> > >>>>>
> > >>>>> Nit: This metric is referring to the queue of unsent requests in
> the
> > >>>>> NetworkClientDelegate. For the metric descriptions I am trying to
> not
> > >>>>> include too much of the implementation details, hence the reason
> that
> > >>>>> description is quite short. I cannot think of other ways to
> describe
> > >> the
> > >>>>> metric without going deeper into the implementation, but please do
> let
> > >>> me
> > >>>>> know if you have any ideas.
> > >>>>>
> > >>>>>
> > >>>>> Thank you,
> > >>>>>
> > >>>>> Brenden
> > >>>>>
> > >>>>> On Fri, Jul 12, 2024 at 1:27 PM Lianet M. <liane...@gmail.com>
> wrote:
> > >>>>>
> > >>>>>> Hey Brenden, thanks for the KIP! Great to get more visibility into
> > >> the
> > >>> new
> > >>>>>> consumer.
> > >>>>>>
> > >>>>>> LM1. +1 on Lucas's suggestion for including the unit in the name,
> > >> seems
> > >>>>>> clearer and consistent (I do see several time metrics including
> ms)
> > >>>>>>
> > >>>>>> LM2. What about a new metric for application-event-queue-time-ms.
> It
> > >>> would
> > >>>>>> be a complement to the application-event-queue-size you're
> proposing,
> > >>> and
> > >>>>>> it will tell us how long the events sit in the queue, waiting to
> be
> > >>>>>> processed (from the time the API call adds the event to the
> queue, to
> > >>> the
> > >>>>>> time it's processed in the background thread). I find it would be
> > >> very
> > >>>>>> interesting.
> > >>>>>>
> > >>>>>> LM3. Thinking about the actual usage of
> > >>>>>> "time-between-network-thread-poll-xxx" metric, I imagine it would
> be
> > >>>>>> helpful to know more about what could be impacting it. As I see
> it,
> > >> the
> > >>>>>> network thread cadence could be mainly impacted by: 1- app event
> > >>>>>> processing
> > >>>>>> (generate requests), 2- network client poll (actual send/receive).
> > >> For
> > >>> 2,
> > >>>>>> the new consumer reuses the same component as the legacy one, but
> 1
> > >> is
> > >>>>>> specific to the new consumer, so what about a metric
> > >>>>>> for application-event-processing-time-ms (we could consider avg I
> > >> would
> > >>>>>> say). It would be the time that the network thread takes to
> process
> > >> all
> > >>>>>> available events on each run.
> > >>>>>>
> > >>>>>> Cheers!
> > >>>>>> Lianet
> > >>>>>>
> > >>>>>> On Fri, Jul 12, 2024 at 1:57 PM Lucas Brutschy
> > >>>>>> <lbruts...@confluent.io.invalid> wrote:
> > >>>>>>
> > >>>>>>> Hey Brenden,
> > >>>>>>>
> > >>>>>>> thanks for the KIP! These will be great to observe and debug the
> > >>>>>>> background thread of the new consumer.
> > >>>>>>>
> > >>>>>>> LB1. `time-between-network-thread-poll-max` → I see several
> similar
> > >>>>>>> metrics including the unit in the metric name (ms or us). We
> could
> > >>>>>>> consider this, although it's probably not strictly required.
> > >> However,
> > >>>>>>> at least the description should state the unit. Same for the
> `avg`
> > >>>>>>> version.
> > >>>>>>> LB2. `unsent-requests-size` → Naming sounds a bit like it's
> > >> referring
> > >>>>>>> to the size of the request. How about
> `unsent-request-queue-size` or
> > >>>>>>> `unsent-request-count` or simply `unsent-requests`?
> > >>>>>>> LB3. "the proposed metrics below will be set to null or 0." →
> which
> > >>>>>>> one will be set to null and which ones will be set to 0, and why?
> > >>>>>>>
> > >>>>>>> nit: "The current number of unsent requests in the consumer
> > >> network" →
> > >>>>>>> Seems to be missing something?
> > >>>>>>>
> > >>>>>>> Cheers,
> > >>>>>>> Lucas
> > >>>>>>>
> > >>>>>>> On Fri, Jul 12, 2024 at 7:28 PM Brenden Deluna
> > >>>>>>> <bdel...@confluent.io.invalid> wrote:
> > >>>>>>>>
> > >>>>>>>> Hi Andrew,
> > >>>>>>>> Thank you for the feedback and your question.
> > >>>>>>>>
> > >>>>>>>> AS1. Great idea, I will get that added.
> > >>>>>>>>
> > >>>>>>>> AS2. For unsent-events-age-max, age will be calculated once the
> > >> event
> > >>>>>> is
> > >>>>>>>> sent, so you are correct.
> > >>>>>>>>
> > >>>>>>>> AS3. I agree, I think that would be a helpful metric to add,
> thank
> > >>>>>> you! I
> > >>>>>>>> will get that added.
> > >>>>>>>>
> > >>>>>>>> Please let me know if you have any additional feedback,
> > >> suggestions,
> > >>>>>> or
> > >>>>>>>> questions.
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> Brenden
> > >>>>>>>>
> > >>>>>>>> On Fri, Jul 12, 2024 at 11:45 AM Andrew Schofield <
> > >>>>>>> andrew_schofi...@live.com>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi Brenden,
> > >>>>>>>>> Thanks for the KIP. It fills a gap in the metrics for the new
> > >>>>>> consumer
> > >>>>>>>>> nicely.
> > >>>>>>>>>
> > >>>>>>>>> AS1. If using the CLASSIC protocol, and thus the
> > >>>>>> LegacyKafkaConsumer,
> > >>>>>>>>> I would expect that the metrics do not exist at all. Maybe say
> > >>>>>>> something
> > >>>>>>>>> like
> > >>>>>>>>> “These metrics are for the new consumer implementation using
> the
> > >>>>>>>>> CONSUMER protocol”.
> > >>>>>>>>>
> > >>>>>>>>> AS2. For unsent-events-age-max, when is the age calculated? For
> > >>>>>>> example,
> > >>>>>>>>> is it calculated at the time that the unsent event is removed
> from
> > >>>>>> the
> > >>>>>>>>> list and sent, or does the metric reflect unsent events which
> are
> > >>>>>> still
> > >>>>>>>>> enqueued? I suspect the former, but thought I’d check.
> > >>>>>>>>>
> > >>>>>>>>> AS3. I think that unsent-events-age-avg would also be
> interesting
> > >> to
> > >>>>>>>>> get an idea of how long unsent events tend to sit around before
> > >>>>>>> sending.
> > >>>>>>>>> Of course, the question from AS2 would also apply to the
> average.
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Andrew
> > >>>>>>>>>
> > >>>>>>>>>> On 10 Jul 2024, at 17:44, Philip Nee
> <p...@confluent.io.INVALID>
> > >>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> Hi all,
> > >>>>>>>>>>
> > >>>>>>>>>> This is the link to the KIP document.
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>
> > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1068%3A+New+JMX+metrics+for+the+new+KafkaConsumer
> > >>>>>>>>>>
> > >>>>>>>>>> Any comment is appreciated,
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On Tue, Jul 9, 2024 at 10:14 AM Brenden Deluna
> > >>>>>>>>> <bdel...@confluent.io.invalid>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Hello everyone,
> > >>>>>>>>>>>
> > >>>>>>>>>>> I would like to start the discussion thread for KIP-1068.
> This
> > >>>>>> is a
> > >>>>>>>>>>> relatively small KIP, only proposing to add a couple of new
> > >>>>>> metrics.
> > >>>>>>>>>>>
> > >>>>>>>>>>> If you have any suggestions or feedback, let me know, it
> will be
> > >>>>>>> much
> > >>>>>>>>>>> appreciated.
> > >>>
> > >>>
> > >>>
> > >>
> > >
>

Re: [DISCUSS] KIP-1068: KIP-1068: New JMX Metrics for AsyncKafkaConsumer

Reply via email to