Hi Mickael,
1. I agree that the title is misleading. It should be something like
"New metrics for the AsyncKafkaConsumer". Maybe it should even be
"Metrics for the new consumer".
3. I am not sure I understand this comment. Exposed metrics are public
as far as I understand. So adding new metrics requires a KIP. We had
KIPs in the past that only introduced and/or removed new metrics. For
example,
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1001%3A+Add+CurrentControllerId+Metric
I do not think there are any plans to make the AsyncKafkaConsumer class
public. It is the implementation of the KafkaConsumer when the CONSUMER
protocol is used. But I might be wrong. At least, I am against making it
public.
Best,
Bruno
On 7/24/24 12:05 PM, Mickael Maison wrote:
Hi,
1. The title is a bit misleading. It's proposing to add new metrics,
JMX is just one of the mechanisms to export them.
2. +1 to not register the new metrics when using the classic consumer,
instead of setting them to 0. Similarly I assume existing metrics that
don't apply to the new consumers are not registered?
3. At the moment this KIP is not changing any public APIs. What's the
plan to make AsyncKafkaConsumer public?
Thanks,
Mickael
On Tue, Jul 23, 2024 at 6:03 PM Bruno Cadonna <cado...@apache.org> wrote:
Hi Brenden,
BC1. In his first e-mail Andrew wrote "I would expect that the metrics
do not exist at all". I agree with him. I think it would be better to
not add those metrics at all if the CLASSIC protocol is used rather than
the metrics exist and are all constant 0. This should be possible by not
adding the metrics to the metrics registry if the CONSUMER protocol is
not used.
BC2. Is there a specific reason you do not propose
background-event-queue-time-max and background-event-queue-time-avg? If
folk think those are not useful we do not need to add them. However, if
those are not useful, why is background-event-queue-size useful. I was
just wondering about the asymmetry between background-event-queue and
application-event-queue.
Best,
Bruno
On 7/19/24 9:14 PM, Brenden Deluna wrote:
Hi Apoorv,
Thank you for your comments, I will address each.
AM1. I can see the usefulness in also having an
'application-event-queue-age-max' to get an idea of outliers and how they
may be affecting the average metric. I will add that.
AM2. I agree with you there, I think 'time' is a better descriptor here
than 'age'. I will update those metric names as well.
AM3. Similar to above comments, I will change the name of that metric to be
more consistent. And I think a max metric would also be useful here, adding
that.
AM4. Yes, good catch there. Will update that as well.
Thank you,
Brenden
On Fri, Jul 19, 2024 at 8:14 AM Apoorv Mittal <apoorvmitta...@gmail.com>
wrote:
Hi Brendan,
Thanks for the KIP. The metrics are always helpful.
AM1: Is `application-event-queue-age-avg` enough or do we require `
application-event-queue-age-max` as well to differentiate with outliers?
AM2: The kafka producer defines metric `record-queue-time-avg` which
captures the time spent in the buffer. Do you think we should have a
similar name for `application-event-queue-age-avg` i.e. change to `
application-event-queue-time-avg`? Moreover other than similar naming,
`time` anyways seems more suitable than `age`, though minor. The `time`
usage is also aligned with the description of this metric.
AM3: Metric `application-event-processing-time` says "the average time,
that the consumer network.....". Shall we have the `-avg` suffix in the
metric as we have defined for other metrics? Also do we require the max
metric as well for the same?
AM4: Is the telemetry name for `unsent-requests-queue-size` intended
as `org.apache.kafka.consumer.unsent.requests.size`,
or it should be corrected to `
org.apache.kafka.consumer.unsent.requests.queue.size`?
AM2:
Regards,
Apoorv Mittal
+44 7721681581
On Mon, Jul 15, 2024 at 2:45 PM Andrew Schofield <
andrew_schofi...@live.com>
wrote:
Hi Brenden,
Thanks for the updates.
AS4. I see that you’ve added `.ms` to a bunch of the metrics reflecting
the
fact that they’re measured in milliseconds. However, I observe that most
metrics
in Kafka that are measured in milliseconds, with some exceptions in Kafka
Connect
and MirrorMaker do not follow this convention. I would tend to err on the
side of
consistency with the existing metrics and not use `.ms`. However, that’s
just my
opinion, so I’d be interested to know what other reviewers of the KIP
think.
Thanks,
Andrew
On 12 Jul 2024, at 20:11, Brenden Deluna <bdel...@confluent.io.INVALID
wrote:
Hey Lianet,
Thank you for your suggestions and feedback!
LM1. This has now been addressed.
LM2. I think that would be a valuable addition to the current set of
metrics, I will get that added.
LM3. Again great idea, that would certainly be helpful. Will add that
as
well.
Let me know if you have any more suggestions!
Thanks,
Brenden
On Fri, Jul 12, 2024 at 2:11 PM Brenden Deluna <bdel...@confluent.io>
wrote:
Hi Lucas,
Thank you for the feedback! I have addressed your comments:
LB1. Good catch there, I will update the names as needed.
LB2. Good catch again! I will update the name to be more consistent.
LB3. Thank you for pointing this out, I realized that all metric
values
will actually be set to 0. I will specifiy this and explain why they
will
be 0.
Nit: This metric is referring to the queue of unsent requests in the
NetworkClientDelegate. For the metric descriptions I am trying to not
include too much of the implementation details, hence the reason that
description is quite short. I cannot think of other ways to describe
the
metric without going deeper into the implementation, but please do let
me
know if you have any ideas.
Thank you,
Brenden
On Fri, Jul 12, 2024 at 1:27 PM Lianet M. <liane...@gmail.com> wrote:
Hey Brenden, thanks for the KIP! Great to get more visibility into
the
new
consumer.
LM1. +1 on Lucas's suggestion for including the unit in the name,
seems
clearer and consistent (I do see several time metrics including ms)
LM2. What about a new metric for application-event-queue-time-ms. It
would
be a complement to the application-event-queue-size you're proposing,
and
it will tell us how long the events sit in the queue, waiting to be
processed (from the time the API call adds the event to the queue, to
the
time it's processed in the background thread). I find it would be
very
interesting.
LM3. Thinking about the actual usage of
"time-between-network-thread-poll-xxx" metric, I imagine it would be
helpful to know more about what could be impacting it. As I see it,
the
network thread cadence could be mainly impacted by: 1- app event
processing
(generate requests), 2- network client poll (actual send/receive).
For
2,
the new consumer reuses the same component as the legacy one, but 1
is
specific to the new consumer, so what about a metric
for application-event-processing-time-ms (we could consider avg I
would
say). It would be the time that the network thread takes to process
all
available events on each run.
Cheers!
Lianet
On Fri, Jul 12, 2024 at 1:57 PM Lucas Brutschy
<lbruts...@confluent.io.invalid> wrote:
Hey Brenden,
thanks for the KIP! These will be great to observe and debug the
background thread of the new consumer.
LB1. `time-between-network-thread-poll-max` → I see several similar
metrics including the unit in the metric name (ms or us). We could
consider this, although it's probably not strictly required.
However,
at least the description should state the unit. Same for the `avg`
version.
LB2. `unsent-requests-size` → Naming sounds a bit like it's
referring
to the size of the request. How about `unsent-request-queue-size` or
`unsent-request-count` or simply `unsent-requests`?
LB3. "the proposed metrics below will be set to null or 0." → which
one will be set to null and which ones will be set to 0, and why?
nit: "The current number of unsent requests in the consumer
network" →
Seems to be missing something?
Cheers,
Lucas
On Fri, Jul 12, 2024 at 7:28 PM Brenden Deluna
<bdel...@confluent.io.invalid> wrote:
Hi Andrew,
Thank you for the feedback and your question.
AS1. Great idea, I will get that added.
AS2. For unsent-events-age-max, age will be calculated once the
event
is
sent, so you are correct.
AS3. I agree, I think that would be a helpful metric to add, thank
you! I
will get that added.
Please let me know if you have any additional feedback,
suggestions,
or
questions.
Thanks,
Brenden
On Fri, Jul 12, 2024 at 11:45 AM Andrew Schofield <
andrew_schofi...@live.com>
wrote:
Hi Brenden,
Thanks for the KIP. It fills a gap in the metrics for the new
consumer
nicely.
AS1. If using the CLASSIC protocol, and thus the
LegacyKafkaConsumer,
I would expect that the metrics do not exist at all. Maybe say
something
like
“These metrics are for the new consumer implementation using the
CONSUMER protocol”.
AS2. For unsent-events-age-max, when is the age calculated? For
example,
is it calculated at the time that the unsent event is removed from
the
list and sent, or does the metric reflect unsent events which are
still
enqueued? I suspect the former, but thought I’d check.
AS3. I think that unsent-events-age-avg would also be interesting
to
get an idea of how long unsent events tend to sit around before
sending.
Of course, the question from AS2 would also apply to the average.
Thanks,
Andrew
On 10 Jul 2024, at 17:44, Philip Nee <p...@confluent.io.INVALID>
wrote:
Hi all,
This is the link to the KIP document.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1068%3A+New+JMX+metrics+for+the+new+KafkaConsumer
Any comment is appreciated,
On Tue, Jul 9, 2024 at 10:14 AM Brenden Deluna
<bdel...@confluent.io.invalid>
wrote:
Hello everyone,
I would like to start the discussion thread for KIP-1068. This
is a
relatively small KIP, only proposing to add a couple of new
metrics.
If you have any suggestions or feedback, let me know, it will be
much
appreciated.