Hi Jorge, Thanks for filing KAFKA-14441! In the ticket description we mention that "there will be more confidence whether to design metrics to be exposed at a DEBUG or INFO level depending on their impact" but it doesn't seem like this is called out in the KIP and, just based on what's in the KIP, the proposal is still to have several per-record metrics exposed at INFO level.
Could we explicitly call out that the per-record metrics will definitely be added to Kafka Connect as part of this KIP, but they will only be exposed at INFO level pending the performance testing described in KAFKA-14441, and will otherwise only be exposed at DEBUG level? Otherwise, it's possible that a vote for the KIP as it's written today would be a vote in favor of unconditionally exposing these metrics at INFO level, even if the performance testing reveals issues. Cheers, Chris On Sun, Dec 4, 2022 at 7:08 PM Jorge Esteban Quilcate Otoya < quilcate.jo...@gmail.com> wrote: > Thanks for the reminder Chris! > > I have added a note on the KIP to include this as part of the KIP as most > of the metrics proposed are per-record and having all on DEBUG would limit > the benefits, and created > https://issues.apache.org/jira/browse/KAFKA-14441 > to keep track of this task. > > Cheers, > Jorge. > > On Tue, 29 Nov 2022 at 19:40, Chris Egerton <chr...@aiven.io.invalid> > wrote: > > > Hi Jorge, > > > > Thanks! What were your thoughts on the possible benchmarking and/or > > downgrading of per-record metrics to DEBUG? > > > > Cheers, > > > > Chris > > > > On Thu, Nov 24, 2022 at 8:20 AM Jorge Esteban Quilcate Otoya < > > quilcate.jo...@gmail.com> wrote: > > > > > Thanks Chris! I have updated the KIP with "transform" instead of > "alias". > > > Agree it's clearer. > > > > > > Cheers, > > > Jorge. > > > > > > On Mon, 21 Nov 2022 at 21:36, Chris Egerton <chr...@aiven.io.invalid> > > > wrote: > > > > > > > Hi Jorge, > > > > > > > > Thanks for the updates, and apologies for the delay. The new diagram > > > > directly under the "Proposed Changes" section is absolutely gorgeous! > > > > > > > > > > > > Follow-ups: > > > > > > > > RE 2: Good point. We can use the same level for these metrics, it's > > not a > > > > big deal. 
> > > > > > > > RE 3: As long as all the per-record metrics are kept at DEBUG level, > it > > > > should be fine to leave JMH benchmarking for a follow-up. If we want > to > > > add > > > > new per-record, INFO-level metrics, I would be more comfortable with > > > > including benchmarking as part of the testing plan for the KIP. One > > > > possible compromise could be to propose that these features be merged > > at > > > > DEBUG level, and then possibly upgraded to INFO level in the future > > > pending > > > > benchmarks to guard against performance degradation. > > > > > > > > RE 4: I think for a true "end-to-end" metric, it'd be useful to > include > > > the > > > > time taken by the task to actually deliver the record. However, with > > the > > > > new metric names and descriptions provided in the KIP, I have no > > > objections > > > > with what's currently proposed, and a new "end-to-end" metric can be > > > taken > > > > on later in a follow-up KIP. > > > > > > > > RE 6: You're right, existing producer metrics should be enough for > now. > > > We > > > > can revisit this later if/when we add delivery-centric metrics for > sink > > > > tasks as well. > > > > > > > > RE 7: The new metric names in the KIP LGTM; I don't see any need to > > > expand > > > > beyond those but if you'd still like to pursue others, LMK. > > > > > > > > > > > > New thoughts: > > > > > > > > One small thought: instead of "alias" in "alias="{transform_alias}" > for > > > the > > > > per-transform metrics, could we use "transform"? IMO it's clearer > since > > > we > > > > don't use "alias" in the names of transform-related properties, and > > > "alias" > > > > may be confused with the classloading term where you can use, e.g., > > > > "FileStreamSource" as the name of a connector class in a connector > > config > > > > instead of "org.apache.kafka.connect.file.FileStreamSourceConnector". 
> > > > > > > > > > > > Cheers, > > > > > > > > Chris > > > > > > > > On Fri, Nov 18, 2022 at 12:06 PM Jorge Esteban Quilcate Otoya < > > > > quilcate.jo...@gmail.com> wrote: > > > > > > > > > Thanks Mickael! > > > > > > > > > > > > > > > On Wed, 9 Nov 2022 at 15:54, Mickael Maison < > > mickael.mai...@gmail.com> > > > > > wrote: > > > > > > > > > > > Hi Jorge, > > > > > > > > > > > > Thanks for the KIP, it is a nice improvement. > > > > > > > > > > > > 1) The per transformation metrics still have a question mark next > > to > > > > > > them in the KIP. Do you want to include them? If so we'll want to > > tag > > > > > > them, we should be able to include the aliases in > > TransformationChain > > > > > > and use them. > > > > > > > > > > > > > > > > Yes, I have added the changes on TransformChain that will be needed > > to > > > > add > > > > > these metrics. > > > > > > > > > > > > > > > > > > > > > > 2) I see no references to predicates. If we don't want to measure > > > > > > their latency, can we say it explicitly? > > > > > > > > > > > > > > > > Good question, I haven't considered these. Though as these are > > > > materialized > > > > > as PredicatedTransformation, they should be covered by these > changes. > > > > > Adding a note about this. > > > > > > > > > > > > > > > > > > > > > > 3) Should we have sink-record-batch-latency-avg-ms? All other > > metrics > > > > > > have both the maximum and average values. > > > > > > > > > > > > > > > > > Good question. I will remove it and change the record latency from > > > > > DEBUG->INFO as it already cover the maximum metric. > > > > > > > > > > Hope it's clearer now, let me know if there any additional > feedback. > > > > > Thanks! > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > Mickael > > > > > > > > > > > > On Thu, Oct 20, 2022 at 9:58 PM Jorge Esteban Quilcate Otoya > > > > > > <quilcate.jo...@gmail.com> wrote: > > > > > > > > > > > > > > Thanks, Chris! Great feedback! 
Please, find my comments below: > > > > > > > > > > > > > > On Thu, 13 Oct 2022 at 18:52, Chris Egerton > > > <chr...@aiven.io.invalid > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > Hi Jorge, > > > > > > > > > > > > > > > > Thanks for the KIP. I agree with the overall direction and > > think > > > > this > > > > > > would > > > > > > > > be a nice improvement to Kafka Connect. Here are my initial > > > > thoughts > > > > > > on the > > > > > > > > details: > > > > > > > > > > > > > > > > 1. The motivation section outlines the gaps in Kafka > Connect's > > > task > > > > > > metrics > > > > > > > > nicely. I think it'd be useful to include more concrete > details > > > on > > > > > why > > > > > > > > these gaps need to be filled in, and in which cases > additional > > > > > metrics > > > > > > > > would be helpful. One goal could be to provide enhanced > > > monitoring > > > > of > > > > > > > > production deployments that allows for cluster administrators > > to > > > > set > > > > > up > > > > > > > > automatic alerts for latency spikes and, if triggered, > quickly > > > > > > identify the > > > > > > > > root cause of those alerts, reducing the time to remediation. > > > > Another > > > > > > goal > > > > > > > > could be to provide more insight to developers or cluster > > > > > > administrators > > > > > > > > who want to do performance testing on connectors in > > > non-production > > > > > > > > environments. It may help guide our decision making process > to > > > > have a > > > > > > > > clearer picture of the goals we're trying to achieve. > > > > > > > > > > > > > > > > > > > > > > Agree. The Motivation section has been updated. > > > > > > > Thanks for the examples, I see both of them being covered by > the > > > KIP. > > > > > > > I see how these could give us a good distinction on whether to > > > > position > > > > > > > some metrics at INFO or DEBUG level. > > > > > > > > > > > > > > > > > > > > > > 2. 
If we're trying to address the alert-and-diagnose use case, > > > it'd > > > > > be > > > > > > useful to have as much information as possible at INFO level, > > > > rather > > > > > > than > > > > > > forcing cluster administrators to possibly reconfigure a > > > connector > > > > to > > > > > > emit > > > > > > DEBUG or TRACE level metrics in order to diagnose a potential > > > > > > > > production-impacting performance bottleneck. I can see the > > > > rationale > > > > > > for > > > > > > > > emitting per-record metrics that track an average value at > > DEBUG > > > > > > level, but > > > > > > > > for per-record metrics that track a maximum value, is there any > > > > > reason > > > > > > not > > > > > > > > to provide this information at INFO level? > > > > > > > > > > > > > > > > > > > > > > Agree. Though with Max and Avg metrics being part of the same > > > sensor > > > > — > > > > > > > where Metric Level is defined — then both metrics get the same > > > level. > > > > > > > > > > > > > > > > > > > > > > 3. I'm also curious about the performance testing suggested by > > > Yash > > > > > to > > > > > > > > gauge the potential impact of this change. Have you been able > > to > > > do > > > > > any > > > > > > > > testing with your draft implementation yet? > > > > > > > > > > > > > > > > > > > > > > No, not so far. > > > > > > > I think it would be valuable to discuss the scope of this testing > > > and > > > > > > maybe > > > > > > > tackle it > > > > > > > in a separate issue as Sensors and Metrics are used all over the > > > > place. > > > > > > > My initial understanding is that these tests should be placed in > > > the > > > > > > > jmh-benchmarks[1]. > > > > > > > Then, we could target testing Sensors and Metrics, and validate > > how > > > > > much > > > > > > > overhead > > > > > > > is added by having only Max vs Max,Avg(,Min), etc. 
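[Editor's note] To make concrete what such a benchmark would be measuring, here is a minimal, dependency-free sketch. These classes are simplified stand-ins, not Kafka's actual `Sensor`/`SampledStat` implementations: the point is only that each stat registered on a sensor adds one extra update call on the hot path per recorded value.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a sampled stat: one cheap update per recorded value.
interface Stat {
    void update(double value);
    double measure();
}

class MaxStat implements Stat {
    private double value = Double.NEGATIVE_INFINITY;
    public void update(double v) { value = Math.max(value, v); }
    public double measure() { return value; }
}

class AvgStat implements Stat {
    private double sum = 0;
    private long count = 0;
    public void update(double v) { sum += v; count++; }
    public double measure() { return count == 0 ? Double.NaN : sum / count; }
}

// Simplified model of a sensor: record() touches every registered stat,
// so per-record overhead grows linearly with the number of stats.
class SensorModel {
    final List<Stat> stats = new ArrayList<>();
    long updates = 0;
    SensorModel add(Stat s) { stats.add(s); return this; }
    void record(double value) {
        for (Stat s : stats) { s.update(value); updates++; }
    }
}

public class SensorOverheadSketch {
    public static void main(String[] args) {
        SensorModel maxOnly = new SensorModel().add(new MaxStat());
        SensorModel maxAndAvg = new SensorModel().add(new MaxStat()).add(new AvgStat());
        for (double v = 1; v <= 1000; v++) {
            maxOnly.record(v);
            maxAndAvg.record(v);
        }
        System.out.println(maxOnly.updates);   // one update per record
        System.out.println(maxAndAvg.updates); // two updates per record
    }
}
```

Whether that extra linear work is actually measurable for high-throughput connectors is exactly what a JMH benchmark against the real `Sensor::record` would need to establish.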
> > > > > > > In the other hand, we could extend this to Transformers or > other > > > > > Connect > > > > > > > layers. > > > > > > > > > > > > > > Here are some pointers to the Sensors and Metrics > implementations > > > > that > > > > > > > could be considered: > > > > > > > Path to metric recording: > > > > > > > - > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/kafka/blob/5cab11cf525f6c06fcf9eb43f7f95ef33fe1cdbb/clients/src/main/java/org/apache/kafka/common/metrics/Sensor.java#L195-L199 > > > > > > > - > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/kafka/blob/5cab11cf525f6c06fcf9eb43f7f95ef33fe1cdbb/clients/src/main/java/org/apache/kafka/common/metrics/Sensor.java#L230-L244 > > > > > > > > > > > > > > ``` > > > > > > > // increment all the stats > > > > > > > for (StatAndConfig statAndConfig : this.stats) { > > > > > > > statAndConfig.stat.record(statAndConfig.config(), value, > > > timeMs); > > > > > > > } > > > > > > > ``` > > > > > > > > > > > > > > SampledStats: > > > > > > > - Avg: > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/kafka/blob/068ab9cefae301f3187ea885d645c425955e77d2/clients/src/main/java/org/apache/kafka/common/metrics/stats/Avg.java > > > > > > > - Max: > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/kafka/blob/068ab9cefae301f3187ea885d645c425955e77d2/clients/src/main/java/org/apache/kafka/common/metrics/stats/Max.java > > > > > > > - Min: > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/kafka/blob/068ab9cefae301f3187ea885d645c425955e77d2/clients/src/main/java/org/apache/kafka/common/metrics/stats/Min.java > > > > > > > > > > > > > > `stat#record()` are implemented by `update` method in > > SampledStat: > > > > > > > > > > > > > > ```Max.java > > > > > > > @Override > > > > > > > protected void update(Sample sample, MetricConfig config, > > > double > > > > > > value, > > > 
> > > > long now) { > > > > > > > sample.value = Math.max(sample.value, value); > > > > > > > } > > > > > > > ``` > > > > > > > > > > > > > > ```Avg.java > > > > > > > @Override > > > > > > > protected void update(Sample sample, MetricConfig config, > > > double > > > > > > value, > > > > > > > long now) { > > > > > > > sample.value += value; > > > > > > > } > > > > > > > ``` > > > > > > > > > > > > > > As far as I understand, most of the work of the stats happens on > > > the > > > > > > > `combine` method that is not part of the connector execution but > > > > called > > > > > > > when metrics are queried. > > > > > > > > > > > > > > I wonder whether we should consider Avg and Max for all metrics > > > > > proposed > > > > > > as > > > > > > > the impact on the execution path seems minimal, and even see if > > Min > > > > is > > > > > > also > > > > > > > valuable, and use DEBUG only for more granular metrics. > > > > > > > > > > > > > > [1] https://github.com/apache/kafka/tree/trunk/jmh-benchmarks > > > > > > > > > > > > > > > > > > > > > > 4. Just to make sure I understand correctly--does "time when it > > > has > > > > > > been > > > > > > > > received by the Sink task" refer to the wallclock time directly > > > > > after a > > > > > > > > call to SinkTask::put has been completed (as opposed to > > directly > > > > > before > > > > > > > > that call is made, or something else entirely)? > > > > > > > > > > > > > > > > > > > > > > It currently means when it has been received by the Sink task > > > > > > > right after consumer poll and before conversions. > > > > > > > Would it be valuable to have it after put-sink-records? > > > > > > > > > > > > > > > > 5. 
If the goal is to identify performance bottlenecks (either > > in > > > > > > production > > > > > > > > or pre-production environments), would it make sense to > > introduce > > > > > > metrics > > > > > > > > for each individual converter (i.e., key/value/header) and > > > > > > transformation? > > > > > > > > It's definitely an improvement to be able to identify the > total > > > > time > > > > > > for > > > > > > > > conversion and transformation, but then the immediate > follow-up > > > > > > question if > > > > > > > > a bottleneck is found in that phase is "which > > > > > converter/transformation > > > > > > is > > > > > > > > responsible?" It'd be nice if we could provide a way to > quickly > > > > > answer > > > > > > that > > > > > > > > question. > > > > > > > > > > > > > > > > > > > > > > This is a great idea. I'd like to consider this as well, though > > > maybe > > > > > > these > > > > > > > more granular > > > > > > > metrics would be good to have them as DEBUG. > > > > > > > > > > > > > > > > > > > > > > 6. Any thoughts about offering latency metrics for source > tasks > > > > > between > > > > > > > > receipt of the record from the task and delivery of the > record > > to > > > > > Kafka > > > > > > > > (which would be tracked by producer callback)? We could also > > use > > > > the > > > > > > record > > > > > > > > timestamp either instead of or in addition to receipt time if > > the > > > > > task > > > > > > > > provides a timestamp with its records. > > > > > > > > > > > > > > > > > > > > > > With source transform and convert metrics we get part of that > > > > latency. > > > > > > > Looking at the Producer metrics, `request-latency` (though a > very > > > > > generic > > > > > > > metric) > > > > > > > sort of answer the time between send request and ack — if my > > > > > > understanding > > > > > > > is correct. > > > > > > > Would these be enough or you're thinking about another > approach? 
> > > > > > > maybe a custom metric to cover the producer side? > > > > > > > > > > > > > > > > > > > > > > 7. We may end up introducing a way for sink tasks to record > > > > > per-record > > > > > > > > delivery to the sink system (see KIP-767 [1]). I'd like it if > > we > > > > > could > > > > > > keep > > > > > > > > the names of our metrics very precise in order to avoid > > confusing > > > > > users > > > > > > > > (who may think that we're providing metrics on actual > delivery > > to > > > > the > > > > > > sink > > > > > > > > system, which may not be the case if the connector performs > > > > > > asynchronous > > > > > > > > writes), and in order to leave room for a metrics on true > > > delivery > > > > > > time by > > > > > > > > sink tasks. It'd also be nice if we could remain consistent > > with > > > > > > existing > > > > > > > > metrics such as "put-batch-avg-time-ms". With that in mind, > > what > > > do > > > > > you > > > > > > > > think about renaming these metrics: > > > > > > > > - "sink-record-batch-latency-max-ms" to > > > "put-batch-avg-latency-ms" > > > > > > > > - "sink-record-latency-max-ms" to > > > "put-sink-record-latency-max-ms" > > > > > > > > - "sink-record-latency-avg-ms" to > > > "put-sink-record-latency-avg-ms" > > > > > > > > - "sink-record-convert-transform-time-max-ms" to > > > > > > > > "convert-transform-sink-record-time-max-ms" > > > > > > > > - "sink-record-convert-transform-time-avg-ms" to > > > > > > > > "convert-transform-sink-record-time-avg-ms" > > > > > > > > - "source-record-transform-convert-time-max-ms" to > > > > > > > > "transform-convert-source-record-time-max-ms" > > > > > > > > - "source-record-transform-convert-time-avg-ms" to > > > > > > > > "transform-convert-source-record-time-avg-ms" > > > > > > > > > > > > > > > > > > > > > > Make sense, thanks! I have updated the list of metrics and > group > > > them > > > > > by > > > > > > > sensor and applying these suggestions. 
> > > > > > > The only ones that I want to review are: sink-record-* to > > > put-batch-* > > > > > > > (first 3). Not sure if put-batch/put-sink-record describes the > > > > purpose > > > > > of > > > > > > > the metric — neither `sink-record-latency` to be honest. > > > > > > > My initial thought was to have something like Kafka Streams > > > > > e2e-latency. > > > > > > > Based on 4. and 6. questions, an idea could be to add: > > > > > > > - source-batch-e2e-latency-before-send: measure wallclock - > > source > > > > > record > > > > > > > timestamp after source connector poll. > > > > > > > - source-batch-e2e-latency-after-send: measure wallclock - > record > > > > > > timestamp > > > > > > > on producer send callback > > > > > > > - sink-batch-e2e-latency-before-put: measure time wallclock - > > > record > > > > > > > timestamp after consumer poll > > > > > > > - sink-batch-e2e-latency-after-put: measure time wallclock - > > record > > > > > > > timestamp after sink connector put. > > > > > > > > > > > > > > > > > > > > > > Thanks again for the KIP! Looking forward to your thoughts. > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > > > > > > Chris > > > > > > > > > > > > > > > > [1] - > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-767%3A+Connect+Latency+Metrics > > > > > > > > > > > > > > > > On Thu, Sep 15, 2022 at 1:32 PM Jorge Esteban Quilcate Otoya > < > > > > > > > > quilcate.jo...@gmail.com> wrote: > > > > > > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > > > > > > > I've made a slight addition to the KIP based on Yash > > feedback: > > > > > > > > > > > > > > > > > > - A new metric is added at INFO level to record the max > > latency > > > > > from > > > > > > the > > > > > > > > > batch timestamp, by keeping the oldest record timestamp per > > > > batch. > > > > > > > > > - A draft implementation is linked. 
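[Editor's note] The "oldest record timestamp per batch" approach mentioned above can be sketched as follows. The class and method names here are hypothetical, not taken from the draft implementation: since the oldest record in a batch yields the largest wallclock-minus-timestamp value, tracking just the minimum timestamp gives the true per-batch maximum latency with a single record() call on a Max sensor instead of one per record.

```java
import java.util.List;

// Hypothetical helper illustrating the per-batch max-latency computation:
// wallclock time minus the oldest (minimum) record timestamp in the batch.
public class BatchLatencySketch {
    static long maxBatchLatencyMs(List<Long> recordTimestampsMs, long nowMs) {
        long oldest = Long.MAX_VALUE;
        for (long ts : recordTimestampsMs) {
            oldest = Math.min(oldest, ts);
        }
        // The oldest record produces the largest end-to-end latency,
        // so this single value is the batch's maximum.
        return nowMs - oldest;
    }

    public static void main(String[] args) {
        long now = 10_000L;
        // Batch with timestamps 9500, 9200, 9900 ms: max latency is 10000 - 9200.
        System.out.println(maxBatchLatencyMs(List.of(9_500L, 9_200L, 9_900L), now)); // 800
    }
}
```

An "avg" variant, by contrast, would still require recording a value per record, which is why the thread keeps it at DEBUG level.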
> > > > > > > > > > > > > > > > > > Looking forward to your feedback. > > > > > > > > > Also, a kindly reminder that the vote thread is open. > > > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > Jorge. > > > > > > > > > > > > > > > > > > On Thu, 8 Sept 2022 at 14:25, Jorge Esteban Quilcate Otoya > < > > > > > > > > > quilcate.jo...@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > Great. I have updated the KIP to reflect this. > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > Jorge. > > > > > > > > > > > > > > > > > > > > On Thu, 8 Sept 2022 at 12:26, Yash Mayya < > > > yash.ma...@gmail.com > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > >> Thanks, I think it makes sense to define these metrics > at > > a > > > > > DEBUG > > > > > > > > > >> recording > > > > > > > > > >> level. > > > > > > > > > >> > > > > > > > > > >> On Thu, Sep 8, 2022 at 2:51 PM Jorge Esteban Quilcate > > Otoya > > > < > > > > > > > > > >> quilcate.jo...@gmail.com> wrote: > > > > > > > > > >> > > > > > > > > > >> > On Thu, 8 Sept 2022 at 05:55, Yash Mayya < > > > > > yash.ma...@gmail.com> > > > > > > > > > wrote: > > > > > > > > > >> > > > > > > > > > > >> > > Hi Jorge, > > > > > > > > > >> > > > > > > > > > > > >> > > Thanks for the changes. With regard to having per > > batch > > > vs > > > > > per > > > > > > > > > record > > > > > > > > > >> > > metrics, the additional overhead I was referring to > > > wasn't > > > > > > about > > > > > > > > > >> whether > > > > > > > > > >> > or > > > > > > > > > >> > > not we would need to iterate over all the records > in a > > > > > batch. 
> > > > > > I > > > > > > > > was > > > > > > > > > >> > > referring to the potential additional overhead > caused > > by > > > > the > > > > > > > > higher > > > > > > > > > >> > volume > > > > > > > > > >> > > of calls to Sensor::record on the sensors for the > new > > > > > metrics > > > > > > (as > > > > > > > > > >> > compared > > > > > > > > > >> > > to the existing batch only metrics), especially for > > high > > > > > > > > throughput > > > > > > > > > >> > > connectors where batch sizes could be large. I guess > > we > > > > may > > > > > > want > > > > > > > > to > > > > > > > > > do > > > > > > > > > >> > some > > > > > > > > > >> > > sort of performance testing and get concrete numbers > > to > > > > > verify > > > > > > > > > whether > > > > > > > > > >> > this > > > > > > > > > >> > > is a valid concern or not? > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > 6.1. Got it, thanks for clarifying. I guess there > could > > > be a > > > > > > > > benchmark > > > > > > > > > >> test > > > > > > > > > >> > of the `Sensor::record` to get an idea of the > > performance > > > > > > impact. > > > > > > > > > >> > Regardless, the fact that these are single-record > > metrics > > > > > > compared > > > > > > > > to > > > > > > > > > >> > existing batch-only could be explicitly defined by > > setting > > > > > these > > > > > > > > > >> metrics at > > > > > > > > > >> > a DEBUG or TRACE metric recording level, leaving the > > > > existing > > > > > at > > > > > > > > INFO > > > > > > > > > >> > level. > > > > > > > > > >> > wdyt? 
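[Editor's note] The DEBUG/TRACE suggestion in this exchange relies on how metric recording levels behave: a sensor registered at a more verbose level than the worker's configured level is a no-op on record(). A minimal model of that behaviour follows; these are simplified stand-ins loosely mirroring Kafka's `Sensor.RecordingLevel`, not the real classes.

```java
// Simplified model of metric recording levels: a sensor records only when
// its own level is enabled by the configured level (a DEBUG config enables
// both INFO and DEBUG sensors; an INFO config enables INFO sensors only).
public class RecordingLevelSketch {
    enum RecordingLevel {
        INFO(0), DEBUG(1);
        final int id;
        RecordingLevel(int id) { this.id = id; }
        // "this" is the configured level; "level" is the sensor's level.
        boolean shouldRecord(RecordingLevel level) { return level.id <= this.id; }
    }

    static class Sensor {
        final RecordingLevel level;
        final RecordingLevel configured;
        long recorded = 0;
        Sensor(RecordingLevel level, RecordingLevel configured) {
            this.level = level;
            this.configured = configured;
        }
        void record(double value) {
            // A DEBUG sensor under an INFO config falls through as a no-op.
            if (configured.shouldRecord(level)) recorded++;
        }
    }

    public static void main(String[] args) {
        Sensor debugSensor = new Sensor(RecordingLevel.DEBUG, RecordingLevel.INFO);
        Sensor infoSensor = new Sensor(RecordingLevel.INFO, RecordingLevel.INFO);
        for (int i = 0; i < 100; i++) {
            debugSensor.record(i);
            infoSensor.record(i);
        }
        System.out.println(debugSensor.recorded); // DEBUG sensor suppressed
        System.out.println(infoSensor.recorded);  // INFO sensor records everything
    }
}
```

This is what makes per-record DEBUG metrics cheap in production: the per-record cost reduces to a level check unless an operator explicitly raises the recording level.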
> > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > > >> > > Thanks, > > > > > > > > > >> > > Yash > > > > > > > > > >> > > > > > > > > > > > >> > > On Tue, Sep 6, 2022 at 4:42 PM Jorge Esteban > Quilcate > > > > Otoya > > > > > < > > > > > > > > > >> > > quilcate.jo...@gmail.com> wrote: > > > > > > > > > >> > > > > > > > > > > > >> > > > Hi Sagar and Yash, > > > > > > > > > >> > > > > > > > > > > > > >> > > > > the way it's defined in > > > > > > > > > >> > > > > > > > > https://kafka.apache.org/documentation/#connect_monitoring > > > > > > for > > > > > > > > > the > > > > > > > > > >> > > metrics > > > > > > > > > >> > > > > > > > > > > > > >> > > > 4.1. Got it. Add it to the KIP. > > > > > > > > > >> > > > > > > > > > > > > >> > > > > The only thing I would argue is do we need > > > > > > > > > >> sink-record-latency-min? > > > > > > > > > >> > > Maybe > > > > > > > > > >> > > > we > > > > > > > > > >> > > > > could remove this min metric as well and make > all > > of > > > > the > > > > > > 3 e2e > > > > > > > > > >> > metrics > > > > > > > > > >> > > > > consistent > > > > > > > > > >> > > > > > > > > > > > > >> > > > 4.2 I see. Will remove it from the KIP. > > > > > > > > > >> > > > > > > > > > > > > >> > > > > Probably users can track the metrics at their > end > > to > > > > > > > > > >> > > > > figure that out. Do you think that makes sense? > > > > > > > > > >> > > > > > > > > > > > > >> > > > 4.3. Yes, agree. With these new metrics it should > be > > > > > easier > > > > > > for > > > > > > > > > >> users > > > > > > > > > >> > to > > > > > > > > > >> > > > track this. > > > > > > > > > >> > > > > > > > > > > > > >> > > > > I think it makes sense to not have a min metric > > for > > > > > > either to > > > > > > > > > >> remain > > > > > > > > > >> > > > > consistent with the existing put-batch and > > > poll-batch > > > > > > metrics > > > > > > > > > >> > > > > > > > > > > > > >> > > > 5.1. Got it. 
Same as 4.2 > > > > > > > > > >> > > > > > > > > > > > > >> > > > > Another naming related suggestion I had was with > > the > > > > > > > > > >> > > > > "convert-time" metrics - we should probably > > include > > > > > > > > > >> transformations > > > > > > > > > >> > in > > > > > > > > > >> > > > the > > > > > > > > > >> > > > > name since SMTs could definitely be attributable > > to > > > a > > > > > > sizable > > > > > > > > > >> chunk > > > > > > > > > >> > of > > > > > > > > > >> > > > the > > > > > > > > > >> > > > > latency depending on the specific transformation > > > > chain. > > > > > > > > > >> > > > > > > > > > > > > >> > > > 5.2. Make sense. I'm proposing to add > > > > > > > > > >> > `sink-record-convert-transform...` > > > > > > > > > >> > > > and `source-record-transform-convert...` to > > represent > > > > > > correctly > > > > > > > > > the > > > > > > > > > >> > order > > > > > > > > > >> > > > of operations. > > > > > > > > > >> > > > > > > > > > > > > >> > > > > it seems like both source and sink tasks only > > record > > > > > > metrics > > > > > > > > at > > > > > > > > > a > > > > > > > > > >> > > "batch" > > > > > > > > > >> > > > > level, not on an individual record level. I > think > > it > > > > > > might be > > > > > > > > > >> > > additional > > > > > > > > > >> > > > > overhead if we want to record these new metrics > > all > > > at > > > > > the > > > > > > > > > record > > > > > > > > > >> > > level? > > > > > > > > > >> > > > > > > > > > > > > >> > > > 5.3. 
I considered at the beginning to implement > all > > > > > metrics > > > > > > at > > > > > > > > the > > > > > > > > > >> > batch > > > > > > > > > >> > > > level, but given how the framework process > records, > > I > > > > > > fallback > > > > > > > > to > > > > > > > > > >> the > > > > > > > > > >> > > > proposed approach: > > > > > > > > > >> > > > - Sink Task: > > > > > > > > > >> > > > - `WorkerSinkTask#convertMessages(msgs)` already > > > > > iterates > > > > > > over > > > > > > > > > >> > records, > > > > > > > > > >> > > > so there is no additional overhead to capture > record > > > > > > latency per > > > > > > > > > >> > record. > > > > > > > > > >> > > > - > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/kafka/blob/9841647c4fe422532f448423c92d26e4fdcb8932/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L490-L514 > > > > > > > > > >> > > > - > > `WorkerSinkTask#convertAndTransformRecord(record)` > > > > > > actually > > > > > > > > > >> happens > > > > > > > > > >> > > > individually. 
Measuring this operation per batch > > would > > > > > > include > > > > > > > > > >> > processing > > > > > > > > > >> > > > that is not strictly part of "convert and > transform" > > > > > > > > > >> > > > - > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/kafka/blob/9841647c4fe422532f448423c92d26e4fdcb8932/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L518 > > > > > > > > > >> > > > - Source Task: > > > > > > > > > >> > > > - `AbstractWorkerSourceTask#sendRecords` > iterates > > > > over a > > > > > > batch > > > > > > > > > and > > > > > > > > > >> > > > applies transforms and convert record individually > > as > > > > > well: > > > > > > > > > >> > > > - > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/kafka/blob/9841647c4fe422532f448423c92d26e4fdcb8932/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractWorkerSourceTask.java#L389-L390 > > > > > > > > > >> > > > > > > > > > > > > >> > > > > This might require some additional changes - > > > > > > > > > >> > > > > for instance, with the "sink-record-latency" > > metric, > > > > we > > > > > > might > > > > > > > > > only > > > > > > > > > >> > want > > > > > > > > > >> > > > to > > > > > > > > > >> > > > > have a "max" metric since "avg" would require > > > > recording > > > > > a > > > > > > > > value > > > > > > > > > on > > > > > > > > > >> > the > > > > > > > > > >> > > > > sensor for each record (whereas we can get a > "max" > > > by > > > > > only > > > > > > > > > >> recording > > > > > > > > > >> > a > > > > > > > > > >> > > > > metric value for the oldest record in each > batch). 
> > > > > > > > > >> > > > > > > > > > > > > >> > > > 5.4. Recording record-latency per batch may not be > > as > > > > > > useful as > > > > > > > > > >> there > > > > > > > > > >> > is > > > > > > > > > >> > > no > > > > > > > > > >> > > > guarantee that the oldest record will be > > > representative > > > > of > > > > > > the > > > > > > > > > >> batch. > > > > > > > > > >> > > > > > > > > > > > > >> > > > On Sat, 3 Sept 2022 at 16:02, Yash Mayya < > > > > > > yash.ma...@gmail.com> > > > > > > > > > >> wrote: > > > > > > > > > >> > > > > > > > > > > > > >> > > > > Hi Jorge and Sagar, > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > I think it makes sense to not have a min metric > > for > > > > > > either to > > > > > > > > > >> remain > > > > > > > > > >> > > > > consistent with the existing put-batch and > > > poll-batch > > > > > > metrics > > > > > > > > > (it > > > > > > > > > >> > > doesn't > > > > > > > > > >> > > > > seem particularly useful either anyway). Also, > the > > > new > > > > > > > > > >> > > > > "sink-record-latency" metric name looks fine to > > me, > > > > > > thanks for > > > > > > > > > >> making > > > > > > > > > >> > > the > > > > > > > > > >> > > > > changes! Another naming related suggestion I had > > was > > > > > with > > > > > > the > > > > > > > > > >> > > > > "convert-time" metrics - we should probably > > include > > > > > > > > > >> transformations > > > > > > > > > >> > in > > > > > > > > > >> > > > the > > > > > > > > > >> > > > > name since SMTs could definitely be attributable > > to > > > a > > > > > > sizable > > > > > > > > > >> chunk > > > > > > > > > >> > of > > > > > > > > > >> > > > the > > > > > > > > > >> > > > > latency depending on the specific transformation > > > > chain. 
> > > > > > > > > >> > > > > > > > > > > > > > >> > > > > I have one high level question with respect to > > > > > > implementation > > > > > > > > - > > > > > > > > > >> > > > currently, > > > > > > > > > >> > > > > it seems like both source and sink tasks only > > record > > > > > > metrics > > > > > > > > at > > > > > > > > > a > > > > > > > > > >> > > "batch" > > > > > > > > > >> > > > > level, not on an individual record level. I > think > > it > > > > > > might be > > > > > > > > > >> > > additional > > > > > > > > > >> > > > > overhead if we want to record these new metrics > > all > > > at > > > > > the > > > > > > > > > record > > > > > > > > > >> > > level? > > > > > > > > > >> > > > > Could we instead make all of these new metrics > for > > > > > > batches of > > > > > > > > > >> records > > > > > > > > > >> > > > > rather than individual records in order to > remain > > > > > > consistent > > > > > > > > > with > > > > > > > > > >> the > > > > > > > > > >> > > > > existing task level metrics? This might require > > some > > > > > > > > additional > > > > > > > > > >> > > changes - > > > > > > > > > >> > > > > for instance, with the "sink-record-latency" > > metric, > > > > we > > > > > > might > > > > > > > > > only > > > > > > > > > >> > want > > > > > > > > > >> > > > to > > > > > > > > > >> > > > > have a "max" metric since "avg" would require > > > > recording > > > > > a > > > > > > > > value > > > > > > > > > on > > > > > > > > > >> > the > > > > > > > > > >> > > > > sensor for each record (whereas we can get a > "max" > > > by > > > > > only > > > > > > > > > >> recording > > > > > > > > > >> > a > > > > > > > > > >> > > > > metric value for the oldest record in each > batch). 
Thanks,
Yash

On Fri, Sep 2, 2022 at 3:16 PM Sagar <sagarmeansoc...@gmail.com> wrote:

Hi Jorge,

Thanks for the changes.

Regarding the metrics, I meant something like this:

kafka.connect:type=sink-task-metrics,connector="{connector}",task="{task}"

the way it's defined in https://kafka.apache.org/documentation/#connect_monitoring for the metrics.

I see what you mean by the 3 metrics and how they can be interpreted. The only thing I would argue is: do we need sink-record-latency-min? Maybe we could remove this min metric as well and make all of the 3 e2e metrics consistent (since put-batch also doesn't expose a min, which makes sense to me).
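The MBean name pattern quoted above can be sanity-checked with plain JMX classes. A small sketch (the connector name and task id below are made-up example values, not from the thread):

```java
import javax.management.ObjectName;

// Sketch: parse the sink-task-metrics MBean name pattern quoted from the
// Connect monitoring docs. "my-connector" and task "0" are example values.
public class SinkTaskMetricName {
    public static void main(String[] args) throws Exception {
        ObjectName name = new ObjectName(
                "kafka.connect:type=sink-task-metrics,connector=\"my-connector\",task=\"0\"");
        System.out.println(name.getDomain());            // kafka.connect
        System.out.println(name.getKeyProperty("type")); // sink-task-metrics
    }
}
```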
I think this is in contrast to what Yash pointed out above, so I would like to hear his thoughts as well.

The other point Yash mentioned, about the slightly flawed definition of e2e, is also true in a sense. But I have a feeling that once the records are polled by the connector tasks, it would be difficult to track the final leg via the framework. Probably users can track the metrics at their end to figure that out. Do you think that makes sense?

Thanks!
Sagar.

On Thu, Sep 1, 2022 at 11:40 PM Jorge Esteban Quilcate Otoya <quilcate.jo...@gmail.com> wrote:

Hi Sagar and Yash,

Thanks for your feedback!

> 1) I am assuming the new metrics would be task level metric.
1.1. Yes, it will be a task-level metric, implemented on the Worker[Source/Sink]Task.

> Could you specify the way it's done for other sink/source connector?

1.2. Not sure what you mean by this. Could you elaborate a bit more?

> 2. I am slightly confused about the e2e latency metric...

2.1. Yes, I see. I was trying to bring in a similar concept as in Streams with KIP-613, though the e2e concept may not be translatable. We could keep it as `sink-record-latency` to avoid conflating concepts. A similar metric naming was proposed in KIP-489, but at the consumer level, though it seems dormant for a couple of years.

> However, the put-batch time measures the time to put a batch of records to the external sink. So, I would assume the 2 can't be added as is to compute the e2e latency.
> Maybe I am missing something here. Could you please clarify this.

2.2. Yes, agree. Not necessarily added, but with the 3 latencies (poll, convert, put-batch) it will be clearer where the bottleneck may be, and they represent the internal processing.

> however, as per the KIP it looks like it will be the latency between when the record was written to Kafka and when the record is returned by a sink task's consumer's poll?

3.1. Agree. 2.1 could help to clarify this.

> One more thing - I was wondering if there's a particular reason for having a min metric for e2e latency but not for convert time?

3.2.
Was following KIP-613 for e2e, which seems useful to compare with Max and get an idea of the window of results, though current latencies in Connect do not include Min, and that's why I haven't added it for convert latency. Do you think it makes sense to extend latency metrics with Min?

KIP is updated to clarify some of these changes.

Many thanks,
Jorge.

On Thu, 1 Sept 2022 at 18:11, Yash Mayya <yash.ma...@gmail.com> wrote:

Hi Jorge,

Thanks for the KIP! I have the same confusion with the e2e-latency metrics as Sagar above.
"e2e" would seem to indicate the latency between when the record was written to Kafka and when the record was written to the sink system by the connector. However, as per the KIP, it looks like it will be the latency between when the record was written to Kafka and when the record is returned by a sink task's consumer's poll? I think that metric will be a little confusing to interpret. One more thing - I was wondering if there's a particular reason for having a min metric for e2e latency but not for convert time?
Thanks,
Yash

On Thu, Sep 1, 2022 at 8:59 PM Sagar <sagarmeansoc...@gmail.com> wrote:

Hi Jorge,

Thanks for the KIP. It looks like a very good addition. I skimmed through once and had a couple of questions:

1) I am assuming the new metrics would be task-level metrics. Could you specify the way it's done for other sink/source connectors?
2) I am slightly confused about the e2e latency metric. Let's consider the sink connector metric. If I look at the way it's supposed to be calculated, i.e. the difference between the record timestamp and the wall clock time, it looks like a per-record metric.
However, the put-batch time measures the time to put a batch of records to the external sink. So, I would assume the two can't be added as is to compute the e2e latency. Maybe I am missing something here. Could you please clarify this.

Thanks!
Sagar.

On Tue, Aug 30, 2022 at 8:43 PM Jorge Esteban Quilcate Otoya <quilcate.jo...@gmail.com> wrote:

Hi all,

I'd like to start a discussion thread on KIP-864: Add End-To-End Latency Metrics to Connectors.
This KIP aims to improve the metrics available on Source and Sink Connectors to measure end-to-end latency, including source and sink record conversion time, and sink record e2e latency (similar to KIP-613 for Streams).

The KIP is here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-864%3A+Add+End-To-End+Latency+Metrics+to+Connectors

Please take a look and let me know what you think.

Cheers,
Jorge.
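To make the per-record definition debated in this thread concrete, here is a minimal sketch (illustrative only, not the KIP's or the framework's implementation): the proposed sink-record e2e latency is the wall-clock time when the sink task's consumer returns the record, minus the record's Kafka timestamp.

```java
// Sketch of the e2e latency definition discussed above:
// latency = wall-clock time when the sink task sees the record, minus the
// record's Kafka timestamp. Not actual Connect framework code.
public class E2eLatencySketch {
    static long e2eLatencyMs(long recordTimestampMs, long wallClockMs) {
        return wallClockMs - recordTimestampMs;
    }

    public static void main(String[] args) {
        // A record written to Kafka at t=1000 ms, polled by the sink task at t=1250 ms.
        System.out.println(e2eLatencyMs(1_000L, 1_250L)); // prints 250
    }
}
```

Note that, as Yash points out, this measures up to the consumer's poll, not up to the write into the sink system, so the "final leg" is not covered.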