Thanks, Chris! Great feedback! Please find my comments below:

On Thu, 13 Oct 2022 at 18:52, Chris Egerton <chr...@aiven.io.invalid> wrote:

> Hi Jorge,
>
> Thanks for the KIP. I agree with the overall direction and think this would
> be a nice improvement to Kafka Connect. Here are my initial thoughts on the
> details:
>
> 1. The motivation section outlines the gaps in Kafka Connect's task metrics
> nicely. I think it'd be useful to include more concrete details on why
> these gaps need to be filled in, and in which cases additional metrics
> would be helpful. One goal could be to provide enhanced monitoring of
> production deployments that allows for cluster administrators to set up
> automatic alerts for latency spikes and, if triggered, quickly identify the
> root cause of those alerts, reducing the time to remediation. Another goal
> could be to provide more insight to developers or cluster administrators
> who want to do performance testing on connectors in non-production
> environments. It may help guide our decision making process to have a
> clearer picture of the goals we're trying to achieve.
>

Agree. The Motivation section has been updated.
Thanks for the examples; I see both of them being covered by the KIP.
I also see how these could help us decide whether to position
some metrics at INFO or DEBUG level.


> 2. If we're trying to address the alert-and-diagnose use case, it'd be
> useful to have as much information as possible at INFO level, rather than
> forcing cluster administrators to possibly reconfigure a connector to emit
> DEBUG or TRACE level metrics in order to diagnose a potential
> production-impacting performance bottleneck. I can see the rationale for
> emitting per-record metrics that track an average value at DEBUG level, but
> for per-record metrics that track a maximum value, is there any reason not
> to provide this information at INFO level?
>

Agree. Though, since the Max and Avg metrics are part of the same sensor —
which is where the metric recording level is defined — both metrics end up
with the same level.
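To make that constraint concrete, here is a minimal standalone model (an illustration only — `MiniSensor`, `MaxStat`, and `AvgStat` are stand-ins, not Kafka's real classes) of why stats attached to one sensor share a recording level: the level check happens once per `record()` call, before any attached stat is updated, so either all stats record or none do.

```java
import java.util.ArrayList;
import java.util.List;

class MiniSensor {
    enum Level { INFO, DEBUG }

    interface Stat { void record(double value); double measure(); }

    static class MaxStat implements Stat {
        double max = Double.NEGATIVE_INFINITY;
        public void record(double v) { max = Math.max(max, v); }
        public double measure() { return max; }
    }

    static class AvgStat implements Stat {
        double sum; long count;
        public void record(double v) { sum += v; count++; }
        public double measure() { return count == 0 ? Double.NaN : sum / count; }
    }

    final Level level;                 // defined per sensor, not per stat
    final List<Stat> stats = new ArrayList<>();

    MiniSensor(Level level) { this.level = level; }

    void add(Stat stat) { stats.add(stat); }

    // Gate on the sensor's level: all attached stats record, or none do.
    void record(double value, Level enabled) {
        if (level.ordinal() <= enabled.ordinal()) {
            for (Stat s : stats) s.record(value);
        }
    }
}
```

So splitting Max (INFO) from Avg (DEBUG) would mean registering them on two separate sensors rather than one.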


> 3. I'm also curious about the performance testing suggested by Yash to
> gauge the potential impact of this change. Have you been able to do any
> testing with your draft implementation yet?
>

No, not so far.
I think it would be valuable to discuss the scope of this testing, and maybe
tackle it in a separate issue, as Sensors and Metrics are used all over the
place.
My initial understanding is that these tests should be placed in the
jmh-benchmarks[1].
Then we could target testing Sensors and Metrics, and validate how much
overhead is added by having only Max vs. Max,Avg(,Min), etc.
On the other hand, we could extend this to Transformers or other Connect
layers.
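As a starting point for scoping, here is a rough standalone micro-benchmark sketch (illustrative only — a real test would live in jmh-benchmarks and exercise the actual `Sensor`; the lambdas below just mimic the `update()` bodies of Max, Avg and Min, since `combine()` runs at query time and is off the hot path):

```java
import java.util.List;
import java.util.function.DoubleConsumer;

class RecordOverheadSketch {
    // Stand-ins for the per-record update() work of Max, Avg and Min.
    static double max = Double.NEGATIVE_INFINITY, sum = 0, min = Double.POSITIVE_INFINITY;
    static long count = 0;

    // Mimics Sensor.record(): one loop over the registered stats per value.
    static long timeRecord(List<DoubleConsumer> stats, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            double v = i % 1000;                         // synthetic latency value
            for (DoubleConsumer s : stats) s.accept(v);  // the stats loop
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        DoubleConsumer maxStat = v -> max = Math.max(max, v);
        DoubleConsumer avgStat = v -> { sum += v; count++; };
        DoubleConsumer minStat = v -> min = Math.min(min, v);

        int n = 10_000_000;
        long maxOnly = timeRecord(List.of(maxStat), n);
        long all3 = timeRecord(List.of(maxStat, avgStat, minStat), n);
        System.out.printf("max only: %d ms, max+avg+min: %d ms%n",
                maxOnly / 1_000_000, all3 / 1_000_000);
    }
}
```

A JMH version with proper warmup and blackholes would give trustworthy numbers, but even this shape suggests the per-record work is a handful of arithmetic ops per stat.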

Here are some pointers to the Sensors and Metrics implementations that
could be considered:
Path to metric recording:
- https://github.com/apache/kafka/blob/5cab11cf525f6c06fcf9eb43f7f95ef33fe1cdbb/clients/src/main/java/org/apache/kafka/common/metrics/Sensor.java#L195-L199
- https://github.com/apache/kafka/blob/5cab11cf525f6c06fcf9eb43f7f95ef33fe1cdbb/clients/src/main/java/org/apache/kafka/common/metrics/Sensor.java#L230-L244

```
// increment all the stats
for (StatAndConfig statAndConfig : this.stats) {
   statAndConfig.stat.record(statAndConfig.config(), value, timeMs);
}
```

SampledStats:
- Avg: https://github.com/apache/kafka/blob/068ab9cefae301f3187ea885d645c425955e77d2/clients/src/main/java/org/apache/kafka/common/metrics/stats/Avg.java
- Max: https://github.com/apache/kafka/blob/068ab9cefae301f3187ea885d645c425955e77d2/clients/src/main/java/org/apache/kafka/common/metrics/stats/Max.java
- Min: https://github.com/apache/kafka/blob/068ab9cefae301f3187ea885d645c425955e77d2/clients/src/main/java/org/apache/kafka/common/metrics/stats/Min.java

`stat#record()` is implemented by the `update` method in each `SampledStat`:

```Max.java
    @Override
    protected void update(Sample sample, MetricConfig config, double value, long now) {
        sample.value = Math.max(sample.value, value);
    }
```

```Avg.java
    @Override
    protected void update(Sample sample, MetricConfig config, double value, long now) {
        sample.value += value;
    }
```

As far as I understand, most of the work of the stats happens in the
`combine` method, which is not part of the connector execution path but is
called only when metrics are queried.

I wonder whether we should consider Avg and Max for all proposed metrics,
as the impact on the execution path seems minimal, and even evaluate whether
Min is also valuable, reserving DEBUG for more granular metrics.

[1] https://github.com/apache/kafka/tree/trunk/jmh-benchmarks


> 4. Just to make sure I understand correctly--does "time when it has been
> received by the Sink task" refer to the wallclock time directly after a
> call to SinkTask::put has been completed (as opposed to directly before
> that call is made, or something else entirely)?
>

It currently means the time when the record has been received by the Sink
task, i.e. right after the consumer poll and before conversions.
Would it be valuable to have it after `put-sink-records` instead?


> 5. If the goal is to identify performance bottlenecks (either in production
> or pre-production environments), would it make sense to introduce metrics
> for each individual converter (i.e., key/value/header) and transformation?
> It's definitely an improvement to be able to identify the total time for
> conversion and transformation, but then the immediate follow-up question if
> a bottleneck is found in that phase is "which converter/transformation is
> responsible?" It'd be nice if we could provide a way to quickly answer that
> question.
>

This is a great idea. I'd like to consider this as well, though these more
granular metrics may be better placed at DEBUG level.


> 6. Any thoughts about offering latency metrics for source tasks between
> receipt of the record from the task and delivery of the record to Kafka
> (which would be tracked by producer callback)? We could also use the record
> timestamp either instead of or in addition to receipt time if the task
> provides a timestamp with its records.
>

With source transform and convert metrics we get part of that latency.
Looking at the Producer metrics, `request-latency` (though a very generic
metric) sort of answers the time between send request and ack — if my
understanding is correct.
Would these be enough, or are you thinking about another approach?
Maybe a custom metric to cover the producer side?
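If we did add a custom producer-side metric, the callback approach from 6. could look roughly like this (a standalone sketch — `AckListener` and `onSend` are illustrative stand-ins for the producer send callback, not Connect code):

```java
// Stand-in for org.apache.kafka.clients.producer.Callback: invoked once the
// broker acks the record, which is where after-send latency can be measured.
interface AckListener { void onAck(long ackWallclockMs); }

class ProducerLatencySketch {
    static double maxE2eMs = Double.NEGATIVE_INFINITY;

    // Called from the send path: capture the record timestamp at send time,
    // then measure wallclock minus timestamp when the ack arrives.
    static AckListener onSend(long recordTimestampMs) {
        return ackWallclockMs ->
                maxE2eMs = Math.max(maxE2eMs, ackWallclockMs - recordTimestampMs);
    }
}
```

Unlike `request-latency`, this would cover the full span from record timestamp to broker ack, per source record.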


> 7. We may end up introducing a way for sink tasks to record per-record
> delivery to the sink system (see KIP-767 [1]). I'd like it if we could keep
> the names of our metrics very precise in order to avoid confusing users
> (who may think that we're providing metrics on actual delivery to the sink
> system, which may not be the case if the connector performs asynchronous
> writes), and in order to leave room for a metrics on true delivery time by
> sink tasks. It'd also be nice if we could remain consistent with existing
> metrics such as "put-batch-avg-time-ms". With that in mind, what do you
> think about renaming these metrics:
> - "sink-record-batch-latency-max-ms" to "put-batch-avg-latency-ms"
> - "sink-record-latency-max-ms" to "put-sink-record-latency-max-ms"
> - "sink-record-latency-avg-ms" to "put-sink-record-latency-avg-ms"
> - "sink-record-convert-transform-time-max-ms" to
> "convert-transform-sink-record-time-max-ms"
> - "sink-record-convert-transform-time-avg-ms" to
> "convert-transform-sink-record-time-avg-ms"
> - "source-record-transform-convert-time-max-ms" to
> "transform-convert-source-record-time-max-ms"
> - "source-record-transform-convert-time-avg-ms" to
> "transform-convert-source-record-time-avg-ms"
>

Makes sense, thanks! I have updated the list of metrics, grouping them by
sensor and applying these suggestions.
The only ones I still want to review are the sink-record-* to put-batch-*
renames (the first 3). I'm not sure put-batch/put-sink-record describes the
purpose of the metric — nor does `sink-record-latency`, to be honest.
My initial thought was to have something like Kafka Streams' e2e-latency.
Based on questions 4. and 6., an idea could be to add:
- source-batch-e2e-latency-before-send: measures wallclock - source record
timestamp, after source connector poll.
- source-batch-e2e-latency-after-send: measures wallclock - record timestamp,
on producer send callback.
- sink-batch-e2e-latency-before-put: measures wallclock - record timestamp,
after consumer poll.
- sink-batch-e2e-latency-after-put: measures wallclock - record timestamp,
after sink connector put.
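A sketch of how a before-put variant could be computed per batch with a single `Sensor::record` call (illustrative only — the plain `List<Long>` of timestamps stands in for the actual consumed record batch):

```java
import java.util.List;

class BatchE2eLatencySketch {
    // e.g. sink-batch-e2e-latency-before-put: wallclock minus record
    // timestamp, taken right after consumer poll. Keeping only the oldest
    // record timestamp per batch yields the batch's max latency with one
    // sensor recording instead of one per record.
    static long maxLatencyMs(List<Long> recordTimestampsMs, long wallclockMs) {
        long oldestTimestampMs = recordTimestampsMs.stream()
                .min(Long::compare)
                .orElse(wallclockMs);     // empty batch: report zero latency
        return wallclockMs - oldestTimestampMs;
    }
}
```

This matches the earlier observation that the oldest record bounds the batch's latency from above, even if it is not representative of the average.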


> Thanks again for the KIP! Looking forward to your thoughts.
>
> Cheers,
>
> Chris
>
> [1] -
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-767%3A+Connect+Latency+Metrics
>
> On Thu, Sep 15, 2022 at 1:32 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > I've made a slight addition to the KIP based on Yash feedback:
> >
> > - A new metric is added at INFO level to record the max latency from the
> > batch timestamp, by keeping the oldest record timestamp per batch.
> > - A draft implementation is linked.
> >
> > Looking forward to your feedback.
> > Also, a kindly reminder that the vote thread is open.
> >
> > Thanks!
> > Jorge.
> >
> > On Thu, 8 Sept 2022 at 14:25, Jorge Esteban Quilcate Otoya <
> > quilcate.jo...@gmail.com> wrote:
> >
> > > Great. I have updated the KIP to reflect this.
> > >
> > > Cheers,
> > > Jorge.
> > >
> > > On Thu, 8 Sept 2022 at 12:26, Yash Mayya <yash.ma...@gmail.com> wrote:
> > >
> > >> Thanks, I think it makes sense to define these metrics at a DEBUG
> > >> recording
> > >> level.
> > >>
> > >> On Thu, Sep 8, 2022 at 2:51 PM Jorge Esteban Quilcate Otoya <
> > >> quilcate.jo...@gmail.com> wrote:
> > >>
> > >> > On Thu, 8 Sept 2022 at 05:55, Yash Mayya <yash.ma...@gmail.com>
> > wrote:
> > >> >
> > >> > > Hi Jorge,
> > >> > >
> > >> > > Thanks for the changes. With regard to having per batch vs per
> > record
> > >> > > metrics, the additional overhead I was referring to wasn't about
> > >> whether
> > >> > or
> > >> > > not we would need to iterate over all the records in a batch. I
> was
> > >> > > referring to the potential additional overhead caused by the
> higher
> > >> > volume
> > >> > > of calls to Sensor::record on the sensors for the new metrics (as
> > >> > compared
> > >> > > to the existing batch only metrics), especially for high
> throughput
> > >> > > connectors where batch sizes could be large. I guess we may want
> to
> > do
> > >> > some
> > >> > > sort of performance testing and get concrete numbers to verify
> > whether
> > >> > this
> > >> > > is a valid concern or not?
> > >> > >
> > >> >
> > >> > 6.1. Got it, thanks for clarifying. I guess there could be a
> benchmark
> > >> test
> > >> > of the `Sensor::record` to get an idea of the performance impact.
> > >> > Regardless, the fact that these are single-record metrics compared
> to
> > >> > existing batch-only could be explicitly defined by setting these
> > >> metrics at
> > >> > a DEBUG or TRACE metric recording level, leaving the existing at
> INFO
> > >> > level.
> > >> > wdyt?
> > >> >
> > >> >
> > >> > >
> > >> > > Thanks,
> > >> > > Yash
> > >> > >
> > >> > > On Tue, Sep 6, 2022 at 4:42 PM Jorge Esteban Quilcate Otoya <
> > >> > > quilcate.jo...@gmail.com> wrote:
> > >> > >
> > >> > > > Hi Sagar and Yash,
> > >> > > >
> > >> > > > > the way it's defined in
> > >> > > > https://kafka.apache.org/documentation/#connect_monitoring for
> > the
> > >> > > metrics
> > >> > > >
> > >> > > > 4.1. Got it. Add it to the KIP.
> > >> > > >
> > >> > > > > The only thing I would argue is do we need
> > >> sink-record-latency-min?
> > >> > > Maybe
> > >> > > > we
> > >> > > > > could remove this min metric as well and make all of the 3 e2e
> > >> > metrics
> > >> > > > > consistent
> > >> > > >
> > >> > > > 4.2 I see. Will remove it from the KIP.
> > >> > > >
> > >> > > > > Probably users can track the metrics at their end to
> > >> > > > > figure that out. Do you think that makes sense?
> > >> > > >
> > >> > > > 4.3. Yes, agree. With these new metrics it should be easier for
> > >> users
> > >> > to
> > >> > > > track this.
> > >> > > >
> > >> > > > > I think it makes sense to not have a min metric for either to
> > >> remain
> > >> > > > > consistent with the existing put-batch and poll-batch metrics
> > >> > > >
> > >> > > > 5.1. Got it. Same as 4.2
> > >> > > >
> > >> > > > > Another naming related suggestion I had was with the
> > >> > > > > "convert-time" metrics - we should probably include
> > >> transformations
> > >> > in
> > >> > > > the
> > >> > > > > name since SMTs could definitely be attributable to a sizable
> > >> chunk
> > >> > of
> > >> > > > the
> > >> > > > > latency depending on the specific transformation chain.
> > >> > > >
> > >> > > > 5.2. Make sense. I'm proposing to add
> > >> > `sink-record-convert-transform...`
> > >> > > > and `source-record-transform-convert...` to represent correctly
> > the
> > >> > order
> > >> > > > of operations.
> > >> > > >
> > >> > > > > it seems like both source and sink tasks only record metrics
> at
> > a
> > >> > > "batch"
> > >> > > > > level, not on an individual record level. I think it might be
> > >> > > additional
> > >> > > > > overhead if we want to record these new metrics all at the
> > record
> > >> > > level?
> > >> > > >
> > >> > > > 5.3. I considered at the beginning to implement all metrics at
> the
> > >> > batch
> > >> > > > level, but given how the framework process records, I fallback
> to
> > >> the
> > >> > > > proposed approach:
> > >> > > > - Sink Task:
> > >> > > >   - `WorkerSinkTask#convertMessages(msgs)` already iterates over
> > >> > records,
> > >> > > > so there is no additional overhead to capture record latency per
> > >> > record.
> > >> > > >     -
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/apache/kafka/blob/9841647c4fe422532f448423c92d26e4fdcb8932/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L490-L514
> > >> > > >   - `WorkerSinkTask#convertAndTransformRecord(record)` actually
> > >> happens
> > >> > > > individually. Measuring this operation per batch would include
> > >> > processing
> > >> > > > that is not strictly part of "convert and transform"
> > >> > > >     -
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/apache/kafka/blob/9841647c4fe422532f448423c92d26e4fdcb8932/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L518
> > >> > > > - Source Task:
> > >> > > >   - `AbstractWorkerSourceTask#sendRecords` iterates over a batch
> > and
> > >> > > > applies transforms and convert record individually as well:
> > >> > > >     -
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/apache/kafka/blob/9841647c4fe422532f448423c92d26e4fdcb8932/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractWorkerSourceTask.java#L389-L390
> > >> > > >
> > >> > > > > This might require some additional changes -
> > >> > > > > for instance, with the "sink-record-latency" metric, we might
> > only
> > >> > want
> > >> > > > to
> > >> > > > > have a "max" metric since "avg" would require recording a
> value
> > on
> > >> > the
> > >> > > > > sensor for each record (whereas we can get a "max" by only
> > >> recording
> > >> > a
> > >> > > > > metric value for the oldest record in each batch).
> > >> > > >
> > >> > > > 5.4. Recording record-latency per batch may not be as useful as
> > >> there
> > >> > is
> > >> > > no
> > >> > > > guarantee that the oldest record will be representative of the
> > >> batch.
> > >> > > >
> > >> > > > On Sat, 3 Sept 2022 at 16:02, Yash Mayya <yash.ma...@gmail.com>
> > >> wrote:
> > >> > > >
> > >> > > > > Hi Jorge and Sagar,
> > >> > > > >
> > >> > > > > I think it makes sense to not have a min metric for either to
> > >> remain
> > >> > > > > consistent with the existing put-batch and poll-batch metrics
> > (it
> > >> > > doesn't
> > >> > > > > seem particularly useful either anyway). Also, the new
> > >> > > > > "sink-record-latency" metric name looks fine to me, thanks for
> > >> making
> > >> > > the
> > >> > > > > changes! Another naming related suggestion I had was with the
> > >> > > > > "convert-time" metrics - we should probably include
> > >> transformations
> > >> > in
> > >> > > > the
> > >> > > > > name since SMTs could definitely be attributable to a sizable
> > >> chunk
> > >> > of
> > >> > > > the
> > >> > > > > latency depending on the specific transformation chain.
> > >> > > > >
> > >> > > > > I have one high level question with respect to implementation
> -
> > >> > > > currently,
> > >> > > > > it seems like both source and sink tasks only record metrics
> at
> > a
> > >> > > "batch"
> > >> > > > > level, not on an individual record level. I think it might be
> > >> > > additional
> > >> > > > > overhead if we want to record these new metrics all at the
> > record
> > >> > > level?
> > >> > > > > Could we instead make all of these new metrics for batches of
> > >> records
> > >> > > > > rather than individual records in order to remain consistent
> > with
> > >> the
> > >> > > > > existing task level metrics? This might require some
> additional
> > >> > > changes -
> > >> > > > > for instance, with the "sink-record-latency" metric, we might
> > only
> > >> > want
> > >> > > > to
> > >> > > > > have a "max" metric since "avg" would require recording a
> value
> > on
> > >> > the
> > >> > > > > sensor for each record (whereas we can get a "max" by only
> > >> recording
> > >> > a
> > >> > > > > metric value for the oldest record in each batch).
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > > Yash
> > >> > > > >
> > >> > > > > On Fri, Sep 2, 2022 at 3:16 PM Sagar <
> sagarmeansoc...@gmail.com
> > >
> > >> > > wrote:
> > >> > > > >
> > >> > > > > > Hi Jorge,
> > >> > > > > >
> > >> > > > > > Thanks for the changes.
> > >> > > > > >
> > >> > > > > > Regarding the metrics, I meant something like this:
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> kafka.connect:type=sink-task-metrics,connector="{connector}",task="{task}"
> > >> > > > > >
> > >> > > > > > the way it's defined in
> > >> > > > > > https://kafka.apache.org/documentation/#connect_monitoring
> > for
> > >> the
> > >> > > > > > metrics.
> > >> > > > > >
> > >> > > > > > I see what you mean by the 3 metrics and how it can be
> > >> interpreted.
> > >> > > The
> > >> > > > > > only thing I would argue is do we need
> > sink-record-latency-min?
> > >> > Maybe
> > >> > > > we
> > >> > > > > > could remove this min metric as well and make all of the 3
> e2e
> > >> > > metrics
> > >> > > > > > consistent(since put-batch also doesn't expose a min which
> > makes
> > >> > > sense
> > >> > > > to
> > >> > > > > > me). I think this is in contrast to what Yash pointed out
> > above
> > >> so
> > >> > I
> > >> > > > > would
> > >> > > > > > like to hear his thoughts as well.
> > >> > > > > >
> > >> > > > > > The other point Yash mentioned about the slightly flawed
> > >> definition
> > >> > > of
> > >> > > > > e2e
> > >> > > > > > is also true in a sense. But I have a feeling that's one the
> > >> > records
> > >> > > > are
> > >> > > > > > polled by the connector tasks, it would be difficult to
> track
> > >> the
> > >> > > final
> > >> > > > > leg
> > >> > > > > > via the framework. Probably users can track the metrics at
> > their
> > >> > end
> > >> > > to
> > >> > > > > > figure that out. Do you think that makes sense?
> > >> > > > > >
> > >> > > > > > Thanks!
> > >> > > > > > Sagar.
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Thu, Sep 1, 2022 at 11:40 PM Jorge Esteban Quilcate
> Otoya <
> > >> > > > > > quilcate.jo...@gmail.com> wrote:
> > >> > > > > >
> > >> > > > > > > Hi Sagar and Yash,
> > >> > > > > > >
> > >> > > > > > > Thanks for your feedback!
> > >> > > > > > >
> > >> > > > > > > > 1) I am assuming the new metrics would be task level
> > metric.
> > >> > > > > > >
> > >> > > > > > > 1.1 Yes, it will be a task level metric, implemented on
> the
> > >> > > > > > > Worker[Source/Sink]Task.
> > >> > > > > > >
> > >> > > > > > > > Could you specify the way it's done for other
> sink/source
> > >> > > > connector?
> > >> > > > > > >
> > >> > > > > > > 1.2. Not sure what do you mean by this. Could you
> elaborate
> > a
> > >> bit
> > >> > > > more?
> > >> > > > > > >
> > >> > > > > > > > 2. I am slightly confused about the e2e latency
> metric...
> > >> > > > > > >
> > >> > > > > > > 2.1. Yes, I see. I was trying to bring a similar concept
> as
> > in
> > >> > > > Streams
> > >> > > > > > with
> > >> > > > > > > KIP-613, though the e2e concept may not be translatable.
> > >> > > > > > > We could keep it as `sink-record-latency` to avoid
> > conflating
> > >> > > > > concepts. A
> > >> > > > > > > similar metric naming was proposed in KIP-489 but at the
> > >> consumer
> > >> > > > > level —
> > >> > > > > > > though it seems dormant for a couple of years.
> > >> > > > > > >
> > >> > > > > > > > However, the put-batch time measures the
> > >> > > > > > > > time to put a batch of records to external sink. So, I
> > would
> > >> > > assume
> > >> > > > > > the 2
> > >> > > > > > > > can't be added as is to compute the e2e latency. Maybe I
> > am
> > >> > > missing
> > >> > > > > > > > something here. Could you plz clarify this.
> > >> > > > > > >
> > >> > > > > > > 2.2. Yes, agree. Not necessarily added, but with the 3
> > >> latencies
> > >> > > > (poll,
> > >> > > > > > > convert, putBatch) will be clearer where the bottleneck
> may
> > >> be,
> > >> > and
> > >> > > > > > > represent the internal processing.
> > >> > > > > > >
> > >> > > > > > > > however, as per the KIP it looks like it will be
> > >> > > > > > > > the latency between when the record was written to Kafka
> > and
> > >> > when
> > >> > > > the
> > >> > > > > > > > record is returned by a sink task's consumer's poll?
> > >> > > > > > >
> > >> > > > > > > 3.1. Agree. 2.1. could help to clarify this.
> > >> > > > > > >
> > >> > > > > > > > One more thing - I was wondering
> > >> > > > > > > > if there's a particular reason for having a min metric
> for
> > >> e2e
> > >> > > > > latency
> > >> > > > > > > but
> > >> > > > > > > > not for convert time?
> > >> > > > > > >
> > >> > > > > > > 3.2. Was following KIP-613 for e2e which seems useful to
> > >> compare
> > >> > > with
> > >> > > > > > Max a
> > >> > > > > > > get an idea of the window of results, though current
> > >> latencies in
> > >> > > > > > Connector
> > >> > > > > > > do not include Min, and that's why I haven't added it for
> > >> convert
> > >> > > > > > latency.
> > >> > > > > > > Do you think it make sense to extend latency metrics with
> > Min?
> > >> > > > > > >
> > >> > > > > > > KIP is updated to clarify some of these changes.
> > >> > > > > > >
> > >> > > > > > > Many thanks,
> > >> > > > > > > Jorge.
> > >> > > > > > >
> > >> > > > > > > On Thu, 1 Sept 2022 at 18:11, Yash Mayya <
> > >> yash.ma...@gmail.com>
> > >> > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Hi Jorge,
> > >> > > > > > > >
> > >> > > > > > > > Thanks for the KIP! I have the same confusion with the
> > >> > > e2e-latency
> > >> > > > > > > metrics
> > >> > > > > > > > as Sagar above. "e2e" would seem to indicate the latency
> > >> > between
> > >> > > > when
> > >> > > > > > the
> > >> > > > > > > > record was written to Kafka and when the record was
> > written
> > >> to
> > >> > > the
> > >> > > > > sink
> > >> > > > > > > > system by the connector - however, as per the KIP it
> looks
> > >> like
> > >> > > it
> > >> > > > > will
> > >> > > > > > > be
> > >> > > > > > > > the latency between when the record was written to Kafka
> > and
> > >> > when
> > >> > > > the
> > >> > > > > > > > record is returned by a sink task's consumer's poll? I
> > think
> > >> > that
> > >> > > > > > metric
> > >> > > > > > > > will be a little confusing to interpret. One more thing
> -
> > I
> > >> was
> > >> > > > > > wondering
> > >> > > > > > > > if there's a particular reason for having a min metric
> for
> > >> e2e
> > >> > > > > latency
> > >> > > > > > > but
> > >> > > > > > > > not for convert time?
> > >> > > > > > > >
> > >> > > > > > > > Thanks,
> > >> > > > > > > > Yash
> > >> > > > > > > >
> > >> > > > > > > > On Thu, Sep 1, 2022 at 8:59 PM Sagar <
> > >> > sagarmeansoc...@gmail.com>
> > >> > > > > > wrote:
> > >> > > > > > > >
> > >> > > > > > > > > Hi Jorge,
> > >> > > > > > > > >
> > >> > > > > > > > > Thanks for the KIP. It looks like a very good
> addition.
> > I
> > >> > > skimmed
> > >> > > > > > > through
> > >> > > > > > > > > once and had a couple of questions =>
> > >> > > > > > > > >
> > >> > > > > > > > > 1) I am assuming the new metrics would be task level
> > >> metric.
> > >> > > > Could
> > >> > > > > > you
> > >> > > > > > > > > specify the way it's done for other sink/source
> > connector?
> > >> > > > > > > > > 2) I am slightly confused about the e2e latency
> metric.
> > >> Let's
> > >> > > > > > consider
> > >> > > > > > > > the
> > >> > > > > > > > > sink connector metric. If I look at the way it's
> > supposed
> > >> to
> > >> > be
> > >> > > > > > > > calculated,
> > >> > > > > > > > > i.e the difference between the record timestamp and
> the
> > >> wall
> > >> > > > clock
> > >> > > > > > > time,
> > >> > > > > > > > it
> > >> > > > > > > > > looks like a per record metric. However, the put-batch
> > >> time
> > >> > > > > measures
> > >> > > > > > > the
> > >> > > > > > > > > time to put a batch of records to external sink. So, I
> > >> would
> > >> > > > assume
> > >> > > > > > > the 2
> > >> > > > > > > > > can't be added as is to compute the e2e latency.
> Maybe I
> > >> am
> > >> > > > missing
> > >> > > > > > > > > something here. Could you plz clarify this.
> > >> > > > > > > > >
> > >> > > > > > > > > Thanks!
> > >> > > > > > > > > Sagar.
> > >> > > > > > > > >
> > >> > > > > > > > > On Tue, Aug 30, 2022 at 8:43 PM Jorge Esteban Quilcate
> > >> Otoya
> > >> > <
> > >> > > > > > > > > quilcate.jo...@gmail.com> wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > > Hi all,
> > >> > > > > > > > > >
> > >> > > > > > > > > > I'd like to start a discussion thread on KIP-864:
> Add
> > >> > > > End-To-End
> > >> > > > > > > > Latency
> > >> > > > > > > > > > Metrics to Connectors.
> > >> > > > > > > > > > This KIP aims to improve the metrics available on
> > Source
> > >> > and
> > >> > > > Sink
> > >> > > > > > > > > > Connectors to measure end-to-end latency, including
> > >> source
> > >> > > and
> > >> > > > > sink
> > >> > > > > > > > > record
> > >> > > > > > > > > > conversion time, and sink record e2e latency
> (similar
> > to
> > >> > > > KIP-613
> > >> > > > > > for
> > >> > > > > > > > > > Streams).
> > >> > > > > > > > > >
> > >> > > > > > > > > > The KIP is here:
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-864%3A+Add+End-To-End+Latency+Metrics+to+Connectors
> > >> > > > > > > > > >
> > >> > > > > > > > > > Please take a look and let me know what you think.
> > >> > > > > > > > > >
> > >> > > > > > > > > > Cheers,
> > >> > > > > > > > > > Jorge.
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>
