Hi Jorge, Thanks for indulging my paranoia. LGTM!
Cheers,

Chris

On Mon, Dec 5, 2022 at 10:06 AM Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Sure! I have added the following to the proposed changes section:
>
> ```
> The per-record metrics will definitely be added to Kafka Connect as part of
> this KIP, but their metric level will be changed pending the performance
> testing described in KAFKA-14441, and will otherwise only be exposed at a
> lower level (DEBUG instead of INFO, and TRACE instead of DEBUG)
> ```
>
> Let me know how it looks.
>
> Many thanks!
> Jorge.
>
> On Mon, 5 Dec 2022 at 14:11, Chris Egerton <chr...@aiven.io.invalid>
> wrote:
>
> > Hi Jorge,
> >
> > Thanks for filing KAFKA-14441! In the ticket description we mention that
> > "there will be more confidence whether to design metrics to be exposed at
> > a DEBUG or INFO level depending on their impact", but it doesn't seem like
> > this is called out in the KIP and, just based on what's in the KIP, the
> > proposal is still to have several per-record metrics exposed at INFO
> > level.
> >
> > Could we explicitly call out that the per-record metrics will definitely
> > be added to Kafka Connect as part of this KIP, but they will only be
> > exposed at INFO level pending the performance testing described in
> > KAFKA-14441, and will otherwise only be exposed at DEBUG level? Otherwise,
> > it's possible that a vote for the KIP as it's written today would be a
> > vote in favor of unconditionally exposing these metrics at INFO level,
> > even if the performance testing reveals issues.
> >
> > Cheers,
> >
> > Chris
> >
> > On Sun, Dec 4, 2022 at 7:08 PM Jorge Esteban Quilcate Otoya <
> > quilcate.jo...@gmail.com> wrote:
> >
> > > Thanks for the reminder Chris!
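For context on what changing a metric's level buys: each sensor carries a recording level that is checked against the worker's configured level before any value is recorded. The sketch below mirrors the idea behind Kafka's `Sensor.RecordingLevel`, but the class and method names are invented for illustration; it is not the actual API.

```java
// Illustrative sketch of recording-level gating (names are invented; the real
// check lives in Kafka's Sensor.RecordingLevel): a DEBUG-level sensor records
// nothing while the configured level is INFO, so demoted per-record metrics
// cost only this check.
public class RecordingLevelSketch {
    enum Level { INFO, DEBUG, TRACE }

    static boolean shouldRecord(Level sensorLevel, Level configuredLevel) {
        // record only if the sensor's verbosity is within the configured verbosity
        return sensorLevel.ordinal() <= configuredLevel.ordinal();
    }

    public static void main(String[] args) {
        System.out.println(shouldRecord(Level.DEBUG, Level.INFO));  // skipped
        System.out.println(shouldRecord(Level.DEBUG, Level.DEBUG)); // recorded
        System.out.println(shouldRecord(Level.INFO, Level.INFO));   // recorded
    }
}
```

With the configured level at INFO, a DEBUG-level per-record sensor reduces to a cheap no-op check, which is why demoting the per-record metrics limits their overhead.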
> > >
> > > I have added a note on the KIP to include this, as most of the metrics
> > > proposed are per-record and having them all on DEBUG would limit the
> > > benefits, and created
> > > https://issues.apache.org/jira/browse/KAFKA-14441
> > > to keep track of this task.
> > >
> > > Cheers,
> > > Jorge.
> > >
> > > On Tue, 29 Nov 2022 at 19:40, Chris Egerton <chr...@aiven.io.invalid>
> > > wrote:
> > >
> > > > Hi Jorge,
> > > >
> > > > Thanks! What were your thoughts on the possible benchmarking and/or
> > > > downgrading of per-record metrics to DEBUG?
> > > >
> > > > Cheers,
> > > >
> > > > Chris
> > > >
> > > > On Thu, Nov 24, 2022 at 8:20 AM Jorge Esteban Quilcate Otoya <
> > > > quilcate.jo...@gmail.com> wrote:
> > > >
> > > > > Thanks Chris! I have updated the KIP with "transform" instead of
> > > > > "alias". Agree it's clearer.
> > > > >
> > > > > Cheers,
> > > > > Jorge.
> > > > >
> > > > > On Mon, 21 Nov 2022 at 21:36, Chris Egerton <chr...@aiven.io.invalid>
> > > > > wrote:
> > > > >
> > > > > > Hi Jorge,
> > > > > >
> > > > > > Thanks for the updates, and apologies for the delay. The new diagram
> > > > > > directly under the "Proposed Changes" section is absolutely gorgeous!
> > > > > >
> > > > > > Follow-ups:
> > > > > >
> > > > > > RE 2: Good point. We can use the same level for these metrics, it's
> > > > > > not a big deal.
> > > > > >
> > > > > > RE 3: As long as all the per-record metrics are kept at DEBUG level,
> > > > > > it should be fine to leave JMH benchmarking for a follow-up. If we
> > > > > > want to add new per-record, INFO-level metrics, I would be more
> > > > > > comfortable with including benchmarking as part of the testing plan
> > > > > > for the KIP.
> One > > > > > > possible compromise could be to propose that these features be > > merged > > > > at > > > > > > DEBUG level, and then possibly upgraded to INFO level in the > future > > > > > pending > > > > > > benchmarks to guard against performance degradation. > > > > > > > > > > > > RE 4: I think for a true "end-to-end" metric, it'd be useful to > > > include > > > > > the > > > > > > time taken by the task to actually deliver the record. However, > > with > > > > the > > > > > > new metric names and descriptions provided in the KIP, I have no > > > > > objections > > > > > > with what's currently proposed, and a new "end-to-end" metric can > > be > > > > > taken > > > > > > on later in a follow-up KIP. > > > > > > > > > > > > RE 6: You're right, existing producer metrics should be enough > for > > > now. > > > > > We > > > > > > can revisit this later if/when we add delivery-centric metrics > for > > > sink > > > > > > tasks as well. > > > > > > > > > > > > RE 7: The new metric names in the KIP LGTM; I don't see any need > to > > > > > expand > > > > > > beyond those but if you'd still like to pursue others, LMK. > > > > > > > > > > > > > > > > > > New thoughts: > > > > > > > > > > > > One small thought: instead of "alias" in > "alias="{transform_alias}" > > > for > > > > > the > > > > > > per-transform metrics, could we use "transform"? IMO it's clearer > > > since > > > > > we > > > > > > don't use "alias" in the names of transform-related properties, > and > > > > > "alias" > > > > > > may be confused with the classloading term where you can use, > e.g., > > > > > > "FileStreamSource" as the name of a connector class in a > connector > > > > config > > > > > > instead of > > "org.apache.kafka.connect.file.FileStreamSourceConnector". 
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Chris
> > > > > >
> > > > > > On Fri, Nov 18, 2022 at 12:06 PM Jorge Esteban Quilcate Otoya <
> > > > > > quilcate.jo...@gmail.com> wrote:
> > > > > >
> > > > > > > Thanks Mickael!
> > > > > > >
> > > > > > > On Wed, 9 Nov 2022 at 15:54, Mickael Maison <mickael.mai...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jorge,
> > > > > > > >
> > > > > > > > Thanks for the KIP, it is a nice improvement.
> > > > > > > >
> > > > > > > > 1) The per transformation metrics still have a question mark next to
> > > > > > > > them in the KIP. Do you want to include them? If so we'll want to tag
> > > > > > > > them; we should be able to include the aliases in TransformationChain
> > > > > > > > and use them.
> > > > > > >
> > > > > > > Yes, I have added the changes to TransformationChain that will be
> > > > > > > needed to add these metrics.
> > > > > > >
> > > > > > > > 2) I see no references to predicates. If we don't want to measure
> > > > > > > > their latency, can we say it explicitly?
> > > > > > >
> > > > > > > Good question, I haven't considered these. Though as these are
> > > > > > > materialized as PredicatedTransformation, they should be covered by
> > > > > > > these changes. Adding a note about this.
> > > > > > >
> > > > > > > > 3) Should we have sink-record-batch-latency-avg-ms? All other metrics
> > > > > > > > have both the maximum and average values.
> > > > > > >
> > > > > > > Good question. I will remove it and change the record latency from
> > > > > > > DEBUG->INFO as it already covers the maximum metric.
> > > > > > >
> > > > > > > Hope it's clearer now, let me know if there is any additional feedback.
> > > > > > > Thanks!
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Mickael > > > > > > > > > > > > > > > > On Thu, Oct 20, 2022 at 9:58 PM Jorge Esteban Quilcate Otoya > > > > > > > > <quilcate.jo...@gmail.com> wrote: > > > > > > > > > > > > > > > > > > Thanks, Chris! Great feedback! Please, find my comments > > below: > > > > > > > > > > > > > > > > > > On Thu, 13 Oct 2022 at 18:52, Chris Egerton > > > > > <chr...@aiven.io.invalid > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Hi Jorge, > > > > > > > > > > > > > > > > > > > > Thanks for the KIP. I agree with the overall direction > and > > > > think > > > > > > this > > > > > > > > would > > > > > > > > > > be a nice improvement to Kafka Connect. Here are my > initial > > > > > > thoughts > > > > > > > > on the > > > > > > > > > > details: > > > > > > > > > > > > > > > > > > > > 1. The motivation section outlines the gaps in Kafka > > > Connect's > > > > > task > > > > > > > > metrics > > > > > > > > > > nicely. I think it'd be useful to include more concrete > > > details > > > > > on > > > > > > > why > > > > > > > > > > these gaps need to be filled in, and in which cases > > > additional > > > > > > > metrics > > > > > > > > > > would be helpful. One goal could be to provide enhanced > > > > > monitoring > > > > > > of > > > > > > > > > > production deployments that allows for cluster > > administrators > > > > to > > > > > > set > > > > > > > up > > > > > > > > > > automatic alerts for latency spikes and, if triggered, > > > quickly > > > > > > > > identify the > > > > > > > > > > root cause of those alerts, reducing the time to > > remediation. > > > > > > Another > > > > > > > > goal > > > > > > > > > > could be to provide more insight to developers or cluster > > > > > > > > administrators > > > > > > > > > > who want to do performance testing on connectors in > > > > > non-production > > > > > > > > > > environments. 
It may help guide our decision making > process > > > to > > > > > > have a > > > > > > > > > > clearer picture of the goals we're trying to achieve. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Agree. The Motivation section has been updated. > > > > > > > > > Thanks for the examples, I see both of them being covered > by > > > the > > > > > KIP. > > > > > > > > > I see how these could give us a good distinction on whether > > to > > > > > > position > > > > > > > > > some metrics at INFO or DEBUG level. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. If we're trying to address the alert-and-diagnose use > > > case, > > > > > it'd > > > > > > > be > > > > > > > > > > useful to have as much information as possible at INFO > > level, > > > > > > rather > > > > > > > > than > > > > > > > > > > forcing cluster administrators to possibly reconfigure a > > > > > connector > > > > > > to > > > > > > > > emit > > > > > > > > > > DEBUG or TRACE level metrics in order to diagnose a > > potential > > > > > > > > > > production-impacting performance bottleneck. I can see > the > > > > > > rationale > > > > > > > > for > > > > > > > > > > emitting per-record metrics that track an average value > at > > > > DEBUG > > > > > > > > level, but > > > > > > > > > > for per-record metrics that track a maximum value, is > there > > > any > > > > > > > reason > > > > > > > > not > > > > > > > > > > to provide this information at INFO level? > > > > > > > > > > > > > > > > > > > > > > > > > > > > Agree. Though with Max and Avg metrics being part of the > same > > > > > sensor > > > > > > — > > > > > > > > > where Metric Level is defined — then both metrics get the > > same > > > > > level. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3. I'm also curious about the performance testing > suggested > > > by > > > > > Yash > > > > > > > to > > > > > > > > > > gauge the potential impact of this change. 
Have you been able to do any testing with your draft implementation yet?
> > > > > > > > >
> > > > > > > > > No, not so far.
> > > > > > > > > I think it would be valuable to discuss the scope of this testing and
> > > > > > > > > maybe tackle it in a separate issue, as Sensors and Metrics are used all
> > > > > > > > > over the place.
> > > > > > > > > My initial understanding is that these tests should be placed in the
> > > > > > > > > jmh-benchmarks[1].
> > > > > > > > > Then, we could target testing Sensors and Metrics, and validate how much
> > > > > > > > > overhead is added by having only Max vs. Max,Avg(,Min), etc.
> > > > > > > > > On the other hand, we could extend this to Transformers or other Connect
> > > > > > > > > layers.
> > > > > > > > >
> > > > > > > > > Here are some pointers to the Sensors and Metrics implementations that
> > > > > > > > > could be considered:
> > > > > > > > > Path to metric recording:
> > > > > > > > > - https://github.com/apache/kafka/blob/5cab11cf525f6c06fcf9eb43f7f95ef33fe1cdbb/clients/src/main/java/org/apache/kafka/common/metrics/Sensor.java#L195-L199
> > > > > > > > > - https://github.com/apache/kafka/blob/5cab11cf525f6c06fcf9eb43f7f95ef33fe1cdbb/clients/src/main/java/org/apache/kafka/common/metrics/Sensor.java#L230-L244
> > > > > > > > >
> > > > > > > > > ```
> > > > > > > > > // increment all the stats
> > > > > > > > > for (StatAndConfig statAndConfig : this.stats) {
> > > > > > > > >     statAndConfig.stat.record(statAndConfig.config(), value, timeMs);
> > > > > > > > > }
> > > > > > > > > ```
> > > > > > > > >
> > > > > > > > > SampledStats:
> > > > > > > > > - Avg: https://github.com/apache/kafka/blob/068ab9cefae301f3187ea885d645c425955e77d2/clients/src/main/java/org/apache/kafka/common/metrics/stats/Avg.java
> > > > > > > > > - Max: https://github.com/apache/kafka/blob/068ab9cefae301f3187ea885d645c425955e77d2/clients/src/main/java/org/apache/kafka/common/metrics/stats/Max.java
> > > > > > > > > - Min: https://github.com/apache/kafka/blob/068ab9cefae301f3187ea885d645c425955e77d2/clients/src/main/java/org/apache/kafka/common/metrics/stats/Min.java
> > > > > > > > >
> > > > > > > > > `stat#record()` is implemented by the `update` method in each SampledStat:
> > > > > > > > >
> > > > > > > > > ```Max.java
> > > > > > > > > @Override
> > > > > > > > > protected void update(Sample sample, MetricConfig config, double value,
> > > > > > > > >     long now) {
> > > > > > > > >     sample.value = Math.max(sample.value, value);
> > > > > > > > > }
> > > > > > > > > ```
> > > > > > > > >
> > > > > > > > > ```Avg.java
> > > > > > > > > @Override
> > > > > > > > > protected void update(Sample sample, MetricConfig config, double value,
> > > > > > > > >     long now) {
> > > > > > > > >     sample.value += value;
> > > > > > > > > }
> > > > > > > > > ```
> > > > > > > > >
> > > > > > > > > As far as I understand, most of the work of the stats happens in the
> > > > > > > > > `combine` method, which is not part of the connector execution path but
> > > > > > > > > is called when metrics are queried.
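To make the per-record cost under discussion concrete, here is a self-contained stand-in for the `update` logic above (illustrative only, not the actual `org.apache.kafka.common.metrics` classes): recording a value into a Max stat is one comparison, and into an Avg stat one addition, per registered stat.

```java
// Simplified stand-ins for the Max/Avg SampledStat update logic quoted above.
// These are illustrative, not the Kafka classes: per record, each registered
// stat does a single cheap arithmetic operation; averaging happens at query
// time in combine().
public class StatUpdateSketch {
    static double maxValue = Double.NEGATIVE_INFINITY; // what Max.update keeps
    static double sum = 0.0;                           // what Avg.update accumulates
    static long count = 0;

    // the analogue of Sensor.record -> stat.record -> update for two stats
    static void record(double value) {
        maxValue = Math.max(maxValue, value); // Max: one comparison
        sum += value;                         // Avg: one addition
        count++;
    }

    public static void main(String[] args) {
        double[] latenciesMs = {5.0, 12.5, 3.2, 9.9};
        for (double v : latenciesMs) {
            record(v);
        }
        // the heavier work (combining samples into max/avg) runs at query time
        System.out.println("max=" + maxValue + " avg=" + (sum / count));
    }
}
```

The update path itself is cheap; the open question in the thread is whether the sheer volume of `Sensor::record` calls (synchronization, sample bookkeeping) matters at high throughput, which is what the benchmarking tracked in KAFKA-14441 is meant to answer.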
> > > > > > > > > > > > > > > > > > I wonder whether we should consider Avg and Max for all > > metrics > > > > > > > proposed > > > > > > > > as > > > > > > > > > the impact on the execution path seems minimal, and even > see > > if > > > > Min > > > > > > is > > > > > > > > also > > > > > > > > > valuable, and use DEBUG only for more granular metrics. > > > > > > > > > > > > > > > > > > [1] > > https://github.com/apache/kafka/tree/trunk/jmh-benchmarks > > > > > > > > > > > > > > > > > > > > > > > > > > > > 4. Just to make sure I understand correctly--does "time > > when > > > it > > > > > has > > > > > > > > been > > > > > > > > > > received by the Sink task" refer to the wallclock time > > > directly > > > > > > > after a > > > > > > > > > > call to SinkTask::put has been completed (as opposed to > > > > directly > > > > > > > before > > > > > > > > > > that call is made, or something else entirely)? > > > > > > > > > > > > > > > > > > > > > > > > > > > > It currently means when it has been received by the Sink > task > > > > > > > > > right after consumer poll and before conversions. > > > > > > > > > Would it be valuable to have it after put-sink-records? > > > > > > > > > > > > > > > > > > > > > > > > > > > > 5. If the goal is to identify performance bottlenecks > > (either > > > > in > > > > > > > > production > > > > > > > > > > or pre-production environments), would it make sense to > > > > introduce > > > > > > > > metrics > > > > > > > > > > for each individual converter (i.e., key/value/header) > and > > > > > > > > transformation? > > > > > > > > > > It's definitely an improvement to be able to identify the > > > total > > > > > > time > > > > > > > > for > > > > > > > > > > conversion and transformation, but then the immediate > > > follow-up > > > > > > > > question if > > > > > > > > > > a bottleneck is found in that phase is "which > > > > > > > converter/transformation > > > > > > > > is > > > > > > > > > > responsible?" 
It'd be nice if we could provide a way to > > > quickly > > > > > > > answer > > > > > > > > that > > > > > > > > > > question. > > > > > > > > > > > > > > > > > > > > > > > > > > > > This is a great idea. I'd like to consider this as well, > > though > > > > > maybe > > > > > > > > these > > > > > > > > > more granular > > > > > > > > > metrics would be good to have them as DEBUG. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 6. Any thoughts about offering latency metrics for source > > > tasks > > > > > > > between > > > > > > > > > > receipt of the record from the task and delivery of the > > > record > > > > to > > > > > > > Kafka > > > > > > > > > > (which would be tracked by producer callback)? We could > > also > > > > use > > > > > > the > > > > > > > > record > > > > > > > > > > timestamp either instead of or in addition to receipt > time > > if > > > > the > > > > > > > task > > > > > > > > > > provides a timestamp with its records. > > > > > > > > > > > > > > > > > > > > > > > > > > > > With source transform and convert metrics we get part of > that > > > > > > latency. > > > > > > > > > Looking at the Producer metrics, `request-latency` (though > a > > > very > > > > > > > generic > > > > > > > > > metric) > > > > > > > > > sort of answer the time between send request and ack — if > my > > > > > > > > understanding > > > > > > > > > is correct. > > > > > > > > > Would these be enough or you're thinking about another > > > approach? > > > > > > > > > maybe a custom metric to cover the producer side? > > > > > > > > > > > > > > > > > > > > > > > > > > > > 7. We may end up introducing a way for sink tasks to > record > > > > > > > per-record > > > > > > > > > > delivery to the sink system (see KIP-767 [1]). 
I'd like it if we could keep
> > > > > > > > > > the names of our metrics very precise in order to avoid confusing users
> > > > > > > > > > (who may think that we're providing metrics on actual delivery to the
> > > > > > > > > > sink system, which may not be the case if the connector performs
> > > > > > > > > > asynchronous writes), and in order to leave room for metrics on true
> > > > > > > > > > delivery time by sink tasks. It'd also be nice if we could remain
> > > > > > > > > > consistent with existing metrics such as "put-batch-avg-time-ms". With
> > > > > > > > > > that in mind, what do you think about renaming these metrics:
> > > > > > > > > > - "sink-record-batch-latency-max-ms" to "put-batch-avg-latency-ms"
> > > > > > > > > > - "sink-record-latency-max-ms" to "put-sink-record-latency-max-ms"
> > > > > > > > > > - "sink-record-latency-avg-ms" to "put-sink-record-latency-avg-ms"
> > > > > > > > > > - "sink-record-convert-transform-time-max-ms" to
> > > > > > > > > > "convert-transform-sink-record-time-max-ms"
> > > > > > > > > > - "sink-record-convert-transform-time-avg-ms" to
> > > > > > > > > > "convert-transform-sink-record-time-avg-ms"
> > > > > > > > > > - "source-record-transform-convert-time-max-ms" to
> > > > > > > > > > "transform-convert-source-record-time-max-ms"
> > > > > > > > > > - "source-record-transform-convert-time-avg-ms" to
> > > > > > > > > > "transform-convert-source-record-time-avg-ms"
> > > > > > > > >
> > > > > > > > > Makes sense, thanks! I have updated the list of metrics, grouped them by
> > > > > > > > > sensor, and applied these suggestions.
> > > > > > > > > The only ones that I want to review are: sink-record-* to put-batch-*
> > > > > > > > > (first 3). Not sure if put-batch/put-sink-record describes the purpose
> > > > > > > > > of the metric — neither does `sink-record-latency`, to be honest.
> > > > > > > > > My initial thought was to have something like Kafka Streams e2e-latency.
> > > > > > > > > Based on questions 4. and 6., an idea could be to add:
> > > > > > > > > - source-batch-e2e-latency-before-send: measure wallclock - source
> > > > > > > > > record timestamp after source connector poll.
> > > > > > > > > - source-batch-e2e-latency-after-send: measure wallclock - record
> > > > > > > > > timestamp on producer send callback.
> > > > > > > > > - sink-batch-e2e-latency-before-put: measure wallclock - record
> > > > > > > > > timestamp after consumer poll.
> > > > > > > > > - sink-batch-e2e-latency-after-put: measure wallclock - record
> > > > > > > > > timestamp after sink connector put.
> > > > > > > > >
> > > > > > > > > > Thanks again for the KIP! Looking forward to your thoughts.
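For illustration, the arithmetic behind these proposed values could look like the following (a hypothetical helper, not part of the KIP or the Kafka codebase): each value is wallclock time minus a record timestamp, and keeping only the oldest timestamp in a batch is enough to derive the batch's max latency.

```java
import java.util.List;

// Hypothetical sketch of the proposed e2e-latency computation: wallclock minus
// record timestamp. The oldest record in the batch yields the max latency, so
// a single recorded value per batch suffices for a max metric.
public class E2eLatencySketch {
    // e.g. the value behind a metric like sink-batch-e2e-latency-before-put
    static long batchMaxLatencyMs(List<Long> recordTimestampsMs, long nowMs) {
        long oldest = Long.MAX_VALUE;
        for (long ts : recordTimestampsMs) {
            oldest = Math.min(oldest, ts); // track only the oldest timestamp
        }
        return nowMs - oldest;
    }

    public static void main(String[] args) {
        long now = 1_000_000L; // wallclock right after consumer poll
        List<Long> batch = List.of(999_400L, 999_700L, 999_900L);
        System.out.println("max e2e latency = " + batchMaxLatencyMs(batch, now) + " ms");
    }
}
```

Recording this single per-batch value gives a max metric without a `Sensor::record` call per record; an avg over records would still need per-record recording, which is the trade-off debated elsewhere in the thread.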
> > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > > > > > > > > > > Chris > > > > > > > > > > > > > > > > > > > > [1] - > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-767%3A+Connect+Latency+Metrics > > > > > > > > > > > > > > > > > > > > On Thu, Sep 15, 2022 at 1:32 PM Jorge Esteban Quilcate > > Otoya > > > < > > > > > > > > > > quilcate.jo...@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > > > > > > > > > > > I've made a slight addition to the KIP based on Yash > > > > feedback: > > > > > > > > > > > > > > > > > > > > > > - A new metric is added at INFO level to record the max > > > > latency > > > > > > > from > > > > > > > > the > > > > > > > > > > > batch timestamp, by keeping the oldest record timestamp > > per > > > > > > batch. > > > > > > > > > > > - A draft implementation is linked. > > > > > > > > > > > > > > > > > > > > > > Looking forward to your feedback. > > > > > > > > > > > Also, a kindly reminder that the vote thread is open. > > > > > > > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > Jorge. > > > > > > > > > > > > > > > > > > > > > > On Thu, 8 Sept 2022 at 14:25, Jorge Esteban Quilcate > > Otoya > > > < > > > > > > > > > > > quilcate.jo...@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > Great. I have updated the KIP to reflect this. > > > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > > Jorge. > > > > > > > > > > > > > > > > > > > > > > > > On Thu, 8 Sept 2022 at 12:26, Yash Mayya < > > > > > yash.ma...@gmail.com > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > >> Thanks, I think it makes sense to define these > metrics > > > at > > > > a > > > > > > > DEBUG > > > > > > > > > > > >> recording > > > > > > > > > > > >> level. 
> > > > > > > > > > > >> > > > > > > > > > > > >> On Thu, Sep 8, 2022 at 2:51 PM Jorge Esteban > Quilcate > > > > Otoya > > > > > < > > > > > > > > > > > >> quilcate.jo...@gmail.com> wrote: > > > > > > > > > > > >> > > > > > > > > > > > >> > On Thu, 8 Sept 2022 at 05:55, Yash Mayya < > > > > > > > yash.ma...@gmail.com> > > > > > > > > > > > wrote: > > > > > > > > > > > >> > > > > > > > > > > > > >> > > Hi Jorge, > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > Thanks for the changes. With regard to having > per > > > > batch > > > > > vs > > > > > > > per > > > > > > > > > > > record > > > > > > > > > > > >> > > metrics, the additional overhead I was referring > > to > > > > > wasn't > > > > > > > > about > > > > > > > > > > > >> whether > > > > > > > > > > > >> > or > > > > > > > > > > > >> > > not we would need to iterate over all the > records > > > in a > > > > > > > batch. > > > > > > > > I > > > > > > > > > > was > > > > > > > > > > > >> > > referring to the potential additional overhead > > > caused > > > > by > > > > > > the > > > > > > > > > > higher > > > > > > > > > > > >> > volume > > > > > > > > > > > >> > > of calls to Sensor::record on the sensors for > the > > > new > > > > > > > metrics > > > > > > > > (as > > > > > > > > > > > >> > compared > > > > > > > > > > > >> > > to the existing batch only metrics), especially > > for > > > > high > > > > > > > > > > throughput > > > > > > > > > > > >> > > connectors where batch sizes could be large. I > > guess > > > > we > > > > > > may > > > > > > > > want > > > > > > > > > > to > > > > > > > > > > > do > > > > > > > > > > > >> > some > > > > > > > > > > > >> > > sort of performance testing and get concrete > > numbers > > > > to > > > > > > > verify > > > > > > > > > > > whether > > > > > > > > > > > >> > this > > > > > > > > > > > >> > > is a valid concern or not? > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > 6.1. Got it, thanks for clarifying. 
I guess there > > > could > > > > > be a > > > > > > > > > > benchmark > > > > > > > > > > > >> test > > > > > > > > > > > >> > of the `Sensor::record` to get an idea of the > > > > performance > > > > > > > > impact. > > > > > > > > > > > >> > Regardless, the fact that these are single-record > > > > metrics > > > > > > > > compared > > > > > > > > > > to > > > > > > > > > > > >> > existing batch-only could be explicitly defined by > > > > setting > > > > > > > these > > > > > > > > > > > >> metrics at > > > > > > > > > > > >> > a DEBUG or TRACE metric recording level, leaving > the > > > > > > existing > > > > > > > at > > > > > > > > > > INFO > > > > > > > > > > > >> > level. > > > > > > > > > > > >> > wdyt? > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > Thanks, > > > > > > > > > > > >> > > Yash > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > On Tue, Sep 6, 2022 at 4:42 PM Jorge Esteban > > > Quilcate > > > > > > Otoya > > > > > > > < > > > > > > > > > > > >> > > quilcate.jo...@gmail.com> wrote: > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > Hi Sagar and Yash, > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > the way it's defined in > > > > > > > > > > > >> > > > > > > > > > > https://kafka.apache.org/documentation/#connect_monitoring > > > > > > > > for > > > > > > > > > > > the > > > > > > > > > > > >> > > metrics > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > 4.1. Got it. Add it to the KIP. > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > The only thing I would argue is do we need > > > > > > > > > > > >> sink-record-latency-min? 
> > > > > > > > > > > >> > > Maybe > > > > > > > > > > > >> > > > we > > > > > > > > > > > >> > > > > could remove this min metric as well and > make > > > all > > > > of > > > > > > the > > > > > > > > 3 e2e > > > > > > > > > > > >> > metrics > > > > > > > > > > > >> > > > > consistent > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > 4.2 I see. Will remove it from the KIP. > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > Probably users can track the metrics at > their > > > end > > > > to > > > > > > > > > > > >> > > > > figure that out. Do you think that makes > > sense? > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > 4.3. Yes, agree. With these new metrics it > > should > > > be > > > > > > > easier > > > > > > > > for > > > > > > > > > > > >> users > > > > > > > > > > > >> > to > > > > > > > > > > > >> > > > track this. > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > I think it makes sense to not have a min > > metric > > > > for > > > > > > > > either to > > > > > > > > > > > >> remain > > > > > > > > > > > >> > > > > consistent with the existing put-batch and > > > > > poll-batch > > > > > > > > metrics > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > 5.1. Got it. Same as 4.2 > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > Another naming related suggestion I had was > > with > > > > the > > > > > > > > > > > >> > > > > "convert-time" metrics - we should probably > > > > include > > > > > > > > > > > >> transformations > > > > > > > > > > > >> > in > > > > > > > > > > > >> > > > the > > > > > > > > > > > >> > > > > name since SMTs could definitely be > > attributable > > > > to > > > > > a > > > > > > > > sizable > > > > > > > > > > > >> chunk > > > > > > > > > > > >> > of > > > > > > > > > > > >> > > > the > > > > > > > > > > > >> > > > > latency depending on the specific > > transformation > > > > > > chain. 
> > > > > > > > > > > >> > > > 5.2. Makes sense. I'm proposing to add
> > > > > > > > > > > >> > > > `sink-record-convert-transform...` and
> > > > > > > > > > > >> > > > `source-record-transform-convert...` to correctly represent the
> > > > > > > > > > > >> > > > order of operations.
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > > it seems like both source and sink tasks only record metrics at a
> > > > > > > > > > > >> > > > > "batch" level, not on an individual record level. I think it might
> > > > > > > > > > > >> > > > > be additional overhead if we want to record these new metrics all
> > > > > > > > > > > >> > > > > at the record level?
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > 5.3. I initially considered implementing all metrics at the batch
> > > > > > > > > > > >> > > > level, but given how the framework processes records, I fell back
> > > > > > > > > > > >> > > > to the proposed approach:
> > > > > > > > > > > >> > > > - Sink Task:
> > > > > > > > > > > >> > > > - `WorkerSinkTask#convertMessages(msgs)` already iterates over
> > > > > > > > > > > >> > > > records, so there is no additional overhead to capture record
> > > > > > > > > > > >> > > > latency per record.
> > > > > > > > > > > >> > > > - > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/kafka/blob/9841647c4fe422532f448423c92d26e4fdcb8932/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L490-L514 > > > > > > > > > > > >> > > > - > > > > `WorkerSinkTask#convertAndTransformRecord(record)` > > > > > > > > actually > > > > > > > > > > > >> happens > > > > > > > > > > > >> > > > individually. Measuring this operation per > batch > > > > would > > > > > > > > include > > > > > > > > > > > >> > processing > > > > > > > > > > > >> > > > that is not strictly part of "convert and > > > transform" > > > > > > > > > > > >> > > > - > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/kafka/blob/9841647c4fe422532f448423c92d26e4fdcb8932/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L518 > > > > > > > > > > > >> > > > - Source Task: > > > > > > > > > > > >> > > > - `AbstractWorkerSourceTask#sendRecords` > > > iterates > > > > > > over a > > > > > > > > batch > > > > > > > > > > > and > > > > > > > > > > > >> > > > applies transforms and convert record > > individually > > > > as > > > > > > > well: > > > > > > > > > > > >> > > > - > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 
https://github.com/apache/kafka/blob/9841647c4fe422532f448423c92d26e4fdcb8932/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractWorkerSourceTask.java#L389-L390 > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > This might require some additional changes - > > > > > > > > > > > >> > > > > for instance, with the "sink-record-latency" > > > > metric, > > > > > > we > > > > > > > > might > > > > > > > > > > > only > > > > > > > > > > > >> > want > > > > > > > > > > > >> > > > to > > > > > > > > > > > >> > > > > have a "max" metric since "avg" would > require > > > > > > recording > > > > > > > a > > > > > > > > > > value > > > > > > > > > > > on > > > > > > > > > > > >> > the > > > > > > > > > > > >> > > > > sensor for each record (whereas we can get a > > > "max" > > > > > by > > > > > > > only > > > > > > > > > > > >> recording > > > > > > > > > > > >> > a > > > > > > > > > > > >> > > > > metric value for the oldest record in each > > > batch). > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > 5.4. Recording record-latency per batch may > not > > be > > > > as > > > > > > > > useful as > > > > > > > > > > > >> there > > > > > > > > > > > >> > is > > > > > > > > > > > >> > > no > > > > > > > > > > > >> > > > guarantee that the oldest record will be > > > > > representative > > > > > > of > > > > > > > > the > > > > > > > > > > > >> batch. 
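The per-record argument above can be sketched in a few lines: since the sink task already walks the batch record by record, capturing latency adds only a subtraction and one sensor call per record. This is an illustrative sketch, not Connect code; `SinkRecordStub` and `LatencySensor` are stand-ins for the framework's record and sensor types.

```java
import java.util.List;

// Hedged sketch: SinkRecordStub and LatencySensor are illustrative
// stand-ins, not actual Kafka Connect framework classes.
public class PerRecordLatency {

    // Stand-in for a consumed record carrying its Kafka timestamp.
    record SinkRecordStub(long timestampMs) {}

    // Stand-in for a max-tracking metrics sensor.
    static class LatencySensor {
        long maxMs = Long.MIN_VALUE;
        void record(long latencyMs) { maxMs = Math.max(maxMs, latencyMs); }
    }

    static long maxLatency(List<SinkRecordStub> batch, long nowMs) {
        LatencySensor sensor = new LatencySensor();
        for (SinkRecordStub rec : batch) {            // same loop that converts records
            sensor.record(nowMs - rec.timestampMs()); // one subtraction per record
        }
        return sensor.maxMs;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        long max = maxLatency(List.of(new SinkRecordStub(999_900L),
                                      new SinkRecordStub(999_500L)), now);
        System.out.println("max e2e latency ms = " + max); // 500
    }
}
```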
On Sat, 3 Sept 2022 at 16:02, Yash Mayya <yash.ma...@gmail.com> wrote:

Hi Jorge and Sagar,

I think it makes sense to not have a min metric for either, to remain consistent with the existing put-batch and poll-batch metrics (it doesn't seem particularly useful either anyway). Also, the new "sink-record-latency" metric name looks fine to me, thanks for making the changes! Another naming-related suggestion I had was with the "convert-time" metrics - we should probably include transformations in the name, since SMTs could definitely be attributable to a sizable chunk of the latency depending on the specific transformation chain.

I have one high-level question with respect to implementation - currently, it seems like both source and sink tasks only record metrics at a "batch" level, not on an individual record level. I think it might be additional overhead if we want to record these new metrics all at the record level? Could we instead make all of these new metrics for batches of records rather than individual records, in order to remain consistent with the existing task-level metrics? This might require some additional changes - for instance, with the "sink-record-latency" metric, we might only want to have a "max" metric, since "avg" would require recording a value on the sensor for each record (whereas we can get a "max" by only recording a metric value for the oldest record in each batch).
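The "oldest record" observation above can be checked with a small sketch (names are illustrative): because every record in a batch shares the same wall-clock reading, the maximum per-record latency equals the latency of the batch's oldest record, so a max-only sensor needs just one recording per batch.

```java
import java.util.List;

// Illustrative sketch: max(now - ts_i) over a batch equals
// now - min(ts_i), so one recording per batch suffices for "max".
public class BatchMaxLatency {

    // Per-record variant: record (now - ts) for every timestamp.
    static long maxPerRecord(List<Long> timestamps, long nowMs) {
        long max = Long.MIN_VALUE;
        for (long ts : timestamps) max = Math.max(max, nowMs - ts);
        return max;
    }

    // Per-batch variant: one computation from the oldest timestamp only.
    static long maxViaOldest(List<Long> timestamps, long nowMs) {
        long oldest = timestamps.stream().mapToLong(Long::longValue).min().orElseThrow();
        return nowMs - oldest;
    }

    public static void main(String[] args) {
        List<Long> ts = List.of(900L, 400L, 850L);
        System.out.println(maxPerRecord(ts, 1000L)); // 600
        System.out.println(maxViaOldest(ts, 1000L)); // 600 — same result, one recording
    }
}
```

An "avg" metric has no such shortcut, which is why Yash suggests a max-only sensor for the batch-level approach.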
Thanks,
Yash

On Fri, Sep 2, 2022 at 3:16 PM Sagar <sagarmeansoc...@gmail.com> wrote:

Hi Jorge,

Thanks for the changes.

Regarding the metrics, I meant something like this:

kafka.connect:type=sink-task-metrics,connector="{connector}",task="{task}"

the way it's defined in https://kafka.apache.org/documentation/#connect_monitoring for the metrics.

I see what you mean by the 3 metrics and how they can be interpreted. The only thing I would argue is: do we need sink-record-latency-min? Maybe we could remove this min metric as well and make all 3 e2e metrics consistent (since put-batch also doesn't expose a min, which makes sense to me). I think this is in contrast to what Yash pointed out above, so I would like to hear his thoughts as well.

The other point Yash mentioned about the slightly flawed definition of e2e is also true in a sense. But I have a feeling that once the records are polled by the connector tasks, it would be difficult to track the final leg via the framework. Probably users can track the metrics at their end to figure that out. Do you think that makes sense?

Thanks!
Sagar.
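The MBean name pattern Sagar quotes can be assembled and sanity-checked with the JDK's `javax.management.ObjectName`. This is only a sketch of how such a name parses; the connector and task values below are placeholders, not names from the KIP.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Sketch: build the task-level metric MBean name from the pattern in the
// Connect monitoring docs; "my-sink" and task 0 are placeholder values.
public class SinkTaskMetricName {

    static ObjectName nameFor(String connector, int task) {
        try {
            return new ObjectName(String.format(
                "kafka.connect:type=sink-task-metrics,connector=\"%s\",task=\"%d\"",
                connector, task));
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        ObjectName name = nameFor("my-sink", 0);
        System.out.println(name.getDomain());            // kafka.connect
        System.out.println(name.getKeyProperty("type")); // sink-task-metrics
    }
}
```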
On Thu, Sep 1, 2022 at 11:40 PM Jorge Esteban Quilcate Otoya <quilcate.jo...@gmail.com> wrote:

Hi Sagar and Yash,

Thanks for your feedback!

> 1) I am assuming the new metrics would be task level metric.

1.1. Yes, it will be a task-level metric, implemented on the Worker[Source/Sink]Task.

> Could you specify the way it's done for other sink/source connector?

1.2. Not sure what you mean by this. Could you elaborate a bit more?

> 2. I am slightly confused about the e2e latency metric...

2.1. Yes, I see. I was trying to bring in a similar concept as in Streams with KIP-613, though the e2e concept may not be translatable. We could keep it as `sink-record-latency` to avoid conflating concepts. A similar metric naming was proposed in KIP-489, but at the consumer level, though it seems dormant for a couple of years.

> However, the put-batch time measures the time to put a batch of records to the external sink. So, I would assume the 2 can't be added as is to compute the e2e latency. Maybe I am missing something here. Could you please clarify this.

2.2. Yes, agree. Not necessarily added, but with the 3 latencies (poll, convert, putBatch) it will be clearer where the bottleneck may be, and they represent the internal processing.

> however, as per the KIP it looks like it will be the latency between when the record was written to Kafka and when the record is returned by a sink task's consumer's poll?

3.1. Agree. 2.1 could help to clarify this.

> One more thing - I was wondering if there's a particular reason for having a min metric for e2e latency but not for convert time?

3.2. I was following KIP-613 for e2e, where Min seems useful to compare with Max to get an idea of the window of results, though current latencies in Connect do not include Min, and that's why I haven't added it for convert latency. Do you think it makes sense to extend latency metrics with Min?
KIP is updated to clarify some of these changes.

Many thanks,
Jorge.

On Thu, 1 Sept 2022 at 18:11, Yash Mayya <yash.ma...@gmail.com> wrote:

Hi Jorge,

Thanks for the KIP! I have the same confusion with the e2e-latency metrics as Sagar above. "e2e" would seem to indicate the latency between when the record was written to Kafka and when the record was written to the sink system by the connector - however, as per the KIP, it looks like it will be the latency between when the record was written to Kafka and when the record is returned by a sink task's consumer's poll?
I think that metric will be a little confusing to interpret. One more thing - I was wondering if there's a particular reason for having a min metric for e2e latency but not for convert time?

Thanks,
Yash

On Thu, Sep 1, 2022 at 8:59 PM Sagar <sagarmeansoc...@gmail.com> wrote:

Hi Jorge,

Thanks for the KIP. It looks like a very good addition. I skimmed through once and had a couple of questions:

1) I am assuming the new metrics would be task-level metrics. Could you specify the way it's done for other sink/source connectors?
2) I am slightly confused about the e2e latency metric. Let's consider the sink connector metric. If I look at the way it's supposed to be calculated, i.e. the difference between the record timestamp and the wall clock time, it looks like a per-record metric. However, the put-batch time measures the time to put a batch of records to the external sink. So, I would assume the 2 can't be added as is to compute the e2e latency. Maybe I am missing something here. Could you please clarify this.

Thanks!
Sagar.
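For reference, the calculation Sagar describes (record timestamp versus wall clock) is inherently per record, which is why it can't be derived by adding the batch-level put-batch time. A minimal sketch, with illustrative names:

```java
// Illustrative sketch of the proposed sink-record latency: wall-clock
// time when the task observes the record minus the record's Kafka
// timestamp. Method and class names here are placeholders.
public class RecordLatencyDefinition {

    static long latencyMs(long recordTimestampMs, long wallClockMs) {
        return wallClockMs - recordTimestampMs;
    }

    public static void main(String[] args) {
        // record produced at t=1_000 ms, observed by the sink task at t=1_250 ms
        System.out.println(latencyMs(1_000L, 1_250L)); // 250
    }
}
```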
On Tue, Aug 30, 2022 at 8:43 PM Jorge Esteban Quilcate Otoya <quilcate.jo...@gmail.com> wrote:

Hi all,

I'd like to start a discussion thread on KIP-864: Add End-To-End Latency Metrics to Connectors. This KIP aims to improve the metrics available on Source and Sink Connectors to measure end-to-end latency, including source and sink record conversion time, and sink record e2e latency (similar to KIP-613 for Streams).

The KIP is here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-864%3A+Add+End-To-End+Latency+Metrics+to+Connectors

Please take a look and let me know what you think.

Cheers,
Jorge.