Re: [DISCUSS] KIP-864: Add End-To-End Latency Metrics to Connectors

Yash Mayya Thu, 08 Sep 2022 04:26:13 -0700

Thanks, I think it makes sense to define these metrics at a DEBUG recording
level.


On Thu, Sep 8, 2022 at 2:51 PM Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> On Thu, 8 Sept 2022 at 05:55, Yash Mayya <yash.ma...@gmail.com> wrote:
>
> > Hi Jorge,
> >
> > Thanks for the changes. With regard to having per batch vs per record
> > metrics, the additional overhead I was referring to wasn't about whether
> or
> > not we would need to iterate over all the records in a batch. I was
> > referring to the potential additional overhead caused by the higher
> volume
> > of calls to Sensor::record on the sensors for the new metrics (as
> compared
> > to the existing batch only metrics), especially for high throughput
> > connectors where batch sizes could be large. I guess we may want to do
> some
> > sort of performance testing and get concrete numbers to verify whether
> this
> > is a valid concern or not?
> >
>
> 6.1. Got it, thanks for clarifying. I guess there could be a benchmark test
> of the `Sensor::record` to get an idea of the performance impact.
> Regardless, the fact that these are single-record metrics compared to
> existing batch-only could be explicitly defined by setting these metrics at
> a DEBUG or TRACE metric recording level, leaving the existing at INFO
> level.
> wdyt?
>
>
> >
> > Thanks,
> > Yash
> >
> > On Tue, Sep 6, 2022 at 4:42 PM Jorge Esteban Quilcate Otoya <
> > quilcate.jo...@gmail.com> wrote:
> >
> > > Hi Sagar and Yash,
> > >
> > > > the way it's defined in
> > > https://kafka.apache.org/documentation/#connect_monitoring for the
> > metrics
> > >
> > > 4.1. Got it. Add it to the KIP.
> > >
> > > > The only thing I would argue is do we need sink-record-latency-min?
> > Maybe
> > > we
> > > > could remove this min metric as well and make all of the 3 e2e
> metrics
> > > > consistent
> > >
> > > 4.2 I see. Will remove it from the KIP.
> > >
> > > > Probably users can track the metrics at their end to
> > > > figure that out. Do you think that makes sense?
> > >
> > > 4.3. Yes, agree. With these new metrics it should be easier for users
> to
> > > track this.
> > >
> > > > I think it makes sense to not have a min metric for either to remain
> > > > consistent with the existing put-batch and poll-batch metrics
> > >
> > > 5.1. Got it. Same as 4.2
> > >
> > > > Another naming related suggestion I had was with the
> > > > "convert-time" metrics - we should probably include transformations
> in
> > > the
> > > > name since SMTs could definitely be attributable to a sizable chunk
> of
> > > the
> > > > latency depending on the specific transformation chain.
> > >
> > > 5.2. Make sense. I'm proposing to add
> `sink-record-convert-transform...`
> > > and `source-record-transform-convert...` to represent correctly the
> order
> > > of operations.
> > >
> > > > it seems like both source and sink tasks only record metrics at a
> > "batch"
> > > > level, not on an individual record level. I think it might be
> > additional
> > > > overhead if we want to record these new metrics all at the record
> > level?
> > >
> > > 5.3. I considered at the beginning to implement all metrics at the
> batch
> > > level, but given how the framework process records, I fallback to the
> > > proposed approach:
> > > - Sink Task:
> > >   - `WorkerSinkTask#convertMessages(msgs)` already iterates over
> records,
> > > so there is no additional overhead to capture record latency per
> record.
> > >     -
> > >
> > >
> >
> https://github.com/apache/kafka/blob/9841647c4fe422532f448423c92d26e4fdcb8932/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L490-L514
> > >   - `WorkerSinkTask#convertAndTransformRecord(record)` actually happens
> > > individually. Measuring this operation per batch would include
> processing
> > > that is not strictly part of "convert and transform"
> > >     -
> > >
> > >
> >
> https://github.com/apache/kafka/blob/9841647c4fe422532f448423c92d26e4fdcb8932/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L518
> > > - Source Task:
> > >   - `AbstractWorkerSourceTask#sendRecords` iterates over a batch and
> > > applies transforms and convert record individually as well:
> > >     -
> > >
> > >
> >
> https://github.com/apache/kafka/blob/9841647c4fe422532f448423c92d26e4fdcb8932/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractWorkerSourceTask.java#L389-L390
> > >
> > > > This might require some additional changes -
> > > > for instance, with the "sink-record-latency" metric, we might only
> want
> > > to
> > > > have a "max" metric since "avg" would require recording a value on
> the
> > > > sensor for each record (whereas we can get a "max" by only recording
> a
> > > > metric value for the oldest record in each batch).
> > >
> > > 5.4. Recording record-latency per batch may not be as useful as there
> is
> > no
> > > guarantee that the oldest record will be representative of the batch.
> > >
> > > On Sat, 3 Sept 2022 at 16:02, Yash Mayya <yash.ma...@gmail.com> wrote:
> > >
> > > > Hi Jorge and Sagar,
> > > >
> > > > I think it makes sense to not have a min metric for either to remain
> > > > consistent with the existing put-batch and poll-batch metrics (it
> > doesn't
> > > > seem particularly useful either anyway). Also, the new
> > > > "sink-record-latency" metric name looks fine to me, thanks for making
> > the
> > > > changes! Another naming related suggestion I had was with the
> > > > "convert-time" metrics - we should probably include transformations
> in
> > > the
> > > > name since SMTs could definitely be attributable to a sizable chunk
> of
> > > the
> > > > latency depending on the specific transformation chain.
> > > >
> > > > I have one high level question with respect to implementation -
> > > currently,
> > > > it seems like both source and sink tasks only record metrics at a
> > "batch"
> > > > level, not on an individual record level. I think it might be
> > additional
> > > > overhead if we want to record these new metrics all at the record
> > level?
> > > > Could we instead make all of these new metrics for batches of records
> > > > rather than individual records in order to remain consistent with the
> > > > existing task level metrics? This might require some additional
> > changes -
> > > > for instance, with the "sink-record-latency" metric, we might only
> want
> > > to
> > > > have a "max" metric since "avg" would require recording a value on
> the
> > > > sensor for each record (whereas we can get a "max" by only recording
> a
> > > > metric value for the oldest record in each batch).
> > > >
> > > > Thanks,
> > > > Yash
> > > >
> > > > On Fri, Sep 2, 2022 at 3:16 PM Sagar <sagarmeansoc...@gmail.com>
> > wrote:
> > > >
> > > > > Hi Jorge,
> > > > >
> > > > > Thanks for the changes.
> > > > >
> > > > > Regarding the metrics, I meant something like this:
> > > > >
> > > >
> > >
> >
> kafka.connect:type=sink-task-metrics,connector="{connector}",task="{task}"
> > > > >
> > > > > the way it's defined in
> > > > > https://kafka.apache.org/documentation/#connect_monitoring for the
> > > > > metrics.
> > > > >
> > > > > I see what you mean by the 3 metrics and how it can be interpreted.
> > The
> > > > > only thing I would argue is do we need sink-record-latency-min?
> Maybe
> > > we
> > > > > could remove this min metric as well and make all of the 3 e2e
> > metrics
> > > > > consistent(since put-batch also doesn't expose a min which makes
> > sense
> > > to
> > > > > me). I think this is in contrast to what Yash pointed out above so
> I
> > > > would
> > > > > like to hear his thoughts as well.
> > > > >
> > > > > The other point Yash mentioned about the slightly flawed definition
> > of
> > > > e2e
> > > > > is also true in a sense. But I have a feeling that's one the
> records
> > > are
> > > > > polled by the connector tasks, it would be difficult to track the
> > final
> > > > leg
> > > > > via the framework. Probably users can track the metrics at their
> end
> > to
> > > > > figure that out. Do you think that makes sense?
> > > > >
> > > > > Thanks!
> > > > > Sagar.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Sep 1, 2022 at 11:40 PM Jorge Esteban Quilcate Otoya <
> > > > > quilcate.jo...@gmail.com> wrote:
> > > > >
> > > > > > Hi Sagar and Yash,
> > > > > >
> > > > > > Thanks for your feedback!
> > > > > >
> > > > > > > 1) I am assuming the new metrics would be task level metric.
> > > > > >
> > > > > > 1.1 Yes, it will be a task level metric, implemented on the
> > > > > > Worker[Source/Sink]Task.
> > > > > >
> > > > > > > Could you specify the way it's done for other sink/source
> > > connector?
> > > > > >
> > > > > > 1.2. Not sure what do you mean by this. Could you elaborate a bit
> > > more?
> > > > > >
> > > > > > > 2. I am slightly confused about the e2e latency metric...
> > > > > >
> > > > > > 2.1. Yes, I see. I was trying to bring a similar concept as in
> > > Streams
> > > > > with
> > > > > > KIP-613, though the e2e concept may not be translatable.
> > > > > > We could keep it as `sink-record-latency` to avoid conflating
> > > > concepts. A
> > > > > > similar metric naming was proposed in KIP-489 but at the consumer
> > > > level —
> > > > > > though it seems dormant for a couple of years.
> > > > > >
> > > > > > > However, the put-batch time measures the
> > > > > > > time to put a batch of records to external sink. So, I would
> > assume
> > > > > the 2
> > > > > > > can't be added as is to compute the e2e latency. Maybe I am
> > missing
> > > > > > > something here. Could you plz clarify this.
> > > > > >
> > > > > > 2.2. Yes, agree. Not necessarily added, but with the 3 latencies
> > > (poll,
> > > > > > convert, putBatch) will be clearer where the bottleneck may be,
> and
> > > > > > represent the internal processing.
> > > > > >
> > > > > > > however, as per the KIP it looks like it will be
> > > > > > > the latency between when the record was written to Kafka and
> when
> > > the
> > > > > > > record is returned by a sink task's consumer's poll?
> > > > > >
> > > > > > 3.1. Agree. 2.1. could help to clarify this.
> > > > > >
> > > > > > > One more thing - I was wondering
> > > > > > > if there's a particular reason for having a min metric for e2e
> > > > latency
> > > > > > but
> > > > > > > not for convert time?
> > > > > >
> > > > > > 3.2. Was following KIP-613 for e2e which seems useful to compare
> > with
> > > > > Max a
> > > > > > get an idea of the window of results, though current latencies in
> > > > > Connector
> > > > > > do not include Min, and that's why I haven't added it for convert
> > > > > latency.
> > > > > > Do you think it make sense to extend latency metrics with Min?
> > > > > >
> > > > > > KIP is updated to clarify some of these changes.
> > > > > >
> > > > > > Many thanks,
> > > > > > Jorge.
> > > > > >
> > > > > > On Thu, 1 Sept 2022 at 18:11, Yash Mayya <yash.ma...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Hi Jorge,
> > > > > > >
> > > > > > > Thanks for the KIP! I have the same confusion with the
> > e2e-latency
> > > > > > metrics
> > > > > > > as Sagar above. "e2e" would seem to indicate the latency
> between
> > > when
> > > > > the
> > > > > > > record was written to Kafka and when the record was written to
> > the
> > > > sink
> > > > > > > system by the connector - however, as per the KIP it looks like
> > it
> > > > will
> > > > > > be
> > > > > > > the latency between when the record was written to Kafka and
> when
> > > the
> > > > > > > record is returned by a sink task's consumer's poll? I think
> that
> > > > > metric
> > > > > > > will be a little confusing to interpret. One more thing - I was
> > > > > wondering
> > > > > > > if there's a particular reason for having a min metric for e2e
> > > > latency
> > > > > > but
> > > > > > > not for convert time?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Yash
> > > > > > >
> > > > > > > On Thu, Sep 1, 2022 at 8:59 PM Sagar <
> sagarmeansoc...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jorge,
> > > > > > > >
> > > > > > > > Thanks for the KIP. It looks like a very good addition. I
> > skimmed
> > > > > > through
> > > > > > > > once and had a couple of questions =>
> > > > > > > >
> > > > > > > > 1) I am assuming the new metrics would be task level metric.
> > > Could
> > > > > you
> > > > > > > > specify the way it's done for other sink/source connector?
> > > > > > > > 2) I am slightly confused about the e2e latency metric. Let's
> > > > > consider
> > > > > > > the
> > > > > > > > sink connector metric. If I look at the way it's supposed to
> be
> > > > > > > calculated,
> > > > > > > > i.e the difference between the record timestamp and the wall
> > > clock
> > > > > > time,
> > > > > > > it
> > > > > > > > looks like a per record metric. However, the put-batch time
> > > > measures
> > > > > > the
> > > > > > > > time to put a batch of records to external sink. So, I would
> > > assume
> > > > > > the 2
> > > > > > > > can't be added as is to compute the e2e latency. Maybe I am
> > > missing
> > > > > > > > something here. Could you plz clarify this.
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > > Sagar.
> > > > > > > >
> > > > > > > > On Tue, Aug 30, 2022 at 8:43 PM Jorge Esteban Quilcate Otoya
> <
> > > > > > > > quilcate.jo...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I'd like to start a discussion thread on KIP-864: Add
> > > End-To-End
> > > > > > > Latency
> > > > > > > > > Metrics to Connectors.
> > > > > > > > > This KIP aims to improve the metrics available on Source
> and
> > > Sink
> > > > > > > > > Connectors to measure end-to-end latency, including source
> > and
> > > > sink
> > > > > > > > record
> > > > > > > > > conversion time, and sink record e2e latency (similar to
> > > KIP-613
> > > > > for
> > > > > > > > > Streams).
> > > > > > > > >
> > > > > > > > > The KIP is here:
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-864%3A+Add+End-To-End+Latency+Metrics+to+Connectors
> > > > > > > > >
> > > > > > > > > Please take a look and let me know what you think.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Jorge.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-864: Add End-To-End Latency Metrics to Connectors

Reply via email to