Re: [DISCUSS] KIP-864: Add End-To-End Latency Metrics to Connectors

Yash Mayya Sat, 03 Sep 2022 08:02:39 -0700

Hi Jorge and Sagar,

I think it makes sense to not have a min metric for either to remain
consistent with the existing put-batch and poll-batch metrics (it doesn't
seem particularly useful either anyway). Also, the new
"sink-record-latency" metric name looks fine to me, thanks for making the
changes! Another naming related suggestion I had was with the
"convert-time" metrics - we should probably include transformations in the
name since SMTs could definitely be attributable to a sizable chunk of the
latency depending on the specific transformation chain.


I have one high level question with respect to implementation - currently,
it seems like both source and sink tasks only record metrics at a "batch"
level, not on an individual record level. I think it might be additional
overhead if we want to record these new metrics all at the record level?
Could we instead make all of these new metrics for batches of records
rather than individual records in order to remain consistent with the
existing task level metrics? This might require some additional changes -
for instance, with the "sink-record-latency" metric, we might only want to
have a "max" metric since "avg" would require recording a value on the
sensor for each record (whereas we can get a "max" by only recording a
metric value for the oldest record in each batch).

Thanks,
Yash

On Fri, Sep 2, 2022 at 3:16 PM Sagar <sagarmeansoc...@gmail.com> wrote:

> Hi Jorge,
>
> Thanks for the changes.
>
> Regarding the metrics, I meant something like this:
> kafka.connect:type=sink-task-metrics,connector="{connector}",task="{task}"
>
> the way it's defined in
> https://kafka.apache.org/documentation/#connect_monitoring for the
> metrics.
>
> I see what you mean by the 3 metrics and how it can be interpreted. The
> only thing I would argue is do we need sink-record-latency-min? Maybe we
> could remove this min metric as well and make all of the 3 e2e metrics
> consistent(since put-batch also doesn't expose a min which makes sense to
> me). I think this is in contrast to what Yash pointed out above so I would
> like to hear his thoughts as well.
>
> The other point Yash mentioned about the slightly flawed definition of e2e
> is also true in a sense. But I have a feeling that's one the records are
> polled by the connector tasks, it would be difficult to track the final leg
> via the framework. Probably users can track the metrics at their end to
> figure that out. Do you think that makes sense?
>
> Thanks!
> Sagar.
>
>
>
>
> On Thu, Sep 1, 2022 at 11:40 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hi Sagar and Yash,
> >
> > Thanks for your feedback!
> >
> > > 1) I am assuming the new metrics would be task level metric.
> >
> > 1.1 Yes, it will be a task level metric, implemented on the
> > Worker[Source/Sink]Task.
> >
> > > Could you specify the way it's done for other sink/source connector?
> >
> > 1.2. Not sure what do you mean by this. Could you elaborate a bit more?
> >
> > > 2. I am slightly confused about the e2e latency metric...
> >
> > 2.1. Yes, I see. I was trying to bring a similar concept as in Streams
> with
> > KIP-613, though the e2e concept may not be translatable.
> > We could keep it as `sink-record-latency` to avoid conflating concepts. A
> > similar metric naming was proposed in KIP-489 but at the consumer level —
> > though it seems dormant for a couple of years.
> >
> > > However, the put-batch time measures the
> > > time to put a batch of records to external sink. So, I would assume
> the 2
> > > can't be added as is to compute the e2e latency. Maybe I am missing
> > > something here. Could you plz clarify this.
> >
> > 2.2. Yes, agree. Not necessarily added, but with the 3 latencies (poll,
> > convert, putBatch) will be clearer where the bottleneck may be, and
> > represent the internal processing.
> >
> > > however, as per the KIP it looks like it will be
> > > the latency between when the record was written to Kafka and when the
> > > record is returned by a sink task's consumer's poll?
> >
> > 3.1. Agree. 2.1. could help to clarify this.
> >
> > > One more thing - I was wondering
> > > if there's a particular reason for having a min metric for e2e latency
> > but
> > > not for convert time?
> >
> > 3.2. Was following KIP-613 for e2e which seems useful to compare with
> Max a
> > get an idea of the window of results, though current latencies in
> Connector
> > do not include Min, and that's why I haven't added it for convert
> latency.
> > Do you think it make sense to extend latency metrics with Min?
> >
> > KIP is updated to clarify some of these changes.
> >
> > Many thanks,
> > Jorge.
> >
> > On Thu, 1 Sept 2022 at 18:11, Yash Mayya <yash.ma...@gmail.com> wrote:
> >
> > > Hi Jorge,
> > >
> > > Thanks for the KIP! I have the same confusion with the e2e-latency
> > metrics
> > > as Sagar above. "e2e" would seem to indicate the latency between when
> the
> > > record was written to Kafka and when the record was written to the sink
> > > system by the connector - however, as per the KIP it looks like it will
> > be
> > > the latency between when the record was written to Kafka and when the
> > > record is returned by a sink task's consumer's poll? I think that
> metric
> > > will be a little confusing to interpret. One more thing - I was
> wondering
> > > if there's a particular reason for having a min metric for e2e latency
> > but
> > > not for convert time?
> > >
> > > Thanks,
> > > Yash
> > >
> > > On Thu, Sep 1, 2022 at 8:59 PM Sagar <sagarmeansoc...@gmail.com>
> wrote:
> > >
> > > > Hi Jorge,
> > > >
> > > > Thanks for the KIP. It looks like a very good addition. I skimmed
> > through
> > > > once and had a couple of questions =>
> > > >
> > > > 1) I am assuming the new metrics would be task level metric. Could
> you
> > > > specify the way it's done for other sink/source connector?
> > > > 2) I am slightly confused about the e2e latency metric. Let's
> consider
> > > the
> > > > sink connector metric. If I look at the way it's supposed to be
> > > calculated,
> > > > i.e the difference between the record timestamp and the wall clock
> > time,
> > > it
> > > > looks like a per record metric. However, the put-batch time measures
> > the
> > > > time to put a batch of records to external sink. So, I would assume
> > the 2
> > > > can't be added as is to compute the e2e latency. Maybe I am missing
> > > > something here. Could you plz clarify this.
> > > >
> > > > Thanks!
> > > > Sagar.
> > > >
> > > > On Tue, Aug 30, 2022 at 8:43 PM Jorge Esteban Quilcate Otoya <
> > > > quilcate.jo...@gmail.com> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I'd like to start a discussion thread on KIP-864: Add End-To-End
> > > Latency
> > > > > Metrics to Connectors.
> > > > > This KIP aims to improve the metrics available on Source and Sink
> > > > > Connectors to measure end-to-end latency, including source and sink
> > > > record
> > > > > conversion time, and sink record e2e latency (similar to KIP-613
> for
> > > > > Streams).
> > > > >
> > > > > The KIP is here:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-864%3A+Add+End-To-End+Latency+Metrics+to+Connectors
> > > > >
> > > > > Please take a look and let me know what you think.
> > > > >
> > > > > Cheers,
> > > > > Jorge.
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-864: Add End-To-End Latency Metrics to Connectors

Reply via email to