Thanks Rohan.

I remember now that we moved the round-trip PID's txn completion logic into
init-transaction and commit/abort-transaction. So I think we'd count time
as in StreamsProducer#initTransaction as well (admittedly it is in most
cases a one-time thing).

Other than that, I do not have any further comments.

Guozhang

On Tue, Jul 20, 2021 at 11:59 AM Rohan Desai <desai.p.ro...@gmail.com>
wrote:

> > Similarly, I think "txn-commit-time-total" and
> "offset-commit-time-total" may better be inside producer and consumer
> clients respectively.
>
> I agree for offset-commit-time-total. For txn-commit-time-total I'm
> proposing we measure `StreamsProducer.commitTransaction`, which wraps
> multiple producer calls (sendOffsets, commitTransaction)
>
> > > For "txn-commit-time-total" specifically, besides producer.commitTxn.
> other txn-related calls may also be blocking, including
> producer.beginTxn/abortTxn, I saw you mentioned "txn-begin-time-total"
> later in the doc, but did not include it as a separate metric, and
> similarly, should we have a `txn-abort-time-total` as well? If yes, could
> you update the KIP page accordingly.
>
> `beginTransaction` is not blocking - I meant to remove that from that doc.
> I'll add something for abort.
>
> On Mon, Jul 19, 2021 at 11:55 PM Rohan Desai <desai.p.ro...@gmail.com>
> wrote:
>
> > Thanks for the review Guozhang! responding to your feedback inline:
> >
> > > 1) I agree that the current ratio metrics is just "snapshot in point",
> > and
> > more flexible metrics that would allow reporters to calculate based on
> > window intervals are better. However, the current mechanism of the
> proposed
> > metrics assumes the thread->clients mapping as of today, where each
> thread
> > would own exclusively one main consumer, restore consumer, producer and
> an
> > admin client. But this mapping may be subject to change in the future.
> Have
> > you thought about how this metric can be extended when, e.g. the embedded
> > clients and stream threads are de-coupled?
> >
> > Of course this depends on how exactly we refactor the runtime - assuming
> > that we plan to factor out consumers into an "I/O" layer that is
> > responsible for receiving records and enqueuing them to be processed by
> > processing threads, then I think it should be reasonable to count the
> time
> > we spend blocked on this internal queue(s) as blocked. The main concern
> > there to me is that the I/O layer would be doing something expensive like
> > decompression that shouldn't be counted as "blocked". But if that really
> is
> > so expensive that it starts to throw off our ratios then it's probably
> > indicative of a larger problem that the "i/o layer" is a bottleneck and
> it
> > would be worth refactoring so that decompression (or insert other
> expensive
> > thing here) can also be done on the processing threads.
> >
> > > 2) [This and all below are minor comments] The "flush-time-total" may
> > better be a producer client metric, as "flush-wait-time-total", than a
> > streams metric, though the streams-level "total-blocked" can still
> leverage
> > it. Similarly, I think "txn-commit-time-total" and
> > "offset-commit-time-total" may better be inside producer and consumer
> > clients respectively.
> >
> > Good call - I'll update the KIP
> >
> > > 3) The doc was not very clear on how "thread-start-time" would be
> needed
> > when calculating streams utilization along with total-blocked time, could
> > you elaborate a bit more in the KIP?
> >
> > Yes, will do.
> >
> > > For "txn-commit-time-total" specifically, besides producer.commitTxn.
> > other txn-related calls may also be blocking, including
> > producer.beginTxn/abortTxn, I saw you mentioned "txn-begin-time-total"
> > later in the doc, but did not include it as a separate metric, and
> > similarly, should we have a `txn-abort-time-total` as well? If yes, could
> > you update the KIP page accordingly.
> >
> > Ack.
> >
> > On Mon, Jul 12, 2021 at 11:29 PM Rohan Desai <desai.p.ro...@gmail.com>
> > wrote:
> >
> >> Hello All,
> >>
> >> I'd like to start a discussion on the KIP linked above which proposes
> >> some metrics that we would find useful to help measure whether a Kafka
> >> Streams application is saturated. The motivation section in the KIP goes
> >> into some more detail on why we think this is a useful addition to the
> >> metrics already implemented. Thanks in advance for your feedback!
> >>
> >> Best Regards,
> >>
> >> Rohan
> >>
> >> On Mon, Jul 12, 2021 at 12:00 PM Rohan Desai <desai.p.ro...@gmail.com>
> >> wrote:
> >>
> >>>
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-761%3A+Add+Total+Blocked+Time+Metric+to+Streams
> >>>
> >>
>


-- 
-- Guozhang

Reply via email to