Thanks Rohan. I remember now that we moved the round-trip PID's txn completion logic into init-transaction and commit/abort-transaction. So I think we'd count time as in StreamsProducer#initTransaction as well (admittedly it is in most cases a one-time thing).
Other than that, I do not have any further comments. Guozhang On Tue, Jul 20, 2021 at 11:59 AM Rohan Desai <desai.p.ro...@gmail.com> wrote: > > Similarly, I think "txn-commit-time-total" and > "offset-commit-time-total" may better be inside producer and consumer > clients respectively. > > I agree for offset-commit-time-total. For txn-commit-time-total I'm > proposing we measure `StreamsProducer.commitTransaction`, which wraps > multiple producer calls (sendOffsets, commitTransaction) > > > > For "txn-commit-time-total" specifically, besides producer.commitTxn. > other txn-related calls may also be blocking, including > producer.beginTxn/abortTxn, I saw you mentioned "txn-begin-time-total" > later in the doc, but did not include it as a separate metric, and > similarly, should we have a `txn-abort-time-total` as well? If yes, could > you update the KIP page accordingly. > > `beginTransaction` is not blocking - I meant to remove that from that doc. > I'll add something for abort. > > On Mon, Jul 19, 2021 at 11:55 PM Rohan Desai <desai.p.ro...@gmail.com> > wrote: > > > Thanks for the review Guozhang! responding to your feedback inline: > > > > > 1) I agree that the current ratio metrics is just "snapshot in point", > > and > > more flexible metrics that would allow reporters to calculate based on > > window intervals are better. However, the current mechanism of the > proposed > > metrics assumes the thread->clients mapping as of today, where each > thread > > would own exclusively one main consumer, restore consumer, producer and > an > > admin client. But this mapping may be subject to change in the future. > Have > > you thought about how this metric can be extended when, e.g. the embedded > > clients and stream threads are de-coupled? > > > > Of course this depends on how exactly we refactor the runtime - assuming > > that we plan to factor out consumers into an "I/O" layer that is > > responsible for receiving records and enqueuing them to be processed by > > processing threads, then I think it should be reasonable to count the > time > > we spend blocked on this internal queue(s) as blocked. The main concern > > there to me is that the I/O layer would be doing something expensive like > > decompression that shouldn't be counted as "blocked". But if that really > is > > so expensive that it starts to throw off our ratios then it's probably > > indicative of a larger problem that the "i/o layer" is a bottleneck and > it > > would be worth refactoring so that decompression (or insert other > expensive > > thing here) can also be done on the processing threads. > > > > > 2) [This and all below are minor comments] The "flush-time-total" may > > better be a producer client metric, as "flush-wait-time-total", than a > > streams metric, though the streams-level "total-blocked" can still > leverage > > it. Similarly, I think "txn-commit-time-total" and > > "offset-commit-time-total" may better be inside producer and consumer > > clients respectively. > > > > Good call - I'll update the KIP > > > > > 3) The doc was not very clear on how "thread-start-time" would be > needed > > when calculating streams utilization along with total-blocked time, could > > you elaborate a bit more in the KIP? > > > > Yes, will do. > > > > > For "txn-commit-time-total" specifically, besides producer.commitTxn. > > other txn-related calls may also be blocking, including > > producer.beginTxn/abortTxn, I saw you mentioned "txn-begin-time-total" > > later in the doc, but did not include it as a separate metric, and > > similarly, should we have a `txn-abort-time-total` as well? If yes, could > > you update the KIP page accordingly. > > > > Ack. > > > > On Mon, Jul 12, 2021 at 11:29 PM Rohan Desai <desai.p.ro...@gmail.com> > > wrote: > > > >> Hello All, > >> > >> I'd like to start a discussion on the KIP linked above which proposes > >> some metrics that we would find useful to help measure whether a Kafka > >> Streams application is saturated. The motivation section in the KIP goes > >> into some more detail on why we think this is a useful addition to the > >> metrics already implemented. Thanks in advance for your feedback! > >> > >> Best Regards, > >> > >> Rohan > >> > >> On Mon, Jul 12, 2021 at 12:00 PM Rohan Desai <desai.p.ro...@gmail.com> > >> wrote: > >> > >>> > >>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-761%3A+Add+Total+Blocked+Time+Metric+to+Streams > >>> > >> > -- -- Guozhang