Re: [DISCUSS] KIP-761: Add total blocked time metric to streams

2022-02-25 Thread Guozhang Wang
Thanks Rohan, I've reviewed the new PR and had a question regarding whether we should have the new metric in ms or ns, maybe we can first discuss about that before we finalize the KIP? Guozhang On Fri, Feb 25, 2022 at 5:48 AM Bruno Cadonna wrote: > Hi Rohan, > > Thank you for the heads up! >

Re: [DISCUSS] KIP-761: Add total blocked time metric to streams

2022-02-25 Thread Bruno Cadonna
Hi Rohan, Thank you for the heads up! Yes, please update the KIP and send this message also to the VOTE thread of the KIP. Best, Bruno On 25.02.22 04:01, Rohan Desai wrote: Hello, I discovered a bug in the design of this metric. The bug is documented here: https://github.com/apache/kafka/p

Re: [DISCUSS] KIP-761: Add total blocked time metric to streams

2022-02-24 Thread Rohan Desai
Hello, I discovered a bug in the design of this metric. The bug is documented here: https://github.com/apache/kafka/pull/11805. We need to include time the producer spends waiting on topic metadata into the total blocked time. But to do this we would need to add a new producer metric that tracks t

Re: [DISCUSS] KIP-761: Add total blocked time metric to streams

2021-08-09 Thread Bruno Cadonna
Hi, With "group", I actually meant the "type" tag. A thread-level metrics in Streams has the following tag "type=stream-thread-metrics" which we also call "group" in the code. I realized now that I mentioned "type" and "group" in my previous e-mail which are synonyms for the same concept. Fo

Re: [DISCUSS] KIP-761: Add total blocked time metric to streams

2021-07-27 Thread Sophie Blee-Goldman
Thanks for the clarifications, that all makes sense. I'm ready to vote on the KIP, but can you just update the KIP first to address Bruno's feedback? Ie just fix the tags and fill in the missing fields. For example it sounds like these would be thread-level metrics. You should be able to figure o

Re: [DISCUSS] KIP-761: Add total blocked time metric to streams

2021-07-22 Thread Bruno Cadonna
Hi Rohan, Thank you for the KIP! I agree that the KIP is well-motivated. What is not very clear is the metadata like type, group, and tags of the metrics. For example, there is not application-id tag in Streams and there is also no producer-id tag. The clients, i.e., producer, admin, consume

Re: [DISCUSS] KIP-761: Add total blocked time metric to streams

2021-07-22 Thread Rohan Desai
re sophie: The intent here was to include all blocked time (not just `RUNNING`). The caller can window the total blocked time themselves, and that can be compared with a timeseries of the state to understand the ratio in different states. I'll update the KIP to include `committed`. The admin API c

Re: [DISCUSS] KIP-761: Add total blocked time metric to streams

2021-07-21 Thread Sophie Blee-Goldman
Hey Rohan, The current metrics proposed in the KIP all LGTM, but if the goal is to include *all* time spent blocking on any client API then there are a few that might need to be added to this list. For example Consumer#committed is called in StreamTask to get the offsets during initialization, and

Re: [DISCUSS] KIP-761: Add total blocked time metric to streams

2021-07-20 Thread Rohan Desai
> I remember now that we moved the round-trip PID's txn completion logic into init-transaction and commit/abort-transaction. So I think we'd count time as in StreamsProducer#initTransaction as well (admittedly it is in most cases a one-time thing). Makes sense - I'll update the KIP On Tue, Jul 20

Re: [DISCUSS] KIP-761: Add total blocked time metric to streams

2021-07-20 Thread Rohan Desai
> I had a question - it seems like from the descriptionsof `txn-commit-time-total` and `offset-commit-time-total` that they measure similar processes for ALOS and EOS, but only `txn-commit-time-total` is included in `blocked-time-total`. Why isn't `offset-commit-time-total` also included? I've upd

Re: [DISCUSS] KIP-761: Add total blocked time metric to streams

2021-07-20 Thread Guozhang Wang
Thanks Rohan. I remember now that we moved the round-trip PID's txn completion logic into init-transaction and commit/abort-transaction. So I think we'd count time as in StreamsProducer#initTransaction as well (admittedly it is in most cases a one-time thing). Other than that, I do not have any f

Re: [DISCUSS] KIP-761: Add total blocked time metric to streams

2021-07-20 Thread Rohan Desai
> Similarly, I think "txn-commit-time-total" and "offset-commit-time-total" may better be inside producer and consumer clients respectively. I agree for offset-commit-time-total. For txn-commit-time-total I'm proposing we measure `StreamsProducer.commitTransaction`, which wraps multiple producer c

Re: [DISCUSS] KIP-761: Add total blocked time metric to streams

2021-07-20 Thread Leah Thomas
Thanks for the KIP Rohan. I had a question - it seems like from the descriptions of `txn-commit-time-total` and `offset-commit-time-total` that they measure similar processes for ALOS and EOS, but only `txn-commit-time-total` is included in `blocked-time-total`. Why isn't `offset-commit-time-total

Re: [DISCUSS] KIP-761: Add total blocked time metric to streams

2021-07-19 Thread Rohan Desai
Thanks for the review Guozhang! responding to your feedback inline: > 1) I agree that the current ratio metrics is just "snapshot in point", and more flexible metrics that would allow reporters to calculate based on window intervals are better. However, the current mechanism of the proposed metric

Re: [DISCUSS] KIP-761: Add total blocked time metric to streams

2021-07-12 Thread Rohan Desai
Hello All, I'd like to start a discussion on the KIP linked above which proposes some metrics that we would find useful to help measure whether a Kafka Streams application is saturated. The motivation section in the KIP goes into some more detail on why we think this is a useful addition to the me

Re: [DISCUSS] KIP-761: Add total blocked time metric to streams

2021-07-12 Thread Guozhang Wang
Hey Rohan, Thanks for putting up the KIP. Maybe you can also briefly talk about the context in the email message as well (they are very well explained inside the KIP doc though). I have a few meta and detailed thoughts after reading it. 1) I agree that the current ratio metrics is just "snapshot

[DISCUSS] KIP-761: Add total blocked time metric to streams

2021-07-12 Thread Rohan Desai
https://cwiki.apache.org/confluence/display/KAFKA/KIP-761%3A+Add+Total+Blocked+Time+Metric+to+Streams