Hi Guozhang,

Thanks for the KIP! I think it is a really nice addition that makes it easier to understand what is going on in a Kafka Streams application.

1.
The metric names "paused-active-tasks" and "paused-standby-tasks" might be a bit confusing, since at least active tasks can also be paused outside of restoration.

2.
Why is the type of the metrics "stream-state-metrics"? I would have expected "stream-thread-metrics" as the type.

3.
Isn't the value of the metric "restoring-standby-tasks" simply the number of standby tasks since standby tasks are basically always updating (aka restoring)?

4.
"idle-ratio", "restore-ratio", and "checkpoint-ratio" seem to be metrics tailored to the upcoming state updater. They do not make much sense for a stream thread. Would it be better to introduce a new metrics level specifically for the state updater?

5.
Personally, I do not like using the word "restoration" together with standbys, since restoration implies that there is an offset at which the active task is considered restored and active processing can start. In other words, restoration is finite. Standby tasks, in contrast, update their state continuously. They can be up-to-date or lagging. I see that you could say "restored" instead of "up-to-date" and "not restored" instead of "lagging", but IMO that does not describe the situation well. This is a rather minor point. I just wanted to mention it.

6.
The name "onRestorePaused()" might be confusing, since in Kafka Streams users can also pause tasks. What about "onRestoreAborted()" or "onRestoreSuspended()"?
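
To make the naming question concrete, here is a rough sketch of how the callback could read with one of the suggested names. Everything in it is hypothetical: the RestoreListener interface is a simplified stand-in and not the actual Kafka Streams listener, the parameters are made up for illustration, and I am assuming the callback fires when restoration stops before the end offset is reached.

```java
import java.util.ArrayList;
import java.util.List;

public class RestoreListenerSketch {

    // Hypothetical, simplified listener. Not the real Kafka Streams interface.
    interface RestoreListener {
        void onRestoreStart(String storeName, long startOffset, long endOffset);

        void onRestoreEnd(String storeName, long totalRestored);

        // Name suggested in this email; assumed to fire when restoration
        // stops before the end offset has been reached.
        void onRestoreSuspended(String storeName, long totalRestored);
    }

    // Records every callback as a string so the call order can be inspected.
    static class RecordingListener implements RestoreListener {
        final List<String> events = new ArrayList<>();

        @Override
        public void onRestoreStart(String storeName, long startOffset, long endOffset) {
            events.add("start " + storeName + " " + startOffset + ".." + endOffset);
        }

        @Override
        public void onRestoreEnd(String storeName, long totalRestored) {
            events.add("end " + storeName + " after " + totalRestored);
        }

        @Override
        public void onRestoreSuspended(String storeName, long totalRestored) {
            events.add("suspended " + storeName + " after " + totalRestored);
        }
    }

    // Simulates a restoration that stops after 40 of 100 records.
    static List<String> simulate() {
        RecordingListener listener = new RecordingListener();
        listener.onRestoreStart("store-a", 0L, 100L);
        listener.onRestoreSuspended("store-a", 40L);
        return listener.events;
    }

    public static void main(String[] args) {
        simulate().forEach(System.out::println);
    }
}
```

With a name like this, a user reading the listener would not confuse the callback with a task that was paused explicitly.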

Best,
Bruno


On 16.09.22 19:33, Guozhang Wang wrote:
Hello everyone,

I'd like to start a discussion for the following KIP, aiming to improve
Kafka Stream's restoration visibility via new metrics and callback methods:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-869%3A+Improve+Streams+State+Restoration+Visibility


Thanks!
-- Guozhang
