Hey, I'm changing my vote to binding now :)

On Mon, Jan 23, 2023 at 9:38 PM Matthias J. Sax <mj...@apache.org> wrote:

> Thanks Guozhang. Couple of clarifications and follow up questions.
>
>
> >> I'm not aware of a discussion to rename the call name to "suspend" for
> >> KIP-834. Could you point me to the reference?
>
> My commend was not about KIP-834, but about this KIP. You originally
> proposed to call the new call-back `onRestorePause` but to avoid
> confusion it was improved and renamed to `onRestoreSuspended`.
>
>
> > The only one so far that I feel is probably better, is
> > "state-update-ratio". If folks feel this one is better than
> > "restore-ratio" I'm happy to update.
>
> Could we actually report two metric, one for the restore phase
> (restore-ration), and one for steady state ([standby-]update-ratio)?
>
> I could like with `state-update-ratio` if we want to have a single
> metric for both, but splitting them sound useful to me.
>
>
> > (4) `restore-call-rate`
>
> Maybe we can clarify in the description a little bit. I agree it's very
> low level but if you think it could be useful to debugging, I have no
> objection.
>
>
> > The rationale behind it is the general principle in metrics design
> > that "Kafka would provide the lowest necessary metrics levels, and
> > users can do the roll-ups however they want".
>
> That's fair, but it seems to be a rather important metric, and having it
> only at DEBUG level seems not ideal? Could we make it INFO level, even
> if it's a task level (ie, apply an exception to the rule).
>
>
>
> -Matthias
>
>
>
> On 1/19/23 2:35 PM, Guozhang Wang wrote:
> > Hello Matthias,
> >
> > Thanks for the feedback. I was on vacation for a while. Pardon for the
> > late replies. Please see them inline below
> >
> > On Thu, Dec 1, 2022 at 11:23 PM Matthias J. Sax <mj...@apache.org>
> wrote:
> >>
> >> Seems I am late to the party... Great KIP. Couple of questions from my
> side:
> >>
> >> (1) What is the purpose of `standby-updating-tasks`? It seems to be the
> >> same as the number of assigned standby task? Not sure how useful it
> >> would be?
> >>
> > In general, yes, it is the number of assigned standby tasks --- there
> > will be transit times when the assigned standby tasks are not yet
> > being updated but it would not last long --- but we do not yet have a
> > direct gauge to expose this before, and users have to infer this from
> > other indirect metrics.
> >
> >>
> >>
> >> (2) `active-paused-tasks` / `standby-paused-tasks` -- what does "paused"
> >> exactly mean? There was a discussion about renaming the callback method
> >> from pause to suspended. So should this be called `suspended`, too? And
> >> if yes, how is it useful for users?
> >>
> > Pausing here refers to "KIP-834: Pause / Resume KafkaStreams
> > Topologies" (
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211882832
> ).
> > When a topology is paused, all its tasks including standbys will be
> > paused too.
> >
> > I'm not aware of a discussion to rename the call name to "suspend" for
> > KIP-834. Could you point me to the reference?
> >
> >>
> >>
> >> (3) `restore-ratio`: the description says
> >>
> >>> The fraction of time the thread spent on restoring active or standby
> tasks
> >>
> >> I find the term "restoring" does only apply to active tasks, but not to
> >> standbys. Can we reword this?
> >>
> > Yeah I have been discussing this with others in the community a bit as
> > well, but so far I have not been convinced of a better name than it.
> > Some other alternatives being discussed but not win everyone's love is
> > "restore-or-update-ratio", "process-ratio" (for the restore thread
> > that means restoring or updating), and "io-ratio".
> >
> > The only one so far that I feel is probably better, is
> > "state-update-ratio". If folks feel this one is better than
> > "restore-ratio" I'm happy to update.
> >
> >>
> >> (4) `restore-call-rate`: not sure what you exactly mean by "restore
> calls"?
> >>
> > This is similar to the "io-calls-rate" in the selector classes, i.e.
> > the number of "restore" function calls made. It's argurably a very
> > low-level metrics but I included it since it could be useful in some
> > debugging scenarios.
> >
> >>
> >> (5) `restore-remaining-records-total` -- why is this a task metric?
> >> Seems we could roll it up into a thread metric that we report at INFO
> >> level (we could still have per-task DEBUG level metric for it in
> addition).
> >>
> > The rationale behind it is the general principle in metrics design
> > that "Kafka would provide the lowest necessary metrics levels, and
> > users can do the roll-ups however they want".
> >
> >>
> >> (6) What about "warmup tasks"? Internally, we treat them as standbys,
> >> but it seems it's hard for users to reason about it in the scale-out
> >> warm-up case. Would it be helpful (and possible) to report "warmup
> >> progress" explicitly?
> >>
> > At the restore thread level, we cannot differentiate standby tasks
> > from warmup tasks since the latter is created exactly just like the
> > former. But I do agree this is an issue for visibility that worth
> > addressing, I think another KIP would be needed to first consider
> > distinguishing these two at the class level.
> >
> >>
> >> -Matthias
> >>
> >>
> >> On 11/1/22 2:44 AM, Lucas Brutschy wrote:
> >>> We need this!
> >>>
> >>> + 1 non binding
> >>>
> >>> Cheers,
> >>> Lucas
> >>>
> >>> On Tue, Nov 1, 2022 at 10:01 AM Bruno Cadonna <cado...@apache.org>
> wrote:
> >>>>
> >>>> Guozhang,
> >>>>
> >>>> Thanks for the KIP!
> >>>>
> >>>> +1 (binding)
> >>>>
> >>>> Best,
> >>>> Bruno
> >>>>
> >>>> On 25.10.22 22:07, Walker Carlson wrote:
> >>>>> +1 non binding
> >>>>>
> >>>>> Thanks for the kip!
> >>>>>
> >>>>> On Thu, Oct 20, 2022 at 10:25 PM John Roesler <vvcep...@apache.org>
> wrote:
> >>>>>
> >>>>>> Thanks for the KIP, Guozhang!
> >>>>>>
> >>>>>> I'm +1 (binding)
> >>>>>>
> >>>>>> -John
> >>>>>>
> >>>>>> On Wed, Oct 12, 2022, at 16:36, Nick Telford wrote:
> >>>>>>> Can't wait!
> >>>>>>> +1 (non-binding)
> >>>>>>>
> >>>>>>> On Wed, 12 Oct 2022, 18:02 Guozhang Wang, <
> guozhang.wang...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hello all,
> >>>>>>>>
> >>>>>>>> I'd like to start a vote for the following KIP, aiming to improve
> Kafka
> >>>>>>>> Stream's restoration visibility via new metrics and callback
> methods:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-869%3A+Improve+Streams+State+Restoration+Visibility
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks!
> >>>>>>>> -- Guozhang
> >>>>>>>>
> >>>>>>
> >>>>>
>

Reply via email to