Re: [DISCUSS] FLIP-271: Autoscaling

Maximilian Michels Tue, 15 Nov 2022 07:46:05 -0800

Of course! Let me know if your concerns are addressed. The wiki page has
been updated.


>It will be great to add this in the FLIP so that reviewers can understand
how the source parallelisms are computed and how the algorithm works
end-to-end.

I've updated the FLIP page to add more details on how the backlog-based
scaling works (2).

>These metrics and configs are public API and need to be stable across
minor versions, could we document them before finalizing the FLIP?

Metrics and config changes are not strictly part of the public API but
Gyula has added a section.

-Max

On Tue, Nov 15, 2022 at 3:01 PM Dong Lin <[email protected]> wrote:

> Hi Maximilian,
>
> It seems that the following comments from the previous discussions have not
> been addressed yet. Any chance we can have them addressed before starting
> the voting thread?
>
> Thanks,
> Dong
>
> On Mon, Nov 7, 2022 at 2:33 AM Gyula Fóra <[email protected]> wrote:
>
> > Hi Dong!
> >
> > Let me try to answer the questions :)
> >
> > 1 : busyTimeMsPerSecond is not specific for CPU, it measures the time
> > spent in the main record processing loop for an operator if I
> > understand correctly. This includes IO operations too.
> >
> > 2: We should add this to the FLIP I agree. It would be a Duration config
> > with the expected catch up time after rescaling (let's say 5 minutes). It
> > could be computed based on the current data rate and the calculated max
> > processing rate after the rescale.
> >
>
> It will be great to add this in the FLIP so that reviewers can understand
> how the source parallelisms are computed and how the algorithm works
> end-to-end.
>
>
> > 3: In the current proposal we don't have per operator configs. Target
> > utilization would apply to all operators uniformly.
> >
> > 4: It should be configurable, yes.
> >
>
> Since this config is a public API, could we update the FLIP accordingly to
> provide this config?
>
>
> >
> > 5,6: The names haven't been finalized but I think these are minor
> details.
> > We could add concrete names to the FLIP :)
> >
>
> These metrics and configs are public API and need to be stable across minor
> versions, could we document them before finalizing the FLIP?
>
>
> >
> > Cheers,
> > Gyula
> >
> >
> > On Sun, Nov 6, 2022 at 5:19 PM Dong Lin <[email protected]> wrote:
> >
> >> Hi Max,
> >>
> >> Thank you for the proposal. The proposal tackles a very important issue
> >> for Flink users and the design looks promising overall!
> >>
> >> I have some questions to better understand the proposed public
> interfaces
> >> and the algorithm.
> >>
> >> 1) The proposal seems to assume that the operator's busyTimeMsPerSecond
> >> could reach 1 sec. I believe this is mostly true for cpu-bound
> operators.
> >> Could you confirm that this can also be true for io-bound operators
> such as
> >> sinks? For example, suppose a Kafka Sink subtask has reached I/O
> bottleneck
> >> when flushing data out to the Kafka clusters, will busyTimeMsPerSecond
> >> reach 1 sec?
> >>
> >> 2) It is said that "users can configure a maximum time to fully process
> >> the backlog". The configuration section does not seem to provide this
> >> config. Could you specify this? And any chance this proposal can provide
> >> the formula for calculating the new processing rate?
> >>
> >> 3) How are users expected to specify the per-operator configs (e.g.
> >> target utilization)? For example, should users specify it
> programmatically
> >> in a DataStream/Table/SQL API?
> >>
> >> 4) How often will the Flink Kubernetes operator query metrics from
> >> JobManager? Is this configurable?
> >>
> >> 5) Could you specify the config name and default value for the proposed
> >> configs?
> >>
> >> 6) Could you add the name/mbean/type for the proposed metrics?
> >>
> >>
> >> Cheers,
> >> Dong
> >>
> >>
> >>
>

Re: [DISCUSS] FLIP-271: Autoscaling

Reply via email to