Re: [DISCUSS] FLIP-271: Autoscaling

Dong Lin Thu, 17 Nov 2022 08:07:09 -0800

Hi Gyula,

If I understand correctly, this autopilot proposal is an experimental
feature and its configs/metrics are not mature enough to provide backward
compatibility yet. And the proposal provides high-level ideas of the
algorithm but it is probably too complicated to explain it end-to-end.


On the one hand, I do agree that having an auto-tuning prototype, even if
not mature, is better than nothing for Flink users. On the other hand, I am
concerned that this FLIP seems a bit too experimental, and starting with an
immature design might make it harder for us to reach a production-ready and
generally applicable auto-tuner in the future. And introducing too backward
incompatible changes generally hurts users' trust in the Flink project.

One alternative might be to develop and experiment with this feature in a
non-Flink repo. You can iterate fast without worrying about typically
backward compatibility requirement as required for most Flink public
features. And once the feature is reasonably evaluated and mature enough,
it will be much easier to explain the design and address all the issues
mentioned above. For example, Jingsong implemented a Flink Table Store
prototype
<https://github.com/JingsongLi/flink/tree/table_storage/flink-table> before
proposing FLIP-188 in this thread
<https://lists.apache.org/thread/dlhspjpms007j2ynymsg44fxcx6fm064>.

I don't intend to block your progress. Just my two cents. It will be great
to hear more from other developers (e.g. in the voting thread).

Thanks,
Dong


On Thu, Nov 17, 2022 at 1:24 AM Gyula Fóra <[email protected]> wrote:

> Hi Dong,
>
> Let me address your comments.
>
> Time for scale / backlog processing time derivation:
> We can add some more details to the Flip but at this point the
> implementation is actually much simpler than the algorithm to describe it.
> I would not like to add more equations etc because it just overcomplicates
> something relatively simple in practice.
>
> In a nutshell: Time to recover  == lag / processing-rate-after-scaleup.
> It's fairly easy to see where this is going, but best to see in code.
>
> Using pendingRecords and alternative mechanisms:
> True that the current algorithm relies on pending records to effectively
> compute the target source processing rates and therefore scale sources.
> This is available for Kafka which is by far the most common streaming
> source and is used by the majority of streaming applications currently.
> It would be very easy to add alternative purely utilization based scaling
> to the sources. We can start with the current proposal and add this along
> the way before the first version.
>
> Metrics, Configs and Public API:
> The autoscaler feature is proposed for the Flink Kubernetes Operator which
> does not have the same API/config maturity and thus does not provide the
> same guarantees.
> We currently support backward compatibilty for the CRD itself and not the
> configs or metrics. This does not mean that we do not aim to do so but at
> this stage we still have to clean up the details of the newly added
> components. In practice this means that if we manage to get the metrics /
> configs right at the first try we will keep them and provide compatibility,
> but if we feel that we missed something or we don't need something we can
> still remove it. It's a more pragmatic approach for such a component that
> is likely to evolve than setting everything in stone immediately.
>
> Cheers,
> Gyula
>
>
>
> On Wed, Nov 16, 2022 at 6:07 PM Dong Lin <[email protected]> wrote:
>
> > Thanks for the update! Please see comments inline.
> >
> > On Tue, Nov 15, 2022 at 11:46 PM Maximilian Michels <[email protected]>
> > wrote:
> >
> > > Of course! Let me know if your concerns are addressed. The wiki page
> has
> > > been updated.
> > >
> > > >It will be great to add this in the FLIP so that reviewers can
> > understand
> > > how the source parallelisms are computed and how the algorithm works
> > > end-to-end.
> > >
> > > I've updated the FLIP page to add more details on how the backlog-based
> > > scaling works (2).
> > >
> >
> > The algorithm is much more informative now.  The algorithm currently uses
> > "Estimated time for rescale" to derive new source parallelism. Could we
> > also specify in the FLIP how this value is derived?
> >
> > The algorithm currently uses pendingRecords to derive source parallelism.
> > It is an optional metric and KafkaSource currently reports this metric.
> So
> > it means that only the proposed algorithm currently only works when all
> > sources of the job are KafkaSource, right?
> >
> > This issue considerably limits the applicability of this FLIP. Do you
> think
> > most (if not all) streaming source will report this metric?
> Alternatively,
> > any chance we can have a fallback solution to evaluate the source
> > parallelism based on e.g. cpu or idle ratio for cases where this metric
> is
> > not available?
> >
> >
> > > >These metrics and configs are public API and need to be stable across
> > > minor versions, could we document them before finalizing the FLIP?
> > >
> > > Metrics and config changes are not strictly part of the public API but
> > > Gyula has added a section.
> > >
> >
> > Hmm... if metrics are not public API, then it might happen that we change
> > the mbean path in a minor release and break users' monitoring tool.
> > Similarly, we might change configs in a minor release that break user's
> job
> > behavior. We probably want to avoid these breaking changes in minor
> > releases.
> >
> > It is documented here
> > <
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> > >
> > that
> > "Exposed monitoring information" and "Configuration settings" are public
> > interfaces of the project.
> >
> > Maybe we should also specify the metric here so that users can safely
> setup
> > dashboards and tools to track how the autopilot is working, similar to
> how
> > metrics are documented in FLIP-33
> > <
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics
> > >
> > ?
> >
> >
> > > -Max
> > >
> > > On Tue, Nov 15, 2022 at 3:01 PM Dong Lin <[email protected]> wrote:
> > >
> > > > Hi Maximilian,
> > > >
> > > > It seems that the following comments from the previous discussions
> have
> > > not
> > > > been addressed yet. Any chance we can have them addressed before
> > starting
> > > > the voting thread?
> > > >
> > > > Thanks,
> > > > Dong
> > > >
> > > > On Mon, Nov 7, 2022 at 2:33 AM Gyula Fóra <[email protected]>
> > wrote:
> > > >
> > > > > Hi Dong!
> > > > >
> > > > > Let me try to answer the questions :)
> > > > >
> > > > > 1 : busyTimeMsPerSecond is not specific for CPU, it measures the
> time
> > > > > spent in the main record processing loop for an operator if I
> > > > > understand correctly. This includes IO operations too.
> > > > >
> > > > > 2: We should add this to the FLIP I agree. It would be a Duration
> > > config
> > > > > with the expected catch up time after rescaling (let's say 5
> > minutes).
> > > It
> > > > > could be computed based on the current data rate and the calculated
> > max
> > > > > processing rate after the rescale.
> > > > >
> > > >
> > > > It will be great to add this in the FLIP so that reviewers can
> > understand
> > > > how the source parallelisms are computed and how the algorithm works
> > > > end-to-end.
> > > >
> > > >
> > > > > 3: In the current proposal we don't have per operator configs.
> Target
> > > > > utilization would apply to all operators uniformly.
> > > > >
> > > > > 4: It should be configurable, yes.
> > > > >
> > > >
> > > > Since this config is a public API, could we update the FLIP
> accordingly
> > > to
> > > > provide this config?
> > > >
> > > >
> > > > >
> > > > > 5,6: The names haven't been finalized but I think these are minor
> > > > details.
> > > > > We could add concrete names to the FLIP :)
> > > > >
> > > >
> > > > These metrics and configs are public API and need to be stable across
> > > minor
> > > > versions, could we document them before finalizing the FLIP?
> > > >
> > > >
> > > > >
> > > > > Cheers,
> > > > > Gyula
> > > > >
> > > > >
> > > > > On Sun, Nov 6, 2022 at 5:19 PM Dong Lin <[email protected]>
> wrote:
> > > > >
> > > > >> Hi Max,
> > > > >>
> > > > >> Thank you for the proposal. The proposal tackles a very important
> > > issue
> > > > >> for Flink users and the design looks promising overall!
> > > > >>
> > > > >> I have some questions to better understand the proposed public
> > > > interfaces
> > > > >> and the algorithm.
> > > > >>
> > > > >> 1) The proposal seems to assume that the operator's
> > > busyTimeMsPerSecond
> > > > >> could reach 1 sec. I believe this is mostly true for cpu-bound
> > > > operators.
> > > > >> Could you confirm that this can also be true for io-bound
> operators
> > > > such as
> > > > >> sinks? For example, suppose a Kafka Sink subtask has reached I/O
> > > > bottleneck
> > > > >> when flushing data out to the Kafka clusters, will
> > busyTimeMsPerSecond
> > > > >> reach 1 sec?
> > > > >>
> > > > >> 2) It is said that "users can configure a maximum time to fully
> > > process
> > > > >> the backlog". The configuration section does not seem to provide
> > this
> > > > >> config. Could you specify this? And any chance this proposal can
> > > provide
> > > > >> the formula for calculating the new processing rate?
> > > > >>
> > > > >> 3) How are users expected to specify the per-operator configs
> (e.g.
> > > > >> target utilization)? For example, should users specify it
> > > > programmatically
> > > > >> in a DataStream/Table/SQL API?
> > > > >>
> > > > >> 4) How often will the Flink Kubernetes operator query metrics from
> > > > >> JobManager? Is this configurable?
> > > > >>
> > > > >> 5) Could you specify the config name and default value for the
> > > proposed
> > > > >> configs?
> > > > >>
> > > > >> 6) Could you add the name/mbean/type for the proposed metrics?
> > > > >>
> > > > >>
> > > > >> Cheers,
> > > > >> Dong
> > > > >>
> > > > >>
> > > > >>
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-271: Autoscaling

Reply via email to