Re: Reworking the Rescale API

Chesnay Schepler Thu, 02 Feb 2023 08:03:44 -0800

> If I understand correctly, the adaptive scheduler currently does a full job restart. Is there any work planned to enable in-place rescaling with the adaptive scheduler?

Nothing concrete. Sure, it's on a wishlist, but it'd require significant changes to how the runtime works. Rescaling stateful operators requires key groups to be redistributed, you'd need to be able to change task edges dynamically, roll back to a checkpoint without restarting tasks, ...
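
To make the key-group point concrete, here is a minimal Python sketch of how keyed state ownership shifts when parallelism changes. It assumes the range-assignment arithmetic used by Flink's `KeyGroupRangeAssignment` (state is split into `max_parallelism` key groups, and each subtask owns one contiguous range). The helper names are hypothetical. The sketch only computes which key groups change owner, i.e. the state an in-place rescale would have to move between running tasks.

```python
def key_group_range(max_parallelism: int, parallelism: int, subtask: int) -> range:
    """Contiguous key groups owned by `subtask` (assumed to mirror
    the arithmetic in Flink's KeyGroupRangeAssignment)."""
    start = (subtask * max_parallelism + parallelism - 1) // parallelism
    end = ((subtask + 1) * max_parallelism - 1) // parallelism  # inclusive
    return range(start, end + 1)


def owner(max_parallelism: int, parallelism: int, key_group: int) -> int:
    """Index of the subtask that owns `key_group` at the given parallelism."""
    return key_group * parallelism // max_parallelism


def moved_key_groups(max_parallelism: int, old_p: int, new_p: int) -> list:
    """Key groups whose state must be shipped to a different subtask."""
    return [
        kg
        for kg in range(max_parallelism)
        if owner(max_parallelism, old_p, kg) != owner(max_parallelism, new_p, kg)
    ]


if __name__ == "__main__":
    # Rescale a keyed operator with max parallelism 128 from 2 to 3 subtasks.
    for i in range(3):
        r = key_group_range(128, 3, i)
        print(f"subtask {i}: key groups {r.start}..{r.stop - 1}")
    moved = moved_key_groups(128, 2, 3)
    print(f"{len(moved)} of 128 key groups change owner")
```

Running it prints the ranges 0..42, 43..85 and 86..127 and reports that 63 of the 128 key groups change owner, so roughly half of the keyed state has to move even for a small rescale from 2 to 3 subtasks.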


It's less of a scheduler thing actually.

An earlier step to that would be to allow recovery from an error without restarting all tasks, which would benefit all schedulers.

But again, a bit of a moonshot.

> How well has the adaptive scheduler been tested in production? If we are intending to use it for rescale operations, I'm a bit concerned those jobs might show different behavior due to the scheduling than jobs started with the default scheduler.


I don't think we got a lot of feedback so far.

Outside of the limitations listed on the elastic scaling page (which I believe we'll address in due time), I'm not aware of any problems.

We haven't run into any issues internally.

On 02/02/2023 12:44, Maximilian Michels wrote:

+1 on improving the scheduler docs.

> They have never shared a base class since day 1. Are you maybe mixing up the
> AdaptiveScheduler and AdaptiveBatchScheduler?

@Chesnay: Indeed, I had mixed this up. DefaultScheduler and
AdaptiveScheduler only share the SchedulerNG interface, while the
DefaultScheduler and the AdaptiveBatchScheduler share a subset of the
code. Too many schedulers :)

Thanks for clarifying the current and the intended feature set of the
adaptive scheduler!

How well has the adaptive scheduler been tested in production? If we
are intending to use it for rescale operations, I'm a bit concerned
those jobs might show different behavior due to the scheduling than
jobs started with the default scheduler.

If I understand correctly, the adaptive scheduler currently does a
full job restart. Is there any work planned to enable in-place
rescaling with the adaptive scheduler?

> @max:
>    - When the user repartitions, we still need to restart the job. Can we try
>    to do this part of the work internally instead of externally? As
>    *@konstantin* said, only trigger rescaling once a checkpoint or
>    retained checkpoint has completed, to minimize reprocessing.

@ConradJam: I'm not sure I understand your question. Do you mean when
the partition strategy changes between operators? That shouldn't be
the case for Rescale (except maybe converting ForwardPartitioner to
RescalePartitioner). A more advanced rescale API could allow user
control over this but for now I think it would only support adjusting
parallelism of vertices.

-Max

On Thu, Feb 2, 2023 at 6:44 AM weijie guo wrote:

Hi David,

Sorry I'm late to join the discussion.

+1 for having more structured docs about the scheduler ecosystem. I can help
fill in the details about the batch part.

Best regards,

Weijie



On Wed, Feb 1, 2023 at 10:38 PM David Morávek wrote:

It makes sense to give the whole "scheduler ecosystem," not just the
adaptive scheduler, a little more structure in the docs. We already
have 4 different schedulers (Default, Adaptive, AdaptiveBatch,
AdaptiveBatchSpeculative), and it is becoming quite confusing since the details
are scattered around the docs. Maybe a "Job Schedulers" subpage, similar to
the one we have for "Resource Providers", could do the trick.

I should be able to fill in the details about the streaming ones, but I
will probably need some help with the batch ones.

As for the first FLIP, it's already prepared and we should be able to
publish it by Friday.

Best,
D.


On Wed, Feb 1, 2023 at 9:56 AM Gyula Fóra wrote:

Chesnay, David:

Thank you guys for the extra information. We were clearly missing some
context here around the scheduler related efforts and the currently
available feature set.

As for concrete suggestions regarding the docs:

1. If the adaptive scheduler provides a significantly different feature set
from the default scheduler, we could give it its own smaller doc page detailing
the differences and why people should switch to it for streaming. This will
also help us when we make the transition and change the default
behaviour.
2. We could still have an elastic scaling page that links to the adaptive
scheduler page (and vice versa) and focuses on elastic scaling and the Kubernetes
operator autoscaler, giving a complete picture of the elastic scaling options
and detailing the limitations of the different approaches.

This way the Adaptive Scheduler docs will be decoupled from elastic scaling
and will result in a better understanding for the users (it sure would have
helped us here, and we are on the more advanced user side :))

What do you think?
Gyula

On Sat, Jan 28, 2023 at 4:20 AM ConradJam wrote:

Sorry I'm late to join the discussion. I've gleaned a lot of useful information
from you all.

*@max*

    - When the user repartitions, we still need to restart the job. Can we try
    to do this part of the work internally instead of externally? As
    *@konstantin* said, only trigger rescaling once a checkpoint or
    retained checkpoint has completed, to minimize reprocessing.

*@konstantin*

    - I think you mentioned that 2 FLIPs are being drafted, which I consider
    a prerequisite for achieving *@max*'s goal. I would love to join this
    discussion and contribute to it. I've tried a native implementation of
    this part myself; if it can help the community, that's the best I can do.

*@chesnay*

    - As *@gyula* said, the docs section is confusing and leads to
    misconceptions. I'll see if I can fix it.


*About the Rescale API*

   Some limitations and differences between the *default* scheduler and
*reactive mode* were discussed earlier, and *@chesnay* explained their
limitations and behaviors: essentially, they are two different things. I agree
that once reactive mode is ready, it should become the default for
*stream processing* jobs.
   As for the *Rescale API* [1], which as we now know seems to be unusable, I
believe its goal is to enable fast re-parallelization. I would like to wait
until the discussion is over and the 2 draft FLIPs mentioned earlier are
completed. It will not be too late then to decide whether to modify the
*Rescale REST API* [2] to support changing the parallelism of job vertices.


[1] https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/elastic_scaling/
[2] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-rescaling


Best～



On Tue, Jan 24, 2023 at 1:08 AM Maximilian Michels wrote:

Hi,

The current rescale API appears to be a work in progress. A couple of years
ago, we disabled access to the API [1].

I'm looking into this problem as part of working on autoscaling [2], where
we currently require a full restart of the job to apply the parallelism
overrides. This adds additional delay and comes with the caveat that we
don't know whether sufficient resources are available prior to executing
the scaling decision. We obviously do not want to get stuck due to a lack
of resources. So a rescale API would have to ensure enough resources are
available prior to restarting the job.
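
A minimal sketch of the kind of pre-check described above, under stated assumptions: every vertex stays in Flink's default single slot-sharing group (so the job needs as many slots as its highest vertex parallelism), the slots the job currently holds are reusable after the restart, and the number of free cluster slots is already known. `required_slots`, `can_apply_rescale`, the vertex names and the slot counts are all hypothetical and not part of any Flink API.

```python
from typing import Mapping


def required_slots(parallelism: Mapping[str, int]) -> int:
    """Slots needed if all vertices share one slot-sharing group (assumption)."""
    return max(parallelism.values(), default=0)


def can_apply_rescale(
    current: Mapping[str, int],
    overrides: Mapping[str, int],
    free_slots: int,
) -> bool:
    """Hypothetical gate: only restart the job if the new parallelism fits.

    Slots currently held by the job are assumed to be reusable after the
    restart, so the budget is the job's current slots plus free cluster slots.
    """
    target = {**current, **overrides}
    budget = required_slots(current) + free_slots
    return required_slots(target) <= budget


if __name__ == "__main__":
    current = {"source": 4, "window": 4, "sink": 2}
    # Scaling "window" to 8 needs 8 slots; the job holds 4 and 2 are free.
    print(can_apply_rescale(current, {"window": 8}, free_slots=2))  # False
    print(can_apply_rescale(current, {"window": 6}, free_slots=2))  # True
```

A check like this is only advisory: slots that look free can be taken by another job before the restart happens, which is why the resources would have to be secured by the rescale API itself rather than checked from the outside.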

I've created an issue here:
https://issues.apache.org/jira/browse/FLINK-30773

Any comments or interest in working on this?

-Max

[1] https://issues.apache.org/jira/browse/FLINK-12312
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-271%3A+Autoscaling


--
Best

ConradJam
