Re: Reworking the Rescale API

Maximilian Michels Thu, 02 Feb 2023 03:45:11 -0800

+1 on improving the scheduler docs.

> They never shared a base class since day 1. Are you maybe mixing up the 
> AdaptiveScheduler and AdaptiveBatchScheduler?


@Chesnay: Indeed, I had mixed this up. DefaultScheduler and
AdaptiveScheduler only share the SchedulerNG interface while the
DefaultScheduler and the AdaptiveBatchScheduler share a subset of the
code. Too many schedulers :)

Thanks for clarifying the current and the intended feature set of the
adaptive scheduler!

How well has the adaptive scheduler been tested in production? If we
intend to use it for rescale operations, I'm a bit concerned that
those jobs might behave differently, because of the scheduling, from
jobs started with the default scheduler.

If I understand correctly, the adaptive scheduler currently does a
full job restart. Is there any work planned to enable in-place
rescaling with the adaptive scheduler?

> @max:
>   - when a user repartitions, we still need to restart the job. Can we
>   try to do this part of the work internally instead of externally? As
>   *@konstantin* said, only trigger rescaling once a checkpoint or
>   retained checkpoint has completed, to minimize reprocessing.

@ConradJam: I'm not sure I understand your question. Do you mean when
the partition strategy changes between operators? That shouldn't be
the case for Rescale (except maybe converting ForwardPartitioner to
RescalePartitioner). A more advanced rescale API could allow user
control over this but for now I think it would only support adjusting
parallelism of vertices.
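
To make that narrower scope concrete, here is a minimal sketch in Python
of what such a request could check. Nothing here is Flink API:
`plan_rescale`, `required_slots`, the vertex names, and `total_slots` are
hypothetical, and the slot estimate assumes all vertices share one slot
sharing group, so the job needs as many slots as its highest vertex
parallelism. The sketch only adjusts per-vertex parallelism and rejects
the request up front if the cluster cannot supply the slots, rather than
discovering that after a restart.

```python
from typing import Dict, Optional


def required_slots(parallelism: Dict[str, int]) -> int:
    # Assumption: one shared slot sharing group, so the slot demand is
    # the highest parallelism of any vertex in the job.
    return max(parallelism.values(), default=0)


def plan_rescale(
    current: Dict[str, int],
    overrides: Dict[str, int],
    total_slots: int,
) -> Optional[Dict[str, int]]:
    """Return the new per-vertex parallelism, or None if resources are short.

    `current` and `overrides` map a (hypothetical) vertex name to its
    parallelism; `total_slots` is what the cluster could give this job.
    """
    unknown = set(overrides) - set(current)
    if unknown:
        raise ValueError(f"unknown vertices: {sorted(unknown)}")
    if any(p < 1 for p in overrides.values()):
        raise ValueError("parallelism must be at least 1")

    # Only parallelism changes; partitioning between operators is untouched.
    target = {**current, **overrides}

    # Check resources before touching the running job.
    if required_slots(target) > total_slots:
        return None
    return target


job = {"source": 2, "map": 4, "sink": 2}
print(plan_rescale(job, {"map": 8}, total_slots=8))
print(plan_rescale(job, {"map": 16}, total_slots=8))
```

The second call is rejected without the job being restarted, which is the
behavior I would want from a rescale API.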

-Max

On Thu, Feb 2, 2023 at 6:44 AM weijie guo <[email protected]> wrote:
>
> Hi David,
>
> Sorry I'm late to join the discussion.
>
> +1 for having a more structured doc about the scheduler ecosystem; I can
> help fill in the details about the batch part.
>
> Best regards,
>
> Weijie
>
>
>
> On Wed, Feb 1, 2023 at 10:38 PM David Morávek <[email protected]> wrote:
>>
>> It makes sense to give the whole "scheduler ecosystem," not just the
>> adaptive scheduler, a little bit more structure in the docs. We already
>> have 4 different schedulers (Default, Adaptive, AdaptiveBatch,
>> AdaptiveBatchSpeculative), and it becomes quite confusing since the details
>> are scattered around the docs. Maybe having a "Job Schedulers" subpage, the
>> same way as we have for "Resource Providers" could do the trick.
>>
>> I should be able to fill in the details about the streaming ones, but I
>> will probably need some help with the batch ones.
>>
>> As for the first FLIP, it's already prepared and we should be able to
>> publish it by Friday.
>>
>> Best,
>> D.
>>
>>
>> On Wed, Feb 1, 2023 at 9:56 AM Gyula Fóra <[email protected]> wrote:
>>
>> > Chesnay, David:
>> >
>> > Thank you guys for the extra information. We were clearly missing some
>> > context here around the scheduler related efforts and the currently
>> > available feature set.
>> >
>> > As for concrete suggestions regarding the docs:
>> >
>> > 1. If the adaptive scheduler provides a significantly different feature set
>> > from the default scheduler we could have its own smaller doc page detailing
>> > the differences and why people should switch to it for streaming. This will
>> > also help us when we are making the transition and change the default
>> > behaviour.
>> > 2. We could still have an elastic scaling page that links to the adaptive
>> > scheduler (and vice versa) that focuses on elastic scaling + the Kubernetes
>> > operator autoscaler for a complete picture on elastic scaling options +
>> > detailing the limitations of the different approaches.
>> >
>> > This way the Adaptive Scheduler docs will be decoupled from elastic scaling
>> > and will result in a better understanding for the users (it sure would have
>> > helped us here, and we are on the more advanced user side :))
>> >
>> > What do you think?
>> > Gyula
>> >
>> > On Sat, Jan 28, 2023 at 4:20 AM ConradJam <[email protected]> wrote:
>> >
>> > > Sorry I'm late to join the discussion. I've gleaned a lot of useful
>> > > information from you guys.
>> > >
>> > > *@max*
>> > >
>> > >    - when a user repartitions, we still need to restart the job. Can we
>> > >    try to do this part of the work internally instead of externally? As
>> > >    *@konstantin* said, only trigger rescaling once a checkpoint or
>> > >    retained checkpoint has completed, to minimize reprocessing.
>> > >
>> > > *@konstantin*
>> > >
>> > >    - I think you mentioned that 2 FLIPs are being drafted, which I
>> > >    consider a precondition for achieving *@max*'s goal. I would love to
>> > >    join this discussion and contribute to it. I've tried a native
>> > >    implementation of this part myself; if I can help the community,
>> > >    that's the best I can do.
>> > >
>> > > *@chesnay*
>> > >
>> > >    - The docs section is confusing and invites misconceptions, as
>> > >    *@gyula* says. I'll see if I can fix it.
>> > >
>> > >
>> > > *About Rescale API*
>> > >
>> > >   Some limitations and differences between *default* and *reactive mode*
>> > > were discussed earlier, and *@chesnay* explained some of their
>> > > limitations and behaviors; essentially, they are two different things. I
>> > > agree that when reactive mode is ready, it should be used as the default
>> > > for *stream processing* jobs.
>> > >   As for the *Rescale API* [1], as we now know, it seems to be unusable. I
>> > > believe the goal of this API is to enable fast re-parallelization. I
>> > > would like to wait until the discussion is over and the 2 draft FLIPs
>> > > mentioned earlier are completed. It will not be too late then to decide
>> > > whether to modify the *Rescale REST API* [2] to support modifying the
>> > > parallelism of job vertices.
>> > >
>> > >
>> > > [1] https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/elastic_scaling/
>> > > [2] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-rescaling
>> > >
>> > >
>> > > Best～
>> > >
>> > >
>> > >
>> > > On Tue, Jan 24, 2023 at 1:08 AM Maximilian Michels <[email protected]> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > The current rescale API appears to be a work in progress. A couple
>> > > > years ago, we disabled access to the API [1].
>> > > >
>> > > > I'm looking into this problem as part of working on autoscaling [2]
>> > > > where we currently require a full restart of the job to apply the
>> > > > parallelism overrides. This adds additional delay and comes with the
>> > > > caveat that we don't know whether sufficient resources are available
>> > > > prior to executing the scaling decision. We obviously do not want to
>> > > > get stuck due to a lack of resources. So a rescale API would have to
>> > > > ensure enough resources are available prior to restarting the job.
>> > > >
>> > > > I've created an issue here:
>> > > > https://issues.apache.org/jira/browse/FLINK-30773
>> > > >
>> > > > Any comments or interest in working on this?
>> > > >
>> > > > -Max
>> > > >
>> > > > [1] https://issues.apache.org/jira/browse/FLINK-12312
>> > > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-271%3A+Autoscaling
>> > > >
>> > >
>> > >
>> > > --
>> > > Best
>> > >
>> > > ConradJam
>> > >
>> >
