> If I understand correctly, the adaptive scheduler currently does a
> full job restart. Is there any work planned to enable in-place rescaling
> with the adaptive scheduler?
Nothing concrete. Sure, it's on a wishlist, but it'd require significant
changes to how the runtime works.
Rescaling stateful operators requires keygroups to be redistributed;
you'd also need to be able to change task edges dynamically, roll back to a
checkpoint without restarting tasks, ...
It's less of a scheduler thing actually.
An earlier step to that would be to allow recovery from an error without
restarting all tasks, which would benefit all schedulers.
But again, that's a bit of a moonshot.
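To make the keygroup point concrete, here is a minimal sketch of how a contiguous key-group-to-subtask assignment shifts when parallelism changes. The range formula is my understanding of what Flink's KeyGroupRangeAssignment computes and should be treated as an assumption; the function names are made up for illustration and are not Flink APIs.

```python
def key_group_range(max_parallelism, parallelism, operator_index):
    """Contiguous key-group range owned by one subtask (assumed formula)."""
    start = (operator_index * max_parallelism + parallelism - 1) // parallelism
    end = ((operator_index + 1) * max_parallelism - 1) // parallelism
    return range(start, end + 1)


def assignment(max_parallelism, parallelism):
    """Map every key group to the index of the subtask that owns it."""
    owner = {}
    for index in range(parallelism):
        for key_group in key_group_range(max_parallelism, parallelism, index):
            owner[key_group] = index
    return owner


def moved_key_groups(max_parallelism, old_parallelism, new_parallelism):
    """Key groups whose owning subtask index changes after a rescale."""
    before = assignment(max_parallelism, old_parallelism)
    after = assignment(max_parallelism, new_parallelism)
    return sorted(kg for kg in before if before[kg] != after[kg])


if __name__ == "__main__":
    moved = moved_key_groups(max_parallelism=128, old_parallelism=2, new_parallelism=3)
    print(len(moved), "of 128 key groups change owner")
```

Under these assumptions, going from parallelism 2 to 3 with a max parallelism of 128 moves 63 of the 128 key groups to a different subtask index, so an in-place rescale would have to hand that state over while the tasks keep running.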
> How well has the adaptive scheduler been tested in production? If we
> are intending to use it for rescale operations, I'm a bit concerned
> those jobs might show different behavior due to the scheduling than jobs
> started with the default scheduler.
I don't think we got a lot of feedback so far.
Outside of the limitations listed on the elastic scaling page (which I
believe we'll address in due time) I'm not aware of any problems.
We haven't run into any issues internally.
On 02/02/2023 12:44, Maximilian Michels wrote:
+1 on improving the scheduler docs.
They never shared a base class since day 1. Are you maybe mixing up the
AdaptiveScheduler and AdaptiveBatchScheduler?
@Chesnay: Indeed, I had mixed this up. DefaultScheduler and
AdaptiveScheduler only share the SchedulerNG interface while the
DefaultScheduler and the AdaptiveBatchScheduler share a subset of the
code. Too many schedulers :)
Thanks for clarifying the current and the intended feature set of the
adaptive scheduler!
How well has the adaptive scheduler been tested in production? If we
are intending to use it for rescale operations, I'm a bit concerned
those jobs might show different behavior due to the scheduling than
jobs started with the default scheduler.
If I understand correctly, the adaptive scheduler currently does a
full job restart. Is there any work planned to enable in-place
rescaling with the adaptive scheduler?
@max:
- When a user repartitions, we still need to restart the job. Can we try to
do this part of the work internally instead of externally and, as
*@konstantin* said, trigger rescaling only once a checkpoint or
retained checkpoint has completed, to minimize reprocessing?
@ConradJam: I'm not sure I understand your question. Do you mean when
the partition strategy changes between operators? That shouldn't be
the case for Rescale (except maybe converting ForwardPartitioner to
RescalePartitioner). A more advanced rescale API could allow user
control over this but for now I think it would only support adjusting
parallelism of vertices.
-Max
On Thu, Feb 2, 2023 at 6:44 AM weijie guo <guoweijieres...@gmail.com> wrote:
Hi David,
Sorry I'm late to join the discussion.
+1 for having a more structured doc about the scheduler ecosystem; I can help
fill in the details about the batch part.
Best regards,
Weijie
On Wed, Feb 1, 2023 at 10:38 PM David Morávek <d...@apache.org> wrote:
It makes sense to give the whole "scheduler ecosystem," not just the
adaptive scheduler, a little bit more structure in the docs. We already
have 4 different schedulers (Default, Adaptive, AdaptiveBatch,
AdaptiveBatchSpeculative), and it becomes quite confusing since the details
are scattered around the docs. Maybe having a "Job Schedulers" subpage, the
same way as we have for "Resource Providers" could do the trick.
I should be able to fill in the details about the streaming ones, but I
will probably need some help with the batch ones.
As for the first FLIP, it's already prepared and we should be able to
publish it by Friday.
Best,
D.
On Wed, Feb 1, 2023 at 9:56 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
Chesnay, David:
Thank you guys for the extra information. We were clearly missing some
context here around the scheduler-related efforts and the currently
available feature set.
As for the concrete suggestions regarding the docs:
1. If the adaptive scheduler provides a significantly different feature set
from the default scheduler we could have its own smaller doc page detailing
the differences and why people should switch to it for streaming. This will
also help us when we are making the transition and change the default
behaviour.
2. We could still have an elastic scaling page that links to the adaptive
scheduler (and vice versa) that focuses on elastic scaling + the Kubernetes
operator autoscaler for a complete picture on elastic scaling options +
detailing the limitations of the different approaches.
This way the Adaptive Scheduler docs will be decoupled from elastic scaling
and will result in a better understanding for the users (it sure would have
helped us here, and we are on the more advanced user side :))
What do you think?
Gyula
On Sat, Jan 28, 2023 at 4:20 AM ConradJam <jam.gz...@gmail.com> wrote:
Sorry I'm late to join the discussion. I've gleaned a lot of useful
information from you guys.
*@max*
- When a user repartitions, we still need to restart the job. Can we try to
do this part of the work internally instead of externally and, as
*@konstantin* said, trigger rescaling only once a checkpoint or
retained checkpoint has completed, to minimize reprocessing?
*@konstantin*
- I think you mentioned that 2 FLIPs are being drafted, which I consider
a precondition for achieving *@max*'s goal. I would love to join
this discussion and contribute to it. I've tried a native implementation of
this part myself; if I can help the community, that's the best I can do.
*@chesnay*
- The docs section is confusing and leads to misconceptions, as *@gyula*
says. I'll see if I can fix it.
*About the Rescale API*
Some limitations and differences between *default* and *reactive mode* were
discussed earlier, and *@chesnay* explained some of their limitations and
behaviors; essentially they are two different things. I agree that when
reactive mode is ready, it should become the default mode for
*stream processing* jobs.
As for the *Rescale API* [1], as we now know, it seems to be unusable. I
believe the goal of this API is to allow fast changes of parallelism. I
would like to wait until the discussion is over and the 2 draft FLIPs
mentioned earlier are completed. It will not be too late then to decide
whether to modify the *Rescale REST API* [2] to support modifying the
parallelism of job vertices.
[1] https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/elastic_scaling/
[2] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-rescaling
Best~
On Tue, Jan 24, 2023 at 1:08 AM Maximilian Michels <m...@apache.org> wrote:
Hi,
The current rescale API appears to be a work in progress. A couple of years
ago, we disabled access to the API [1].
I'm looking into this problem as part of working on autoscaling [2] where
we currently require a full restart of the job to apply the parallelism
overrides. This adds additional delay and comes with the caveat that we
don't know whether sufficient resources are available prior to executing
the scaling decision. We obviously do not want to get stuck due to a lack
of resources. So a rescale API would have to ensure enough resources are
available prior to restarting the job.
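As a rough illustration of that precondition, the sketch below checks whether a set of per-vertex parallelism overrides would fit before anything is restarted. Everything in it is hypothetical: the function names, the dict-based job description, and the simplifying assumptions that all vertices share one slot sharing group (so the job needs as many slots as its highest vertex parallelism) and that the slots held by the running job become available again on restart.

```python
def required_slots(vertex_parallelism):
    """Slots needed by a job, assuming a single slot sharing group."""
    return max(vertex_parallelism.values(), default=0)


def can_rescale(current, overrides, free_slots):
    """Return True if the overridden parallelism fits into the cluster.

    current    -- dict of vertex name -> current parallelism
    overrides  -- dict of vertex name -> requested parallelism
    free_slots -- slots currently unused in the cluster
    """
    target = {**current, **overrides}
    # Assumption: a restart releases the slots the job holds today,
    # so they count towards what is available for the rescaled job.
    available = free_slots + required_slots(current)
    return required_slots(target) <= available


if __name__ == "__main__":
    job = {"source": 2, "map": 4, "sink": 2}
    print(can_rescale(job, {"map": 8}, free_slots=3))  # 8 needed, 7 available
    print(can_rescale(job, {"map": 8}, free_slots=4))  # 8 needed, 8 available
```

A real implementation would have to ask the resource manager rather than trust a static slot count, and would need to reserve the slots so they are not taken between the check and the restart.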
I've created an issue here:
https://issues.apache.org/jira/browse/FLINK-30773
Any comments or interest in working on this?
-Max
[1] https://issues.apache.org/jira/browse/FLINK-12312
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-271%3A+Autoscaling
--
Best
ConradJam