>> I fully agree that in-place scaling is a much harder problem which is out of
>> the scope for now. My primary concern here is to be able to rescale with
>> upfront reservation of resources before restarting the job, so the job
>> doesn't get stuck in case of resource constraints.
> Not sure I
My primary concern here is to be able to rescale with upfront reservation of
resources before restarting the job, so the job doesn't get stuck in case of
resource constraints.
Not sure I follow. The AS only rescales when it has already acquired the slots
that it needs.
> This is a blocker fr
I fully agree that in-place scaling is a much harder problem which is
out of the scope for now. My primary concern here is to be able to
rescale with upfront reservation of resources before restarting the
job, so the job doesn't get stuck in case of resource constraints.
> Unused slots: If the max
> If I understand correctly, the adaptive scheduler currently does a
full job restart. Is there any work planned to enable in-place rescaling
with the adaptive scheduler?
Nothing concrete. Sure, it's on a wishlist, but it'd require significant
changes to how the runtime works.
Rescaling statef
+1 on improving the scheduler docs.
> They never shared a base class since day 1. Are you maybe mixing up the
> AdaptiveScheduler and AdaptiveBatchScheduler?
@Chesnay: Indeed, I had mixed this up. DefaultScheduler and
AdaptiveScheduler only share the SchedulerNG interface while the
DefaultSchedu
Hi David,
Sorry I'm late to join the discussion.
+1 for having a more structured doc about the scheduler ecosystem, and I can help
to fill in the details about the batch part.
Best regards,
Weijie
On Wed, Feb 1, 2023 at 22:38, David Morávek wrote:
> It makes sense to give the whole "scheduler ecosystem," not just the
> ad
It makes sense to give the whole "scheduler ecosystem," not just the
adaptive scheduler, a little bit more structure in the docs. We already
have 4 different schedulers (Default, Adaptive, AdaptiveBatch,
AdaptiveBatchSpeculative), and it becomes quite confusing since the details
are scattered aroun
Chesnay, David:
Thank you guys for the extra information. We were clearly missing some
context here around the scheduler-related efforts and the currently
available feature set.
As for the concrete suggestions regarding the docs.
1. If the adaptive scheduler provides a significantly different fe
Sorry I'm late to join the discussion. I've gleaned a lot of useful information
from you guys
*@max*
- When the user repartitions, we still need to restart the job. Can we try to
do this part of the work internally instead of externally? As
*@konstantin* said, only trigger rescaling when the checkpoi
It is certainly true that the messaging around the AS/reactive mode
wasn't good.
In part this happened because initially we only intended to advertise
reactive mode (at the time), and only later figured that the AS on its
own could already be useful too.
That being said, I'm not sure how to
Also @David Morávek @Chesnay Schepler
It would be great if you could update the respective docs page before
publishing your improvement FLIPs about the adaptive scheduler:
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/elastic_scaling/
I think much of the confusion/
Thank you @Chesnay Schepler @David Morávek
I think in that case our primary goal should be to make sure that streaming
jobs always use the adaptive scheduler.
Also then it makes perfect sense to build the rescale api improvements for
that specifically.
However we should have a clear plan to mak
>
> The adaptive scheduler only supports streaming jobs. That's the biggest
> limitation that probably won't be fixed anytime soon.
Since FLIP-283 [1] has been accepted, I think this limitation might have
already been addressed to a certain extent. I'd be completely fine with
having a separate sc
The adaptive scheduler only supports streaming jobs. That's the biggest
limitation that probably won't be fixed anytime soon.
The goal was though to make the adaptive scheduler the default for
streaming jobs eventually.
It was very much meant as a better version of the default scheduler for
stre
Hi Gyula,
> can you please explain why the AdaptiveScheduler is not the default
> scheduler?
There are still some smaller bits missing. As far as I know, the missing
parts are:
1) Local recovery (reusing the already downloaded state files after restart
/ rescale)
2) Support for fine-grained re
Chesnay,
Seems like you are suggesting that the Adaptive scheduler does everything
the standard scheduler does and more.
I am clearly not an expert on this topic but can you please explain why the
AdaptiveScheduler is not the default scheduler?
If it can do everything, why do we even have 2 sched
There's the default and reactive mode; nothing else.
At its core they are the same thing; reactive mode just cranks up the
desired parallelism to infinity and enforces certain assumptions (e.g.,
no active resource management).
The advantage is that the adaptive scheduler can run jobs while no
Thanks for the explanation. If not for the "reactive mode", what is
the advantage of the adaptive scheduler? What other modes does it
support?
>Apart from implementing the same interface the implementations of the adaptive
>and default schedulers are separate.
Last time I looked they implemented
On 26/01/2023 16:18, Maximilian Michels wrote:
I see slightly different goals for the standard and the adaptive
scheduler. The adaptive scheduler's goal is to adapt the Flink job
according to the available resources.
This is really a misconception that we just have to stomp out.
This statement
Thanks for the replies! I don't mind which scheduler handles the
implementation, as long as autoscaling via the Flink operator works
with it.
I see slightly different goals for the standard and the adaptive
scheduler. The adaptive scheduler's goal is to adapt the Flink job
according to the availab
If the adaptive scheduler supported all execution modes like Native
Applications, Sessions, etc., including active resource management, then I
think we could use that all the time. I would love to use one scheduler
instead of having 2 options.
Currently however there is a huge gap in functionality
Hi Gyula,
if the adaptive scheduler supported active resource managers, would there
be any other blocker to migrating to it? I don't know much about the
implementation side here, but conceptually once we have session mode
support and each job in a session cluster declares its desired
parallelism
Hi Konstantin!
I think the Adaptive Scheduler still will not support Kubernetes Native
integration and can only be used in standalone mode. This means that the
operator needs to manage all resources externally, and compute exactly how
many new slots are needed during rescaling, etc.
I think whatev
Hi Max,
it seems to me we are now running into some of the potential duplication of
efforts across the standard and adaptive scheduler that Chesnay had
mentioned on the original ticket. The issue of having to do a full restart
of the Job for rescaling as well as waiting for resources to be available
Hey ConradJam,
Thank you for your thoughtful response. It would be great to start writing
a FLIP for the Rescale API. If you want to take a stab, please go ahead,
I'd be happy to review. I'm sure Gyula or others will also chime in.
I want to answer your question so we are aligned:
● Does scaling
Hello Max,
Thanks for driving it. I think there is no problem with your previous
suggestion of [1] FLINK-30773. Here I just put forward some additions and
questions. I have some suggestions and insights for this
I have been using the autoscaling of the Flink K8S Operator for some
time. The cu
Hi,
The current rescale API appears to be a work in progress. A couple of years
ago, we disabled access to the API [1].
I'm looking into this problem as part of working on autoscaling [2] where
we currently require a full restart of the job to apply the parallelism
overrides. This adds additional de