Re: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

2023-07-06 Thread Etienne Chauchot
Hi, I think we have reached a consensus here. I have updated the FLIP to reflect recent suggestions. I will start a new vote. Best Etienne On 05/07/2023 at 14:42, Etienne Chauchot wrote: Hi all, Thanks David for your suggestions. Comments inline. On 04/07/2023 at 13:35, David Morávek

Re: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

2023-07-05 Thread Etienne Chauchot
Hi all, Thanks David for your suggestions. Comments inline. On 04/07/2023 at 13:35, David Morávek wrote: waiting 2 min between 2 requirements push seems ok to me This depends on the workload. Would you care if the cost of rescaling were close to zero (which is for most out-of-the-box workloa

Re: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

2023-07-04 Thread David Morávek
> waiting 2 min between 2 requirements push seems ok to me This depends on the workload. Would you care if the cost of rescaling were close to zero (which is for most out-of-the-box workloads)? In that case, it would be desirable to rescale more frequently, for example, if TMs join incrementally.

Re: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

2023-07-04 Thread Etienne Chauchot
Hi all, Thanks David for your feedback. My comments are inline. On 04/07/2023 at 09:16, David Morávek wrote: They will struggle if they add new resources and nothing happens for 5 minutes. The same applies if they start playing with FLIP-291 APIs. I'm wondering if the cooldown makes sense th

Re: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

2023-07-04 Thread Chesnay Schepler
I think the cooldown still makes sense with FLIP-291 APIs. If you want to fully control the parallelism and rescale timings then you can set the cooldown to zero. If you don't want complete control but just the target parallelism from time to time, then the cooldown within Flink still makes sen

Re: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

2023-07-04 Thread David Morávek
> They will struggle if they add new resources and nothing happens for 5 minutes. The same applies if they start playing with FLIP-291 APIs. I'm wondering if the cooldown makes sense there since it was the user's deliberate choice to push new requirements. 🤔 Best, D. On Tue, Jul 4, 2023 at 9:11 

Re: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

2023-07-04 Thread David Morávek
The FLIP reads sane to me. I'm unsure about the default values, though; 5 minutes of wait time between rescales feels rather strict, and we should rethink it to provide a better out-of-the-box experience. I'd focus on newcomers trying AS / Reactive Mode out. They will struggle if they add new reso

Re: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

2023-06-29 Thread Etienne Chauchot
Thanks Chesnay for your feedback. I have updated the FLIP. I'll start a vote thread. Best Etienne On 28/06/2023 at 11:49, Chesnay Schepler wrote: > we should schedule a check that will rescale if min-parallelism-increase is met. Then, what is the use of scaling-interval.max timeout in that

Re: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

2023-06-28 Thread Chesnay Schepler
> we should schedule a check that will rescale if min-parallelism-increase is met. Then, what is the use of scaling-interval.max timeout in that context? To force a rescale if min-parallelism-increase is not met (but we could still run above the current parallelism). min-parallelism-increas
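
A minimal sketch of the decision described in this message, written in Python for brevity rather than taken from the Flink code base. It assumes that a cooldown applies after each rescale, that a rescale runs once the cooldown is over if the gain reaches min-parallelism-increase, and that scaling-interval.max forces a rescale for any smaller positive gain. The class, the function, and the field names are hypothetical, and only scale-up is modelled.

```python
from dataclasses import dataclass


@dataclass
class RescalePolicy:
    # Hypothetical fields modelled on the options named in the thread.
    cooldown_s: float              # minimum delay after the last rescale
    scaling_interval_max_s: float  # after this delay, accept a smaller gain
    min_parallelism_increase: int  # gain that justifies a rescale right away


def should_rescale(policy, now_s, last_rescale_s, current, achievable):
    """Return True if a scale-up should run now (illustrative logic only)."""
    elapsed = now_s - last_rescale_s
    gain = achievable - current
    if elapsed < policy.cooldown_s:
        # Still cooling down: never rescale, whatever the gain.
        return False
    if gain >= policy.min_parallelism_increase:
        # The gain is large enough to rescale as soon as the cooldown is over.
        return True
    # Smaller gain: only rescale once the maximum interval has elapsed.
    return gain > 0 and elapsed >= policy.scaling_interval_max_s
```

With a cooldown of zero, as suggested elsewhere in the thread for users who want full control, the first check never blocks and the decision depends only on the gain and the maximum interval.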

Re: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

2023-06-20 Thread Etienne Chauchot
Hi Chesnay, Thanks for your feedback. Comments inline. On 16/06/2023 at 17:24, Chesnay Schepler wrote: 1) Options specific to the adaptive scheduler should start with "jobmanager.adaptive-scheduler". ok 2) There isn't /really/ a notion of a "scaling event". The scheduler is informed abo

Re: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

2023-06-16 Thread Chesnay Schepler
1) Options specific to the adaptive scheduler should start with "jobmanager.adaptive-scheduler". 2) There isn't /really/ a notion of a "scaling event". The scheduler is informed about new/lost slots and job failures, and reacts accordingly by maybe rescaling the job. (sure, you can think of th

Re: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

2023-06-16 Thread Etienne Chauchot
Hi Robert, Thanks for your feedback. I don't know the scheduler part well enough yet and I'm taking this ticket as a learning workshop. Regarding your comments: 1. Taking a look at the AdaptiveScheduler class which takes all its configuration from the JobManagerOptions, and also to be consis

Re: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

2023-06-15 Thread Robert Metzger
Thanks for the FLIP. Some comments: 1. Can you specify the full proposed configuration name? "scaling-cooldown-period" is probably not the full config name? 2. Why is the concept of scaling events and a scaling queue needed? If I remember correctly, the adaptive scheduler will just check how many

Re: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

2023-06-14 Thread Etienne Chauchot
Hi all, @Yuxia, I updated the FLIP to include the aggregation of the stacked operations that we discussed below. PTAL. Best Etienne On 13/06/2023 at 16:31, Etienne Chauchot wrote: Hi Yuxia, Thanks for your feedback. The number of potentially stacked operations depends on the configured le

Re: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

2023-06-13 Thread Etienne Chauchot
Hi Yuxia, Thanks for your feedback. The number of potentially stacked operations depends on the configured length of the cooldown period. The proposal in the FLIP is to add a minimum delay between 2 scaling operations. But, indeed, an optimization could be to still stack the operations (t

Re: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

2023-06-12 Thread yuxia
Hi, Etienne. Thanks for driving it. I have one question about the mechanism of the cooldown timeout. From the Proposed Changes part, if a scaling event is received and it falls during the cooldown period, it'll be stacked to be executed after the period ends. Also, from the description of FL
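
The stacking behaviour asked about here can be illustrated with a small Python sketch. It is not the FLIP's implementation: it assumes that an event arriving during the cooldown is deferred to the end of the cooldown period, and that all events deferred to the same moment are aggregated into a single rescale, as a later message in the thread suggests. The function name and its parameters are hypothetical.

```python
def plan_rescales(event_times_s, cooldown_s):
    """Return the times at which rescales would run (illustrative only).

    event_times_s: times, in seconds, at which a resource change is observed.
    cooldown_s: minimum delay between two consecutive rescales.
    """
    rescale_times = []
    for t in sorted(event_times_s):
        if not rescale_times:
            # First event: nothing to cool down from, rescale immediately.
            rescale_times.append(t)
            continue
        last = rescale_times[-1]
        if t <= last:
            # A rescale is already planned at or after this event;
            # the event is aggregated into it.
            continue
        if t >= last + cooldown_s:
            # Cooldown is over: rescale immediately.
            rescale_times.append(t)
        else:
            # Inside the cooldown: defer to the end of the cooldown period.
            rescale_times.append(last + cooldown_s)
    return rescale_times
```

For example, with events at 0, 10, 20, 30 and 400 seconds and a 120-second cooldown, the sketch plans rescales at 0, 120 and 400 seconds: the three events that fall inside the first cooldown collapse into one rescale. With a cooldown of zero, every event triggers its own rescale.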