+1 (binding)

On Fri, Jul 14, 2023 at 11:59 AM Prabhu Joseph <prabhujose.ga...@gmail.com>
wrote:

> *+1 (non-binding)*
>
> Thanks for working on this. We have seen a good improvement with the
> cooldown period introduced by this feature.
> Below are details on the test results from one of our clusters:
>
> During a scale-out operation, 8 new nodes were added one by one, about 30
> seconds apart. With the default behaviour there were 8 restarts within 4
> minutes, whereas with this feature (cooldown period of 4 minutes) there was
> only one.
>
> During the restart window, the job processed more than twice as many
> records with this feature (2909764) as with the default behaviour
> (1323960). With the default behaviour, the repeated restarts mean the job
> spends most of its time recovering, and any progress the tasks made after
> the last successfully completed checkpoint is lost.
>
> | Metric              | Default Adaptive Scheduler                         | Adaptive Scheduler with Cooldown Period |
> |---------------------|----------------------------------------------------|-----------------------------------------|
> | NumRecordsProcessed | 1323960                                            | 2909764                                 |
> | Job Parallelism     | 13 -> 20 -> 27 -> 34 -> 41 -> 48 -> 55 -> 62 -> 69 | 13 -> 69                                |
> | NumRestarts         | 8                                                  | 1                                       |
>
> Remarks:
> 1. The NumRecordsProcessed metric shows the difference the cooldown period
>    makes. When the job restarts repeatedly, the tasks spend most of their
>    time recovering, and the progress they made is lost at each restart.
> 2. With the cooldown period there is only one restart, which happened when
>    the 8th node was added back.
>
> On Wed, Jul 12, 2023 at 8:03 PM Etienne Chauchot <echauc...@apache.org>
> wrote:
>
> > Hi all,
> >
> > I'm going on vacation tonight for 3 weeks.
> >
> > Even though the vote is not finished, since the implementation is rather
> > quick and the design discussion has settled, I went ahead and implemented
> > FLIP-322 [1] so that people can take a look while I'm away.
> >
> > [1] https://github.com/apache/flink/pull/22985
> >
> > Best
> >
> > Etienne
> >
> > On 12/07/2023 at 09:56, Etienne Chauchot wrote:
> > >
> > > Hi all,
> > >
> > > Would you mind casting your vote in this second vote thread (opened
> > > after further discussion) so that the subject can move forward?
> > >
> > > @David, @Chesnay, @Robert: you took part in the discussions, could you
> > > please send your vote?
> > >
> > > Thank you very much
> > >
> > > Best
> > >
> > > Etienne
> > >
> > > On 06/07/2023 at 13:02, Etienne Chauchot wrote:
> > >>
> > >> Hi all,
> > >>
> > >> Thanks for your feedback on FLIP-322: Cooldown period for adaptive
> > >> scheduler [1].
> > >>
> > >> This FLIP was discussed in [2].
> > >>
> > >> I'd like to start a vote for it. The vote will be open for at least 72
> > >> hours (until July 9th 15:00 GMT) unless there is an objection or
> > >> insufficient votes.
> > >>
> > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-322+Cooldown+period+for+adaptive+scheduler
> > >> [2] https://lists.apache.org/thread/qvgxzhbp9rhlsqrybxdy51h05zwxfns6
> > >>
> > >> Best,
> > >>
> > >> Etienne
>
