+1, Thanks Yangyu for proposing this very useful feature. Looking forward
to the design doc.

On Wed, Nov 7, 2018 at 10:15 AM SHI Xiaogang <shixiaoga...@gmail.com> wrote:

> Hi,
>
> +1 for the speculative execution.
>
> It would be even better if it could work well with the existing checkpointing
> and pipelined execution. That way, we can move a step further towards the
> unification of batch and stream processing.
>
> Regards,
> Xiaogang
>
> Jeff Zhang <zjf...@gmail.com> wrote on Wed, Nov 7, 2018 at 9:40 AM:
>
> > +1 for speculative execution for Flink batch. Speculative execution is used
> > in lots of batch execution engines like MapReduce, Tez, and Spark. This would
> > be a great improvement for Flink in batch scenarios.
> >
> > Jin Sun <isun...@gmail.com> wrote on Wed, Nov 7, 2018 at 8:38 AM:
> >
> > > I think this targets batch at the very beginning, but the idea should also
> > > work for both cases, with different algorithms/strategies.
> > >
> > > Ryan, since you are working on this, I will assign FLINK-10644 <
> > > https://issues.apache.org/jira/browse/FLINK-10644> to you.
> > >
> > > Jin
> > >
> > > > On Nov 6, 2018, at 4:45 AM, Till Rohrmann <trohrm...@apache.org>
> > wrote:
> > > >
> > > > Thanks for starting this discussion, Ryan. I'm looking forward to your
> > > > design document about this feature. Quick question: will it be a batch-only
> > > > feature? If not, then it needs to take checkpointing into account as well.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Tue, Nov 6, 2018 at 4:29 AM zhijiang <wangzhijiang...@aliyun.com
> > > .invalid>
> > > > wrote:
> > > >
> > > >> Thanks Yangyu for launching this discussion.
> > > >>
> > > >> I really like this proposal. In production we frequently see the scenario
> > > >> where some long-tail tasks delay the total batch job execution time. We also
> > > >> have some thoughts on introducing this mechanism. Looking forward to your
> > > >> detailed design doc, so we can discuss it further.
> > > >>
> > > >> Best,
> > > >> Zhijiang
> > > >> ------------------------------------------------------------------
> > > >> From: Tao Yangyu <ryantao...@gmail.com>
> > > >> Sent: Tuesday, November 6, 2018, 11:01
> > > >> To: dev <dev@flink.apache.org>
> > > >> Subject: [DISCUSS] Task speculative execution for Flink batch
> > > >>
> > > >> Hi everyone,
> > > >>
> > > >> We propose task speculative execution for Flink batch in this message as
> > > >> follows.
> > > >>
> > > >> In batch mode, the job is usually divided into multiple parallel tasks
> > > >> executed across many nodes in the cluster. It is common to encounter
> > > >> performance degradation on some nodes due to hardware problems, unexpected
> > > >> I/O contention, or high CPU load. This kind of degradation can make the
> > > >> running tasks on such a node quite slow; these are the so-called long-tail
> > > >> tasks. Although the long-tail tasks will not fail, they can severely affect
> > > >> the total job running time. The Flink task scheduler currently does not take
> > > >> this long-tail problem into account.
> > > >>
> > > >>
> > > >>
> > > >> Here we propose a speculative execution strategy to handle this problem.
> > > >> The basic idea is to run a copy of a task on another node when the original
> > > >> task is identified as a long tail. In more detail, the speculative task is
> > > >> triggered when the scheduler detects that the data processing throughput of
> > > >> a task is much slower than that of the others. The speculative task is
> > > >> executed in parallel with the original one and shares the same failure retry
> > > >> mechanism. Once either task completes, the scheduler admits its output as
> > > >> the final result and cancels the other running one. Preliminary experiments
> > > >> have demonstrated the effectiveness of this approach.
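> > > >>
> > > >> To make the detection step concrete, below is a rough Java sketch of one
> > > >> possible long-tail heuristic: a subtask becomes a speculation candidate when
> > > >> its measured throughput falls far below the median of its sibling subtasks.
> > > >> All names and the threshold here (LongTailDetector, slowFactor) are
> > > >> hypothetical illustrations, not the actual design.
> > > >>
> > > >> import java.util.ArrayList;
> > > >> import java.util.Arrays;
> > > >> import java.util.List;
> > > >>
> > > >> /** Hypothetical sketch: flags subtasks whose throughput is far below the median. */
> > > >> class LongTailDetector {
> > > >>
> > > >>     private final double slowFactor;
> > > >>
> > > >>     LongTailDetector(double slowFactor) {
> > > >>         this.slowFactor = slowFactor;  // e.g. 0.5 = slower than half the median
> > > >>     }
> > > >>
> > > >>     /** Returns the indices of subtasks that look like long-tail candidates. */
> > > >>     List<Integer> findLongTails(double[] recordsPerSecond) {
> > > >>         double[] sorted = recordsPerSecond.clone();
> > > >>         Arrays.sort(sorted);
> > > >>         double median = sorted[sorted.length / 2];
> > > >>
> > > >>         List<Integer> candidates = new ArrayList<>();
> > > >>         for (int i = 0; i < recordsPerSecond.length; i++) {
> > > >>             if (recordsPerSecond[i] < slowFactor * median) {
> > > >>                 candidates.add(i);  // launch a speculative copy of this subtask
> > > >>             }
> > > >>         }
> > > >>         return candidates;
> > > >>     }
> > > >> }
> > > >>
> > > >> The first replica to finish, original or speculative, would then provide the
> > > >> final output and the other would be canceled, as described above.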
> > > >>
> > > >>
> > > >> The detailed design doc will be ready soon. Your reviews and comments
> > > >> will be much appreciated.
> > > >>
> > > >>
> > > >> Thanks!
> > > >>
> > > >> Ryan
> > > >>
> > > >>
> > >
> > >
> >
>
