Thanks for starting this discussion Ryan. I'm looking forward to your design document about this feature. Quick question: Will it be a batch only feature? If no, then it needs to take checkpointing into account as well.
Cheers, Till On Tue, Nov 6, 2018 at 4:29 AM zhijiang <wangzhijiang...@aliyun.com.invalid> wrote: > Thanks yangyu for launching this discussion. > > I really like this proposal. We ever found this scene frequently that some > long tail tasks to delay the total batch job execution time in production. > We also have some thoughts for bringing this mechanism. Looking forward to > your detail design doc, then we can discussion further. > > Best, > Zhijiang > ------------------------------------------------------------------ > 发件人:Tao Yangyu <ryantao...@gmail.com> > 发送时间:2018年11月6日(星期二) 11:01 > 收件人:dev <dev@flink.apache.org> > 主 题:[DISCUSS] Task speculative execution for Flink batch > > Hi everyone, > > We propose task speculative execution for Flink batch in this message as > follows. > > In the batch mode, the job is usually divided into multiple parallel tasks > executed cross many nodes in the cluster. It is common to encounter the > performance degradation on some nodes due to hardware problems or accident > I/O busy and high CPU load. This kind of degradation can probably cause the > running tasks on the node to be quite slow that is so called long tail > tasks. Although the long tail tasks will not fail, they can severely affect > the total job running time. Flink task scheduler does not take this long > tail problem into account currently. > > > > Here we propose the speculative execution strategy to handle the problem. > The basic idea is to run a copy of task on another node when the original > task is identified to be long tail. In more details, the speculative task > will be triggered when the scheduler detects that the data processing > throughput of a task is much slower than others. The speculative task is > executed in parallel with the original one and share the same failure retry > mechanism. Once either task complete, the scheduler admits its output as > the final result and cancel the other running one. The preliminary > experiments has demonstrated the effectiveness. > > > The detailed design doc will be ready soon. Your reviews and comments will > be much appreciated. > > > Thanks! > > Ryan > >