Thanks Yangyu for the nice design doc! One thing to consider is the granularity of speculation. Multiple tasks may exchange data in pipelined mode. In that case, speculating on a single task may not be enough. You might be able to solve this by increasing the granularity of speculation, for example to a group of pipelined tasks. The traditional case of a single speculative task can then be considered a special case of this.

Xiaowei

On Sat, Nov 17, 2018 at 10:27 PM Tao Yangyu <ryantao...@gmail.com> wrote:

> Hi all,
>
> After refinement, the detailed design doc is here:
>
> https://docs.google.com/document/d/1X_Pfo4WcO-TEZmmVTTYNn44LQg5gnFeeaeqM7ZNLQ7M/edit?usp=sharing
>
> Your reviews and comments are much appreciated and will greatly help
> to complete the feature.
>
> Best,
> Ryan
>
>
> Tao Yangyu <ryantao...@gmail.com> wrote on Wed, Nov 7, 2018 at 4:49 PM:
>
> > Thanks so much for all your feedback!
> >
> > Yes, as Jin Sun mentioned above, the design currently targets batch in
> > order to explore the general framework and basic modules. The strategy
> > could also be applied to streaming with some additional code, for
> > example for result commitment.
> >
> > Jin Sun <isun...@gmail.com> wrote on Wed, Nov 7, 2018 at 8:38 AM:
> >
> >> I think this targets batch at the very beginning; the idea should
> >> also work for both cases, with different algorithms/strategies.
> >>
> >> Ryan, since you are working on this, I will assign FLINK-10644 <
> >> https://issues.apache.org/jira/browse/FLINK-10644> to you.
> >>
> >> Jin
> >>
> >> > On Nov 6, 2018, at 4:45 AM, Till Rohrmann <trohrm...@apache.org> wrote:
> >> >
> >> > Thanks for starting this discussion Ryan. I'm looking forward to your
> >> > design document about this feature. Quick question: Will it be a
> >> > batch-only feature? If not, then it needs to take checkpointing into
> >> > account as well.
> >> >
> >> > Cheers,
> >> > Till
> >> >
> >> > On Tue, Nov 6, 2018 at 4:29 AM zhijiang <wangzhijiang...@aliyun.com.invalid>
> >> > wrote:
> >> >
> >> >> Thanks yangyu for launching this discussion.
> >> >>
> >> >> I really like this proposal. In production we have frequently seen a
> >> >> few long-tail tasks delay the total execution time of a batch job.
> >> >> We also have some thoughts on bringing in this mechanism. Looking
> >> >> forward to your detailed design doc, then we can discuss further.
> >> >>
> >> >> Best,
> >> >> Zhijiang
> >> >> ------------------------------------------------------------------
> >> >> From: Tao Yangyu <ryantao...@gmail.com>
> >> >> Sent: Tuesday, Nov 6, 2018, 11:01
> >> >> To: dev <dev@flink.apache.org>
> >> >> Subject: [DISCUSS] Task speculative execution for Flink batch
> >> >>
> >> >> Hi everyone,
> >> >>
> >> >> We propose task speculative execution for Flink batch, as described
> >> >> below.
> >> >>
> >> >> In batch mode, a job is usually divided into multiple parallel tasks
> >> >> executed across many nodes in the cluster. It is common to encounter
> >> >> performance degradation on some nodes due to hardware problems or
> >> >> incidental I/O contention and high CPU load. This kind of degradation
> >> >> can make the tasks running on the node quite slow; these are the
> >> >> so-called long-tail tasks. Although long-tail tasks do not fail, they
> >> >> can severely affect the total job running time. The Flink task
> >> >> scheduler does not currently take this long-tail problem into account.
> >> >>
> >> >> Here we propose a speculative execution strategy to handle the problem.
> >> >> The basic idea is to run a copy of a task on another node when the
> >> >> original task is identified as a long-tail task. In more detail, the
> >> >> speculative task is triggered when the scheduler detects that the data
> >> >> processing throughput of a task is much lower than that of the others.
> >> >> The speculative task is executed in parallel with the original one and
> >> >> shares the same failure retry mechanism. Once either task completes,
> >> >> the scheduler accepts its output as the final result and cancels the
> >> >> other running one. Preliminary experiments have demonstrated the
> >> >> effectiveness of this approach.
> >> >>
> >> >> The detailed design doc will be ready soon. Your reviews and comments
> >> >> will be much appreciated.
> >> >>
> >> >> Thanks!
> >> >>
> >> >> Ryan
> >> >>
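
The trigger described in the original proposal (start a speculative copy when a task's data processing throughput is much lower than that of the others) could be sketched roughly as below. This is only an illustration and is not taken from the design doc or from Flink: the class `SpeculationSketch`, the method `findSlowTasks`, the use of the median as the baseline, and the `slowFraction` parameter are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names exist in Flink.
public class SpeculationSketch {

    /** Median of the given values; the input array is not modified. */
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    /**
     * Returns the indices of tasks whose throughput is below
     * slowFraction * (median throughput of all tasks).
     * Both the median baseline and slowFraction are assumptions.
     */
    static List<Integer> findSlowTasks(double[] throughput, double slowFraction) {
        List<Integer> slow = new ArrayList<>();
        if (throughput.length == 0) {
            return slow;
        }
        double threshold = slowFraction * median(throughput);
        for (int i = 0; i < throughput.length; i++) {
            if (throughput[i] < threshold) {
                slow.add(i); // candidate for a speculative copy on another node
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        // Records per second for five parallel tasks; task 3 is the straggler.
        double[] recordsPerSecond = {1000, 950, 1020, 120, 980};
        System.out.println(findSlowTasks(recordsPerSecond, 0.5)); // prints [3]
    }
}
```

A scheduler would presumably run such a check periodically and launch a copy only for the flagged tasks. Following Xiaowei's point about granularity, the same check could in principle be applied to a group of pipelined tasks rather than to a single task.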