JIN SUN created FLINK-10644:
-------------------------------

             Summary: Batch Job: Speculative execution
                 Key: FLINK-10644
                 URL: https://issues.apache.org/jira/browse/FLINK-10644
             Project: Flink
          Issue Type: New Feature
          Components: JobManager
            Reporter: JIN SUN
            Assignee: JIN SUN
             Fix For: 1.8.0


Stragglers (outliers) are tasks that run slower than most of the other tasks 
in a batch job. They increase job latency, because a straggler usually sits on 
the critical path of the job and becomes the bottleneck. 

Tasks may be slow for various reasons, including hardware degradation, 
software misconfiguration, or noisy neighbors. It is hard for the JM to predict 
the runtime of a task. 

To reduce the impact of stragglers, other systems such as Hadoop/Tez and Spark 
have *_speculative execution_*. Speculative execution is a health-check 
procedure that looks for tasks to speculate, i.e. tasks in an 
ExecutionJobVertex (EJV) that run slower than the median of all successfully 
completed tasks in that EJV. Such slow tasks are re-submitted to another TM. 
The slow task is not stopped; a new copy runs in parallel with it. Once one of 
the copies completes, the others are killed. 
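
The check described above could be sketched roughly as follows. This is only 
an illustration in Python, not Flink code: the function names, the 
slowdown_factor and min_finished parameters, and the use of plain dictionaries 
are all assumptions made for the example. The issue itself only specifies the 
comparison against the median of completed tasks. 

```python
from statistics import median


def find_speculation_candidates(finished_durations, running_elapsed,
                                slowdown_factor=1.5, min_finished=1):
    """Return ids of running tasks that look like stragglers.

    finished_durations: durations of successfully completed tasks in one
        job vertex (hypothetical input, in seconds).
    running_elapsed: mapping of task id -> elapsed time of still-running
        tasks in the same vertex.
    slowdown_factor, min_finished: assumed tuning knobs, not taken from
        the issue text.
    """
    # Without completed tasks there is no median to compare against.
    if not finished_durations or len(finished_durations) < min_finished:
        return []
    threshold = median(finished_durations) * slowdown_factor
    return sorted(task_id for task_id, elapsed in running_elapsed.items()
                  if elapsed > threshold)


def attempts_to_cancel(attempt_ids, finished_attempt):
    """Once one attempt of a task finishes, all other attempts are killed."""
    return [a for a in attempt_ids if a != finished_attempt]


if __name__ == "__main__":
    finished = [10.0, 12.0, 11.0, 13.0]           # median is 11.5
    running = {"t5": 30.0, "t6": 9.0, "t7": 17.25}
    print(find_speculation_candidates(finished, running))            # ['t5']
    print(attempts_to_cancel(["t5-a0", "t5-a1"], "t5-a1"))           # ['t5-a0']
```

A slowdown factor above 1 keeps tasks that are only slightly slower than the 
median from being duplicated. A real scheduler would probably also cap the 
number of concurrent speculative attempts. 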

This JIRA is an umbrella issue for applying this idea in Flink. Details will 
be appended later.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
