Re: Re: [DISCUSS] FLIP-168: Speculative execution for Batch Job

2022-05-26 Thread Zhu Zhu
Hi everyone, Thank you for all the feedback on this FLIP! I will open a vote for it since there are no further concerns. Thanks, Zhu Zhu On Wed, May 11, 2022 at 12:29, Zhu Zhu wrote: > > Hi everyone, > > According to the discussion and updates of the blocklist > mechanism[1] (FLIP-224), I have updated FLIP-168 to make

Re: Re: [DISCUSS] FLIP-168: Speculative execution for Batch Job

2022-05-10 Thread Zhu Zhu
Hi everyone, Following the discussion and updates of the blocklist mechanism[1] (FLIP-224), I have updated FLIP-168 so that it decides on its own to block identified slow nodes. A new configuration option is also added to control how long a slow node should be blocked. [1] https://lists.apache.org/threa

Re: Re: [DISCUSS] FLIP-168: Speculative execution for Batch Job

2022-04-28 Thread Zhu Zhu
Thank you for all the feedback! @Guowei Ma Here are my thoughts on your questions: >> 1. How to judge whether the Execution Vertex belongs to a slow task. If a slow task fails and gets restarted, it may not be a slow task anymore. Especially given that the nodes of the slow task may have been black

Re: Re: [DISCUSS] FLIP-168: Speculative execution for Batch Job

2022-04-28 Thread Guowei Ma
Hi Zhu, Many thanks to Zhu Zhu for initiating the FLIP discussion. Overall I think it's OK; I just have 3 small questions. 1. How to judge whether the Execution Vertex belongs to a slow task. The current calculation method is: the current timestamp minus the timestamp of the execution deployment. I

Re: Re: [DISCUSS] FLIP-168: Speculative execution for Batch Job

2022-04-28 Thread Jiangang Liu
+1 for the feature. On Thu, Apr 28, 2022 at 11:36, Mang Zhang wrote: > Hi zhu: > > > This sounds like a great job! Thanks for your great job. > In our company, there are already some jobs using Flink Batch, > but everyone knows that the offline cluster has a lot more load than > the online cluster,

Re: [DISCUSS] FLIP-168: Speculative execution for Batch Job

2022-04-26 Thread Zhu Zhu
Hi everyone, More and more users are running their batch jobs on Flink nowadays. One major problem they encounter is slow tasks running on hot/bad nodes, resulting in very long and uncontrollable execution time of batch jobs. This problem is a pain or even unacceptable in production. Many users ha

Re: [DISCUSS] FLIP-168: Speculative execution for Batch Job

2021-12-12 Thread 刘建刚
Any progress on the feature? We have the same requirement in our company. Since the software and hardware environment can be complex, it is normal to see a slow task that determines the execution time of the Flink job. Wrote on Sun, Jun 20, 2021 at 22:35: > Hi everyone, > > I would like to kick off a discussion on s