Yes, this is a really critical problem for large batch jobs, because unexpected failures are a common case. We are already working on implementing the ideas described in FLIP-1 and hope to contribute them to Flink in the coming months.

Best,
Zhijiang

------------------------------------------------------------------
From: Si-li Liu <unix...@gmail.com>
Sent: Friday, February 17, 2017 11:22
To: user <user@flink.apache.org>
Subject: Re: Flink batch processing fault tolerance

Hi,

This is the reason why I gave up using Flink for my current project and went back to the traditional Hadoop framework.

--
Best regards
Sili Liu

2017-02-17 10:56 GMT+08:00 Renjie Liu <liurenjie2...@gmail.com>:

https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures

This FLIP may help.

--
Liu, Renjie
Software Engineer, MVAD

On Thu, Feb 16, 2017 at 7:34 PM Anton Solovev <anton_solo...@epam.com> wrote:

Hi Aljoscha,

Could you share your plans for resolving it?

Best,
Anton

From: Aljoscha Krettek [mailto:aljos...@apache.org]
Sent: Thursday, February 16, 2017 2:48 PM
To: user@flink.apache.org
Subject: Re: Flink batch processing fault tolerance

Hi,

yes, this is indeed true. We had some plans for how to resolve this, but they never materialised because of the focus on stream processing. We might unite the two in the future, and then you will get fault-tolerant batch/stream processing in the same API.

Best,
Aljoscha

On Wed, 15 Feb 2017 at 09:28 Renjie Liu <liurenjie2...@gmail.com> wrote:

Hi, all:

I'm reading Flink's documentation and am curious about the fault tolerance of batch processing jobs. It seems that when one task execution fails, the whole job is restarted. Is that true? If so, isn't it impractical to deploy large Flink batch jobs?

--
Liu, Renjie
Software Engineer, MVAD