Let's make sure that this is on the list of patches we merge from the blink branch...
On Fri, Jan 25, 2019, 07:56 Guowei Ma <guowei....@gmail.com> wrote:

> Thanks to zhijiang for the detailed explanation. I would add some
> supplements: Blink has indeed solved this particular problem. The failure
> can be identified in Blink, and the upstream will be restarted by Blink.
>
> Thanks
>
> zhijiang <wangzhijiang...@aliyun.com.invalid> wrote on Fri, Jan 25, 2019
> at 12:04 PM:
>
> > Hi Bo,
> >
> > The problems you mentioned can be summarized into two issues:
> >
> > 1. The failover strategy should consider whether the upstream produced
> > partition is still available when the downstream fails. If the produced
> > partition is available, then only the downstream region needs to be
> > restarted; otherwise the upstream region should also be restarted to
> > re-produce the partition data.
> >
> > 2. The lifecycle of the partition: currently, once the partition data
> > has been completely transferred via the network, the partition and view
> > are released on the producer side, no matter whether the data has
> > actually been processed by the consumer or not. The TaskManager may
> > even be released earlier, while the partition data has not yet been
> > transferred.
> >
> > Both issues are already considered in my proposed pluggable shuffle
> > manager architecture, which introduces a ShuffleMaster component to
> > manage partitions globally on the JobManager side; it is then natural
> > to solve the above problems on top of this architecture. You can refer
> > to the FLIP [1] if interested.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-31%3A+Pluggable+Shuffle+Manager
> >
> > Best,
> > Zhijiang
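A minimal sketch of the failover check zhijiang describes in issue 1, under
the assumption of a hypothetical Region abstraction; the names (Region,
producedPartitionsAvailable, regionsToRestart) are illustrative only, not
Flink's actual API:

    import java.util.HashSet;
    import java.util.Set;

    public class RegionFailoverSketch {

        interface Region {
            Set<Region> upstreamRegions();

            // true if the BLOCKING partitions this region produced can
            // still be read by a restarted consumer
            boolean producedPartitionsAvailable();
        }

        // Collect the set of regions to restart for a failure in
        // failedRegion.
        static Set<Region> regionsToRestart(Region failedRegion) {
            Set<Region> toRestart = new HashSet<>();
            collect(failedRegion, toRestart);
            return toRestart;
        }

        private static void collect(Region region, Set<Region> toRestart) {
            if (!toRestart.add(region)) {
                return; // already scheduled for restart
            }
            for (Region upstream : region.upstreamRegions()) {
                // If the upstream's partition data is gone (e.g. released
                // after being consumed once), it must be re-produced, so
                // its region is restarted too; the check recurses.
                if (!upstream.producedPartitionsAvailable()) {
                    collect(upstream, toRestart);
                }
            }
        }
    }

The point of the sketch is that the restart set widens transitively: any
upstream region whose partition data is no longer available must be
re-executed before the failed region can be rescheduled.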
> > ------------------------------------------------------------------
> > From: Stephan Ewen <se...@apache.org>
> > Send Time: Thu, Jan 24, 2019, 22:17
> > To: dev <dev@flink.apache.org>; Kurt Young <k...@apache.org>
> > Subject: Re: [DISCUSS] Shall we make SpillableSubpartition repeatedly
> > readable to support fine grained recovery
> >
> > The SpillableSubpartition can also be used during the execution of
> > bounded DataStream programs. I think this is largely independent of
> > deprecating the DataSet API.
> >
> > I am wondering if this particular issue is one that has already been
> > addressed in the Blink code (we are looking to merge much of that
> > functionality), because the proposed extension is actually necessary
> > for proper batch fault tolerance (independent of the DataSet or Query
> > Processor stack).
> >
> > I am adding Kurt to this thread; maybe he can help us find that out.
> >
> > On Thu, Jan 24, 2019 at 2:36 PM Piotr Nowojski <pi...@da-platform.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I'm not sure how much effort we will be willing to invest in the
> > > existing batch stack. We are currently focusing on the support of
> > > bounded DataStreams (already done in Blink and to be merged into
> > > Flink soon) and on unifying batch & stream under the DataStream API.
> > >
> > > Piotrek
> > >
> > > On 23 Jan 2019, at 04:45, Bo WANG <wbeaglewatc...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > When running the batch WordCount example, I configured the job
> > > > execution mode as BATCH_FORCED and the failover-strategy as region,
> > > > and I manually injected some errors to let the execution fail in
> > > > different phases. In some cases the job recovered from the failover
> > > > and succeeded, but in other cases the job retried several times and
> > > > failed.
> > > >
> > > > Example:
> > > > - If the failure occurred before the task read data, e.g., before
> > > > invokable.invoke() in Task.java, failover could succeed.
> > > > - If the failure occurred after the task had read data, failover
> > > > did not work.
> > > >
> > > > Problem diagnosis:
> > > > Running the example described above, each ExecutionVertex is
> > > > defined as a restart region, and the ResultPartitionType between
> > > > executions is BLOCKING. Thus, SpillableSubpartition and
> > > > SpillableSubpartitionView are used to write/read the shuffle data,
> > > > and data blocks are described as BufferConsumers stored in a list
> > > > called buffers. When a task requests input data from the
> > > > SpillableSubpartitionView, BufferConsumers are REMOVED from
> > > > buffers. Thus, when a failure occurred after data had been read,
> > > > some BufferConsumers had already been released; although the tasks
> > > > retried, the input data was incomplete.
> > > >
> > > > Fix Proposal:
> > > > - A BufferConsumer should not be removed from buffers until the
> > > > consuming ExecutionVertex is terminal.
> > > > - The SpillableSubpartition should not be released until the
> > > > consuming ExecutionVertex is terminal.
> > > > - The SpillableSubpartition could create multiple
> > > > SpillableSubpartitionViews, each corresponding to one
> > > > ExecutionAttempt.
> > > >
> > > > Best,
> > > > Bo
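For illustration, a toy model of the destructive read Bo diagnoses and of
the multi-view fix he proposes; the class and method names are invented for
this sketch, and Flink's real BufferConsumer and view handling is
considerably more involved:

    import java.util.ArrayList;
    import java.util.List;

    // Toy stand-in for a SpillableSubpartition; strings stand in for
    // BufferConsumers.
    public class SubpartitionSketch {

        private final List<String> buffers = new ArrayList<>();

        void add(String buffer) {
            buffers.add(buffer);
        }

        // Simplified current behavior: consuming a buffer removes it, so
        // a retried downstream task finds the partition incomplete.
        String pollDestructive() {
            return buffers.isEmpty() ? null : buffers.remove(0);
        }

        // Proposed behavior: one view per ExecutionAttempt, each with its
        // own read position; buffers stay in the list until the consuming
        // vertex is terminal and the partition can be released.
        View createView() {
            return new View();
        }

        class View {
            private int readIndex; // position private to this attempt

            String getNext() {
                return readIndex < buffers.size()
                        ? buffers.get(readIndex++)
                        : null;
            }
        }

        public static void main(String[] args) {
            SubpartitionSketch partition = new SubpartitionSketch();
            partition.add("block-1");
            partition.add("block-2");

            View first = partition.createView(); // first ExecutionAttempt
            first.getNext();                     // reads "block-1", data retained

            View retry = partition.createView(); // attempt after a failover
            System.out.println(retry.getNext()); // "block-1": still re-readable
        }
    }

The essential change is the one the proposal describes: read positions move
out of the shared buffer list into per-attempt views, so releasing the data
becomes a decision tied to the consumer's terminal state rather than to the
act of reading.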