The SpillableSubpartition can also be used during the execution of bounded DataStreams programs. I think this is largely independent from deprecating the DataSet API.
I am wondering if this particular issue is one that has been addressed in the Blink code already (we are looking to merge much of that functionality) - because the proposed extension is actually necessary for proper batch fault tolerance (independent of the DataSet or Query Processor stack). I am adding Kurt to this thread - maybe he help us find that out. On Thu, Jan 24, 2019 at 2:36 PM Piotr Nowojski <pi...@da-platform.com> wrote: > Hi, > > I’m not sure how much effort we will be willing to invest in the existing > batch stack. We are currently focusing on the support of bounded > DataStreams (already done in Blink and will be merged to Flink soon) and > unifing batch & stream under DataStream API. > > Piotrek > > > On 23 Jan 2019, at 04:45, Bo WANG <wbeaglewatc...@gmail.com> wrote: > > > > Hi all, > > > > When running the batch WordCount example, I configured the job execution > > mode > > as BATCH_FORCED, and failover-strategy as region, I manually injected > some > > errors to let the execution fail in different phases. In some cases, the > > job could > > recovery from failover and became succeed, but in some cases, the job > > retried > > several times and failed. > > > > Example: > > - If the failure occurred before task read data, e.g., failed before > > invokable.invoke() in Task.java, failover could succeed. > > - If the failure occurred after task having read data, failover did not > > work. > > > > Problem diagnose: > > Running the example described before, each ExecutionVertex is defined as > > a restart region, and the ResultPartitionType between executions is > > BLOCKING. > > Thus, SpillableSubpartition and SpillableSubpartitionView are used to > > write/read > > shuffle data, and data blocks are described as BufferConsumers stored in > a > > list > > called buffers, when task requires input data from > > SpillableSubpartitionView, > > BufferConsumers are REMOVED from buffers. Thus, when failures occurred > > after having read data, some BufferConsumers have already released. > > Although tasks retried, the input data is incomplete. > > > > Fix Proposal: > > - BufferConsumer should not be removed from buffers until the consumed > > ExecutionVertex is terminal. > > - SpillableSubpartition should not be released until the consumed > > ExecutionVertex is terminal. > > - SpillableSubpartition could creates multi SpillableSubpartitionViews, > > each of which is corresponding to a ExecutionAttempt. > > > > Best, > > Bo > >