The SpillableSubpartition can also be used during the execution of bounded
DataStreams programs. I think this is largely independent from deprecating
the DataSet API.

I am wondering if this particular issue is one that has been addressed in
the Blink code already (we are looking to merge much of that functionality)
- because the proposed extension is actually necessary for proper batch
fault tolerance (independent of the DataSet or Query Processor stack).

I am adding Kurt to this thread - maybe he help us find that out.

On Thu, Jan 24, 2019 at 2:36 PM Piotr Nowojski <pi...@da-platform.com>
wrote:

> Hi,
>
> I’m not sure how much effort we will be willing to invest in the existing
> batch stack. We are currently focusing on the support of bounded
> DataStreams (already done in Blink and will be merged to Flink soon) and
> unifing batch & stream under DataStream API.
>
> Piotrek
>
> > On 23 Jan 2019, at 04:45, Bo WANG <wbeaglewatc...@gmail.com> wrote:
> >
> > Hi all,
> >
> > When running the batch WordCount example,  I configured the job execution
> > mode
> > as BATCH_FORCED, and failover-strategy as region, I manually injected
> some
> > errors to let the execution fail in different phases. In some cases, the
> > job could
> > recovery from failover and became succeed, but in some cases, the job
> > retried
> > several times and failed.
> >
> > Example:
> > - If the failure occurred before task read data, e.g., failed before
> > invokable.invoke() in Task.java, failover could succeed.
> > - If the failure occurred after task having read data, failover did not
> > work.
> >
> > Problem diagnose:
> > Running the example described before, each ExecutionVertex is defined as
> > a restart region, and the ResultPartitionType between executions is
> > BLOCKING.
> > Thus, SpillableSubpartition and SpillableSubpartitionView are used to
> > write/read
> > shuffle data, and data blocks are described as BufferConsumers stored in
> a
> > list
> > called buffers, when task requires input data from
> > SpillableSubpartitionView,
> > BufferConsumers are REMOVED from buffers. Thus, when failures occurred
> > after having read data, some BufferConsumers have already released.
> > Although tasks retried, the input data is incomplete.
> >
> > Fix Proposal:
> > - BufferConsumer should not be removed from buffers until the consumed
> > ExecutionVertex is terminal.
> > - SpillableSubpartition should not be released until the consumed
> > ExecutionVertex is terminal.
> > - SpillableSubpartition could creates multi SpillableSubpartitionViews,
> > each of which is corresponding to a ExecutionAttempt.
> >
> > Best,
> > Bo
>
>

Reply via email to