Hi Bo,

The problems you mentioned can be summarized into two issues:

1. Failover strategy: the strategy should consider whether the partition
produced by the upstream is still available when the downstream fails. If
the produced partition is available, only the downstream region needs to be
restarted; otherwise the upstream region must also be restarted to
re-produce the partition data (see the sketch after this list).
2. The lifecycle of the partition: currently, once the partition data has
been completely transferred over the network, the partition and its view
are released on the producer side, regardless of whether the consumer has
actually processed the data. The TaskManager itself may even be released
before the partition data has been fully transferred.

Both issues are already considered in my proposed pluggable shuffle manager
architecture, which introduces a ShuffleMaster component to manage
partitions globally on the JobManager side, so it is natural to solve the
above problems based on this architecture. You can refer to the FLIP [1] if
interested; a rough interface sketch follows the link below.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-31%3A+Pluggable+Shuffle+Manager
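
A rough sketch of the ShuffleMaster idea (the method names here are only
illustrative of the FLIP-31 direction, not the final API):

import java.util.concurrent.CompletableFuture;

// Illustrative only: partition registration and release are tracked
// globally on the JobManager side, so the partition lifecycle is no longer
// tied to when the producer finishes transferring the data.
public interface ShuffleMasterSketch<D> {

    // Called when a producer is deployed; the returned descriptor is what
    // consumers use to locate and read the partition.
    CompletableFuture<D> registerPartition(String jobId, String partitionId);

    // Called only when the JobManager decides the partition is no longer
    // needed, e.g. all consumers have reached a terminal state.
    void releasePartition(D shuffleDescriptor);
}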

Best,
Zhijiang
------------------------------------------------------------------
From: Stephan Ewen <se...@apache.org>
Send Time: Thursday, January 24, 2019, 22:17
To: dev <dev@flink.apache.org>; Kurt Young <k...@apache.org>
Subject:Re: [DISCUSS] Shall we make SpillableSubpartition repeatedly readable 
to support fine grained recovery

The SpillableSubpartition can also be used during the execution of bounded
DataStream programs. I think this is largely independent of deprecating
the DataSet API.

I am wondering if this particular issue is one that has been addressed in
the Blink code already (we are looking to merge much of that functionality)
- because the proposed extension is actually necessary for proper batch
fault tolerance (independent of the DataSet or Query Processor stack).

I am adding Kurt to this thread - maybe he can help us find that out.

On Thu, Jan 24, 2019 at 2:36 PM Piotr Nowojski <pi...@da-platform.com>
wrote:

> Hi,
>
> I’m not sure how much effort we will be willing to invest in the existing
> batch stack. We are currently focusing on the support of bounded
> DataStreams (already done in Blink and will be merged into Flink soon) and
> unifying batch & stream under the DataStream API.
>
> Piotrek
>
> > On 23 Jan 2019, at 04:45, Bo WANG <wbeaglewatc...@gmail.com> wrote:
> >
> > Hi all,
> >
> > When running the batch WordCount example, I configured the job execution
> > mode as BATCH_FORCED and the failover-strategy as region, and I manually
> > injected some errors to let the execution fail in different phases. In
> > some cases the job could recover from the failure and succeed, but in
> > other cases the job retried several times and then failed.
> >
> > Example:
> > - If the failure occurred before the task read data, e.g., it failed
> > before invokable.invoke() in Task.java, failover could succeed.
> > - If the failure occurred after the task had read data, failover did not
> > work.
> >
> > Problem diagnosis:
> > Running the example described above, each ExecutionVertex is defined as
> > a restart region, and the ResultPartitionType between executions is
> > BLOCKING. Thus, SpillableSubpartition and SpillableSubpartitionView are
> > used to write/read shuffle data, and data blocks are described as
> > BufferConsumers stored in a list called buffers. When a task requests
> > input data from the SpillableSubpartitionView, the BufferConsumers are
> > REMOVED from buffers. Thus, when a failure occurs after data has been
> > read, some BufferConsumers have already been released; although the
> > tasks are retried, their input data is incomplete.
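> >
> > A tiny illustration of that remove-on-read pattern (hypothetical names,
> > not the actual SpillableSubpartitionView code):
> >
> > import java.util.ArrayDeque;
> >
> > // Illustrative only: serving a read removes the buffer, so a restarted
> > // attempt finds the partition incomplete.
> > public class RemoveOnReadSketch {
> >
> >     private final ArrayDeque<byte[]> buffers = new ArrayDeque<>();
> >
> >     public byte[] pollNextBuffer() {
> >         // Once polled, the buffer is gone and cannot be read again by a
> >         // retried execution attempt.
> >         return buffers.poll();
> >     }
> > }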
> >
> > Fix Proposal:
> > - BufferConsumers should not be removed from buffers until the consuming
> > ExecutionVertex is in a terminal state.
> > - The SpillableSubpartition should not be released until the consuming
> > ExecutionVertex is in a terminal state.
> > - The SpillableSubpartition could create multiple
> > SpillableSubpartitionViews, each corresponding to an ExecutionAttempt
> > (a rough sketch follows below).
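> >
> > A rough sketch of that proposal (class and method names here are
> > hypothetical and simplified, not the existing SpillableSubpartition API):
> >
> > import java.util.ArrayList;
> > import java.util.Iterator;
> > import java.util.List;
> >
> > // Rough sketch of the proposal: buffers survive reads and are only
> > // released once the consuming ExecutionVertex reaches a terminal state.
> > public class RereadableSubpartitionSketch {
> >
> >     private final List<byte[]> buffers = new ArrayList<>();
> >
> >     // Each ExecutionAttempt gets its own view; reading does NOT remove
> >     // buffers, so a retried attempt can re-read the full partition.
> >     public Iterator<byte[]> createViewForAttempt(String executionAttemptId) {
> >         return buffers.iterator();
> >     }
> >
> >     // Buffers are released only once the consuming ExecutionVertex is in
> >     // a terminal state.
> >     public void onConsumerTerminal() {
> >         buffers.clear();
> >     }
> > }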
> >
> > Best,
> > Bo
>
>
