Let's make sure that this is on the list of patches we merge from the blink branch...
On Fri, Jan 25, 2019, 07:56 Guowei Ma <guowei....@gmail.com> wrote:

> Thanks to zhijiang for the detailed explanation. I would add some
> supplements: Blink has indeed solved this particular problem. The failure
> can be identified in Blink, and the upstream will be restarted by Blink.
>
> Thanks
>
> zhijiang <wangzhijiang...@aliyun.com.invalid> wrote on Fri, Jan 25, 2019
> at 12:04 PM:
>
> > Hi Bo,
> >
> > The problems you mentioned can be summarized into two issues:
> >
> > 1. The failover strategy should consider whether the upstream produced
> > partition is still available when the downstream fails. If the produced
> > partition is available, then only the downstream region needs to be
> > restarted; otherwise the upstream region should also be restarted to
> > re-produce the partition data.
> >
> > 2. The lifecycle of the partition: currently, once the partition data
> > has been completely transferred via the network, the partition and view
> > are released on the producer side, no matter whether the data has
> > actually been processed by the consumer or not. The TaskManager may
> > even be released earlier, while the partition data has not yet been
> > transferred.
> >
> > Both issues are already considered in my proposed pluggable shuffle
> > manager architecture, which introduces a ShuffleMaster component to
> > manage partitions globally on the JobManager side; it is then natural
> > to solve the above problems on top of this architecture. You can refer
> > to the FLIP [1] if interested.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-31%3A+Pluggable+Shuffle+Manager
> >
> > Best,
> > Zhijiang
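A minimal sketch of the failover check zhijiang describes in issue 1, under
the assumption of a hypothetical Region abstraction; the names (Region,
producedPartitionsAvailable, regionsToRestart) are illustrative only, not
Flink's actual API:

    import java.util.HashSet;
    import java.util.Set;

    public class RegionFailoverSketch {

        interface Region {
            Set<Region> upstreamRegions();

            // true if the BLOCKING partitions this region produced can
            // still be read by a restarted consumer
            boolean producedPartitionsAvailable();
        }

        // Collect the set of regions to restart for a failure in
        // failedRegion.
        static Set<Region> regionsToRestart(Region failedRegion) {
            Set<Region> toRestart = new HashSet<>();
            collect(failedRegion, toRestart);
            return toRestart;
        }

        private static void collect(Region region, Set<Region> toRestart) {
            if (!toRestart.add(region)) {
                return; // already scheduled for restart
            }
            for (Region upstream : region.upstreamRegions()) {
                // If the upstream's partition data is gone (e.g. released
                // after being consumed once), it must be re-produced, so
                // its region is restarted too; the check recurses.
                if (!upstream.producedPartitionsAvailable()) {
                    collect(upstream, toRestart);
                }
            }
        }
    }

The point of the sketch is that the restart set widens transitively: any
upstream region whose partition data is no longer available must be
re-executed before the failed region can be rescheduled.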
> > ------------------------------------------------------------------
> > From: Stephan Ewen <se...@apache.org>
> > Send Time: Thu, Jan 24, 2019, 22:17
> > To: dev <dev@flink.apache.org>; Kurt Young <k...@apache.org>
> > Subject: Re: [DISCUSS] Shall we make SpillableSubpartition repeatedly
> > readable to support fine grained recovery
> >
> > The SpillableSubpartition can also be used during the execution of
> > bounded DataStream programs. I think this is largely independent of
> > deprecating the DataSet API.
> >
> > I am wondering if this particular issue is one that has already been
> > addressed in the Blink code (we are looking to merge much of that
> > functionality), because the proposed extension is actually necessary
> > for proper batch fault tolerance (independent of the DataSet or Query
> > Processor stack).
> >
> > I am adding Kurt to this thread; maybe he can help us find that out.
> >
> > On Thu, Jan 24, 2019 at 2:36 PM Piotr Nowojski <pi...@da-platform.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I'm not sure how much effort we will be willing to invest in the
> > > existing batch stack. We are currently focusing on the support of
> > > bounded DataStreams (already done in Blink and to be merged into
> > > Flink soon) and on unifying batch & stream under the DataStream API.
> > >
> > > Piotrek
> > >
> > > On 23 Jan 2019, at 04:45, Bo WANG <wbeaglewatc...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > When running the batch WordCount example, I configured the job
> > > > execution mode as BATCH_FORCED and the failover-strategy as region,
> > > > and I manually injected some errors to let the execution fail in
> > > > different phases. In some cases the job recovered from the failover
> > > > and succeeded, but in other cases the job retried several times and
> > > > failed.
> > > >
> > > > Example:
> > > > - If the failure occurred before the task read data, e.g., before
> > > > invokable.invoke() in Task.java, failover could succeed.
> > > > - If the failure occurred after the task had read data, failover
> > > > did not work.
> > > >
> > > > Problem diagnosis:
> > > > Running the example described above, each ExecutionVertex is
> > > > defined as a restart region, and the ResultPartitionType between
> > > > executions is BLOCKING. Thus, SpillableSubpartition and
> > > > SpillableSubpartitionView are used to write/read the shuffle data,
> > > > and data blocks are described as BufferConsumers stored in a list
> > > > called buffers. When a task requests input data from the
> > > > SpillableSubpartitionView, BufferConsumers are REMOVED from
> > > > buffers. Thus, when a failure occurred after data had been read,
> > > > some BufferConsumers had already been released; although the tasks
> > > > retried, the input data was incomplete.
> > > >
> > > > Fix Proposal:
> > > > - A BufferConsumer should not be removed from buffers until the
> > > > consuming ExecutionVertex is terminal.
> > > > - The SpillableSubpartition should not be released until the
> > > > consuming ExecutionVertex is terminal.
> > > > - The SpillableSubpartition could create multiple
> > > > SpillableSubpartitionViews, each corresponding to one
> > > > ExecutionAttempt.
> > > >
> > > > Best,
> > > > Bo
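For illustration, a toy model of the destructive read Bo diagnoses and of
the multi-view fix he proposes; the class and method names are invented for
this sketch, and Flink's real BufferConsumer and view handling is
considerably more involved:

    import java.util.ArrayList;
    import java.util.List;

    // Toy stand-in for a SpillableSubpartition; strings stand in for
    // BufferConsumers.
    public class SubpartitionSketch {

        private final List<String> buffers = new ArrayList<>();

        void add(String buffer) {
            buffers.add(buffer);
        }

        // Simplified current behavior: consuming a buffer removes it, so
        // a retried downstream task finds the partition incomplete.
        String pollDestructive() {
            return buffers.isEmpty() ? null : buffers.remove(0);
        }

        // Proposed behavior: one view per ExecutionAttempt, each with its
        // own read position; buffers stay in the list until the consuming
        // vertex is terminal and the partition can be released.
        View createView() {
            return new View();
        }

        class View {
            private int readIndex; // position private to this attempt

            String getNext() {
                return readIndex < buffers.size()
                        ? buffers.get(readIndex++)
                        : null;
            }
        }

        public static void main(String[] args) {
            SubpartitionSketch partition = new SubpartitionSketch();
            partition.add("block-1");
            partition.add("block-2");

            View first = partition.createView(); // first ExecutionAttempt
            first.getNext();                     // reads "block-1", data retained

            View retry = partition.createView(); // attempt after a failover
            System.out.println(retry.getNext()); // "block-1": still re-readable
        }
    }

The essential change is the one the proposal describes: read positions move
out of the shared buffer list into per-attempt views, so releasing the data
becomes a decision tied to the consumer's terminal state rather than to the
act of reading.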