DataSet<String> result = in.rebalance()
.map(new Mapper());In the case does the 'map'
receive all the data then begin to worker?Will rebalance operator failed cause
some duplicate record if the above answer is false ?
> Date: Thu, 9 Jul 2015 15:40:18 +0200
> Subject: Re: Does DataSet job also use Barriers to ensure "exactly once."?
> From: se...@apache.org
> To: dev@flink.apache.org
>
> Currently, Flink restarts the entire job upon failure.
>
> There is WIP that restricts this to all tasks involved in the pipeline of
> the failed task.
>
> Let's say we have pipelined MapReduce. If a mapper fails, the reducers that
> have received some data already have to be restarted as well.
>
> In that case, pipelined exchange works like "speculatively" starting the
> reducers early. It helps when no failure occurs.
> When a failure occurs, the reducers do still not start later than in a
> batch exchange mode, where they are started only once the mappers are done
> (and no failure can occur any more).
>
>
> On Thu, Jul 9, 2015 at 3:34 PM, 马国维 <maguo...@outlook.com> wrote:
>
> > DataExchangeMode is Piped
> > If Two operators use Piped Mode to exchange the data , Failed partitions
> > have already send some data to the receiver before it failed.So Does
> > Replaying all the failed partitions cause some duplicate records ?
> >
> >
> > > Date: Thu, 9 Jul 2015 14:47:29 +0200
> > > Subject: Re: Does DataSet job also use Barriers to ensure "exactly
> > once."?
> > > From: ktzou...@apache.org
> > > To: dev@flink.apache.org
> > >
> > > No, it doesn't; periodic snapshots are not needed in DataSet programs, as
> > > DataSets are of finite size and failed partitions can be replayed
> > > completely.
> > >
> > >
> > > On Thu, Jul 9, 2015 at 2:43 PM, 马国维 <maguo...@outlook.com> wrote:
> > >
> > > > hi, everyoneThe doc say Flink Streaming use "Barriers" to ensure
> > > > "exactly once."Does the DataSet job use the same mechanism to ensue
> > > > "exactly once" if a map task is failed?thanks
> > > >
> >
> >