RE: Does DataSet job also use Barriers to ensure "exactly once."?

马国维 Thu, 09 Jul 2015 07:14:03 -0700
DataSet<String> result = in.rebalance()
                           .map(new Mapper());

In this case, does the 'map' receive all the data before it begins to work?
If not, will a failure of the rebalance operator cause duplicate records?
> Date: Thu, 9 Jul 2015 15:40:18 +0200
> Subject: Re: Does DataSet job also use Barriers to ensure "exactly once."?
> From: se...@apache.org
> To: dev@flink.apache.org
> 
> Currently, Flink restarts the entire job upon failure.
> 
> There is WIP that restricts this to all tasks involved in the pipeline of
> the failed task.
> 
> Let's say we have pipelined MapReduce. If a mapper fails, the reducers that
> have received some data already have to be restarted as well.
> 
> In that case, pipelined exchange works like "speculatively" starting the
> reducers early. It helps when no failure occurs.
> When a failure occurs, the reducers still start no later than in a
> batch exchange mode, where they are started only once the mappers are done
> (and no failure can occur any more).
> 
> 
> On Thu, Jul 9, 2015 at 3:34 PM, 马国维 <maguo...@outlook.com> wrote:
> 
> > The DataExchangeMode is pipelined.
> > If two operators exchange data in pipelined mode, a failed partition has
> > already sent some data to the receiver before it failed. So does
> > replaying all the failed partitions cause duplicate records?
> >
> >
> > > Date: Thu, 9 Jul 2015 14:47:29 +0200
> > > Subject: Re: Does DataSet job also use Barriers to ensure "exactly once."?
> > > From: ktzou...@apache.org
> > > To: dev@flink.apache.org
> > >
> > > No, it doesn't; periodic snapshots are not needed in DataSet programs, as
> > > DataSets are of finite size and failed partitions can be replayed
> > > completely.
> > >
> > >
> > > On Thu, Jul 9, 2015 at 2:43 PM, 马国维 <maguo...@outlook.com> wrote:
> > >
> > > > Hi everyone,
> > > > The docs say Flink Streaming uses "Barriers" to ensure
> > > > "exactly once." Does a DataSet job use the same mechanism to ensure
> > > > "exactly once" if a map task fails? Thanks
> > > >
> >
> >
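
The restart behaviour discussed in this thread can be illustrated with a small toy model. This is only a sketch in plain Java, not Flink code: the names PipelinedRestartSketch, Receiver, produce and run are hypothetical and exist only for this example. It assumes a producer that streams records to a receiver and fails once partway through. If the receiver is restarted together with the failed producer, its partial input is thrown away and the replay yields each record exactly once; if the receiver were kept alive, the replay would deliver the early records twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PipelinedRestartSketch {

    /** Hypothetical downstream task: remembers every record it has received. */
    static class Receiver {
        final List<String> received = new ArrayList<>();

        void accept(String record) {
            received.add(record);
        }
    }

    /**
     * Hypothetical upstream task: streams records one at a time (pipelined),
     * and throws just before emitting the record at index failAt.
     * A negative failAt means "never fail".
     */
    static void produce(List<String> input, Receiver receiver, int failAt) {
        for (int i = 0; i < input.size(); i++) {
            if (i == failAt) {
                throw new IllegalStateException("simulated task failure");
            }
            receiver.accept(input.get(i).toUpperCase());
        }
    }

    /**
     * Runs the producer, replaying it from the start after one simulated failure.
     * If restartReceiver is true, the receiver is replaced on failure, so its
     * partial input is discarded (the behaviour described in the thread).
     */
    static List<String> run(List<String> input, int failAt, boolean restartReceiver) {
        Receiver receiver = new Receiver();
        boolean firstAttempt = true;
        while (true) {
            try {
                produce(input, receiver, firstAttempt ? failAt : -1);
                return receiver.received;
            } catch (IllegalStateException e) {
                firstAttempt = false;
                if (restartReceiver) {
                    receiver = new Receiver(); // drop partially received data
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d");
        // Receiver restarted with the producer: no duplicates.
        System.out.println(run(input, 2, true));   // prints [A, B, C, D]
        // Receiver kept alive across the replay: early records arrive twice.
        System.out.println(run(input, 2, false));  // prints [A, B, A, B, C, D]
    }
}
```

The second call is included only to show why restarting just the failed producer would not be enough under pipelined exchange.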