In pipelined execution, the mapper starts as soon as it receives data. In batch-only execution, the mapper starts only once it has received all data. In either case, there won't be any duplicate records, because if an error occurs, the entire job is restarted and every task, including the consumers that already received some records, starts again from scratch.
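The no-duplicates argument can be illustrated with a toy model. The sketch below is plain Java, not Flink code. `RestartSimulation`, `runMapper`, `fullRestart` and `replayProducerOnly` are hypothetical names, and "multiply by 10" stands in for the user's map function. It contrasts restarting the whole job, where the consumer's partially received input is thrown away before the finite input is replayed, with a naive strategy that replays only the failed producer into a consumer that keeps its state.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, not Flink code) of a pipelined mapper -> consumer
 * exchange in which the mapper fails part-way through its finite input.
 */
class RestartSimulation {

    /**
     * Maps records 0..n-1 (x -> x * 10) into the sink. If failAfter >= 0, it
     * throws after emitting failAfter records; pass -1 for a run without failure.
     */
    static void runMapper(int n, int failAfter, List<Integer> sink) {
        for (int i = 0; i < n; i++) {
            if (i == failAfter) {
                throw new IllegalStateException("mapper failed after " + i + " records");
            }
            sink.add(i * 10); // stand-in for the user's map function
        }
    }

    /** Restart the whole job: the consumer is restarted too, so its partial input is discarded. */
    static List<Integer> fullRestart(int n, int failAfter) {
        List<Integer> consumer = new ArrayList<>();
        try {
            runMapper(n, failAfter, consumer);
        } catch (IllegalStateException e) {
            consumer = new ArrayList<>(); // fresh consumer state
            runMapper(n, -1, consumer);   // replay the finite input completely
        }
        return consumer;
    }

    /** Naive recovery: only the mapper is replayed; the consumer keeps what it already received. */
    static List<Integer> replayProducerOnly(int n, int failAfter) {
        List<Integer> consumer = new ArrayList<>();
        try {
            runMapper(n, failAfter, consumer);
        } catch (IllegalStateException e) {
            runMapper(n, -1, consumer); // records emitted before the failure arrive a second time
        }
        return consumer;
    }

    public static void main(String[] args) {
        System.out.println(fullRestart(5, 3));        // [0, 10, 20, 30, 40]
        System.out.println(replayProducerOnly(5, 3)); // [0, 10, 20, 0, 10, 20, 30, 40]
    }
}
```

Restarting the consumer along with the failed producer is what makes a complete replay safe. Replaying into a consumer that survived the failure would deliver the first records twice.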
As Stephan mentioned, we will soon have per-task fault tolerance for non-pipelined tasks. Eventually, we also want to support pipelined jobs with persisted intermediate results.

On Thu, Jul 9, 2015 at 4:13 PM, 马国维 <maguo...@outlook.com> wrote:

> DataSet<String> result = in.rebalance()
>     .map(new Mapper());
>
> In this case, does the 'map' receive all the data before it begins to work? If the answer is no, will a failure of the rebalance operator cause some duplicate records?
>
> > Date: Thu, 9 Jul 2015 15:40:18 +0200
> > Subject: Re: Does DataSet job also use Barriers to ensure "exactly once."?
> > From: se...@apache.org
> > To: dev@flink.apache.org
> >
> > Currently, Flink restarts the entire job upon failure.
> >
> > There is WIP that restricts this to all tasks involved in the pipeline of the failed task.
> >
> > Let's say we have pipelined MapReduce. If a mapper fails, the reducers that have already received some data have to be restarted as well.
> >
> > In that case, pipelined exchange works like "speculatively" starting the reducers early. It helps when no failure occurs. When a failure occurs, the reducers still do not start later than in a batch exchange mode, where they are started only once the mappers are done (and no failure can occur any more).
> >
> > On Thu, Jul 9, 2015 at 3:34 PM, 马国维 <maguo...@outlook.com> wrote:
> >
> > > DataExchangeMode is Piped.
> > > If two operators use Piped mode to exchange data, a failed partition may already have sent some data to the receiver before it failed. So does replaying all the failed partitions cause some duplicate records?
> > >
> > > > Date: Thu, 9 Jul 2015 14:47:29 +0200
> > > > Subject: Re: Does DataSet job also use Barriers to ensure "exactly once."?
> > > > From: ktzou...@apache.org
> > > > To: dev@flink.apache.org
> > > >
> > > > No, it doesn't; periodic snapshots are not needed in DataSet programs, as DataSets are of finite size and failed partitions can be replayed completely.
> > > >
> > > > On Thu, Jul 9, 2015 at 2:43 PM, 马国维 <maguo...@outlook.com> wrote:
> > > >
> > > > > Hi everyone, the docs say Flink Streaming uses "Barriers" to ensure "exactly once." Does a DataSet job use the same mechanism to ensure "exactly once" if a map task fails? Thanks
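Stephan's point that a pipelined exchange "speculatively" starts the reducers early, and never later than a batch exchange even when a failure occurs, can be checked with back-of-the-envelope arithmetic. The sketch below is a hypothetical model, not Flink's scheduler. It assumes a single map stage that needs `mapDuration` time units, at most one mapper failure at time `failureAt`, and a restart of the whole job on failure. It computes when the reducers begin their final, successful attempt under each exchange mode.

```java
/**
 * Toy timing model (hypothetical, not Flink's scheduler): one map stage
 * followed by one reduce stage, with at most one mapper failure and a
 * restart of the whole job on failure.
 */
class ExchangeModeTiming {

    /**
     * Time at which the reducers start their final, successful attempt.
     *
     * @param mapDuration time the mappers need to process all input
     * @param failureAt   time of the mapper failure (0 <= failureAt < mapDuration), or -1 for none
     * @param pipelined   true for a pipelined exchange, false for a batch (blocking) exchange
     */
    static long reducerStart(long mapDuration, long failureAt, boolean pipelined) {
        // After a failure every task starts again at the failure time.
        long restartAt = (failureAt < 0) ? 0 : failureAt;
        if (pipelined) {
            // Reducers are deployed with the mappers and consume records as they arrive.
            return restartAt;
        }
        // Batch exchange: reducers are started only after the mappers have finished.
        return restartAt + mapDuration;
    }

    public static void main(String[] args) {
        long mapDuration = 100;
        long failureAt = 60;
        System.out.println("batch, no failure:     " + reducerStart(mapDuration, -1, false));
        System.out.println("batch, failure:        " + reducerStart(mapDuration, failureAt, false));
        System.out.println("pipelined, no failure: " + reducerStart(mapDuration, -1, true));
        System.out.println("pipelined, failure:    " + reducerStart(mapDuration, failureAt, true));
    }
}
```

With `mapDuration = 100` and a failure at 60, the batch reducers start at 160 and the pipelined ones at 60. Without a failure, it is 100 versus 0. Under these assumptions the pipelined start is never later, because `restartAt <= restartAt + mapDuration`. The model deliberately ignores reducer work: pipelined reducers still cannot finish before the mappers do.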