Thank you! I see.
Thank you all.

Sent from my iPhone

> 在 2015年7月9日,下午10:48,Stephan Ewen <se...@apache.org> 写道:
> 
> Any operator in a batch job will receive all of its elements in one
> complete successful run.
> 
> The mapper starts its work immediately. On a failure, a fresh mapper is
> used, and all of the data is replayed. You can think of it as if there was
> only a single checkpoint at the very beginning (before any data was sent)
> that they fall back to. For mapper-internal state, there can be no
> duplicates.
> 
> For the interaction with the outside world, there can always be duplicates,
> for example if the mapper inserts data into a database. The database would
> have data from the initial run (that failed or was canceled) and the
> recovery run.
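> 
> For illustration, here is a rough sketch of such a mapper (the table name,
> connection URL and credentials are made-up placeholders). Because the INSERT
> runs as a side effect, a replayed run writes the same rows again unless the
> statement is made idempotent (e.g. an upsert keyed on the record):
> 
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.PreparedStatement;
> 
> import org.apache.flink.api.common.functions.RichMapFunction;
> import org.apache.flink.configuration.Configuration;
> 
> public class InsertingMapper extends RichMapFunction<String, String> {
> 
>     private transient Connection conn;
>     private transient PreparedStatement stmt;
> 
>     @Override
>     public void open(Configuration parameters) throws Exception {
>         // Placeholder connection details.
>         conn = DriverManager.getConnection("jdbc:postgresql://host/db", "user", "password");
>         // A plain INSERT is re-executed on replay and therefore duplicates
>         // rows; an idempotent upsert would avoid that.
>         stmt = conn.prepareStatement("INSERT INTO results (value) VALUES (?)");
>     }
> 
>     @Override
>     public String map(String value) throws Exception {
>         stmt.setString(1, value);
>         stmt.executeUpdate();
>         return value;
>     }
> 
>     @Override
>     public void close() throws Exception {
>         if (stmt != null) stmt.close();
>         if (conn != null) conn.close();
>     }
> }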
> 
> 
> 
> 
>> On Thu, Jul 9, 2015 at 4:13 PM, 马国维 <maguo...@outlook.com> wrote:
>> 
>> DataSet<String> result = in.rebalance()
>>                           .map(new Mapper());
>> 
>> In this case, does the 'map' receive all the data before it begins to work?
>> If not, will a failure of the rebalance operator cause some duplicate
>> records?
>>> Date: Thu, 9 Jul 2015 15:40:18 +0200
>>> Subject: Re: Does DataSet job also use Barriers to ensure "exactly once."?
>>> From: se...@apache.org
>>> To: dev@flink.apache.org
>>> 
>>> Currently, Flink restarts the entire job upon failure.
>>> 
>>> There is WIP that restricts this to all tasks involved in the pipeline of
>>> the failed task.
>>> 
>>> Let's say we have a pipelined MapReduce job. If a mapper fails, the reducers
>>> that have already received some data have to be restarted as well.
>>> 
>>> In that case, pipelined exchange works like "speculatively" starting the
>>> reducers early. It helps when no failure occurs.
>>> When a failure occurs, the reducers still do not start later than in the
>>> batch exchange mode, where they are started only once the mappers are done
>>> (and no failure can occur any more).
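>>> 
>>> As a side note, the exchange mode can also be chosen explicitly on the
>>> ExecutionConfig (assuming the ExecutionMode setting is available in the
>>> version you are running). A rough sketch, with placeholder paths and a
>>> stand-in Mapper:
>>> 
>>> import org.apache.flink.api.common.ExecutionMode;
>>> import org.apache.flink.api.common.functions.MapFunction;
>>> import org.apache.flink.api.java.DataSet;
>>> import org.apache.flink.api.java.ExecutionEnvironment;
>>> 
>>> public class BatchExchangeExample {
>>> 
>>>     public static void main(String[] args) throws Exception {
>>>         ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>>>         // Force batch data exchange: downstream tasks start only after
>>>         // their inputs have produced all of their data.
>>>         env.getConfig().setExecutionMode(ExecutionMode.BATCH);
>>> 
>>>         DataSet<String> in = env.readTextFile("hdfs:///path/to/input"); // placeholder path
>>>         DataSet<String> result = in.rebalance().map(new Mapper());
>>>         result.writeAsText("hdfs:///path/to/output"); // placeholder path
>>>         env.execute("batch exchange example");
>>>     }
>>> 
>>>     // Stand-in for the Mapper from the example above.
>>>     public static class Mapper implements MapFunction<String, String> {
>>>         @Override
>>>         public String map(String value) {
>>>             return value;
>>>         }
>>>     }
>>> }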
>>> 
>>> 
>>>> On Thu, Jul 9, 2015 at 3:34 PM, 马国维 <maguo...@outlook.com> wrote:
>>>> 
>>>> The DataExchangeMode is pipelined.
>>>> If two operators use pipelined mode to exchange data, a failed partition
>>>> may already have sent some data to the receiver before it failed. So does
>>>> replaying all the failed partitions cause some duplicate records?
>>>> 
>>>> 
>>>>> Date: Thu, 9 Jul 2015 14:47:29 +0200
>>>>> Subject: Re: Does DataSet job also use Barriers to ensure "exactly once."?
>>>>> From: ktzou...@apache.org
>>>>> To: dev@flink.apache.org
>>>>> 
>>>>> No, it doesn't; periodic snapshots are not needed in DataSet programs, as
>>>>> DataSets are of finite size and failed partitions can be replayed
>>>>> completely.
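>>>>> 
>>>>> A minimal sketch of that, assuming the 0.9-era retry API
>>>>> (setNumberOfExecutionRetries): the input is finite, so a failed attempt is
>>>>> simply re-run from the sources, and retries make that happen automatically:
>>>>> 
>>>>> import org.apache.flink.api.common.functions.MapFunction;
>>>>> import org.apache.flink.api.java.DataSet;
>>>>> import org.apache.flink.api.java.ExecutionEnvironment;
>>>>> 
>>>>> public class ReplayableBatchJob {
>>>>> 
>>>>>     public static void main(String[] args) throws Exception {
>>>>>         ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>>>>>         // On failure, re-run the whole (finite) job, up to 3 times.
>>>>>         env.setNumberOfExecutionRetries(3);
>>>>> 
>>>>>         DataSet<String> words = env.fromElements("flink", "batch", "flink");
>>>>>         DataSet<String> upper = words.map(new MapFunction<String, String>() {
>>>>>             @Override
>>>>>             public String map(String value) {
>>>>>                 return value.toUpperCase();
>>>>>             }
>>>>>         });
>>>>> 
>>>>>         upper.writeAsText("file:///tmp/flink-out"); // placeholder output path
>>>>>         env.execute("replayable batch job");
>>>>>     }
>>>>> }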
>>>>> 
>>>>> 
>>>>>> On Thu, Jul 9, 2015 at 2:43 PM, 马国维 <maguo...@outlook.com> wrote:
>>>>>> 
>>>>>> Hi everyone,
>>>>>> The docs say Flink Streaming uses "barriers" to ensure "exactly once."
>>>>>> Does a DataSet job use the same mechanism to ensure "exactly once" if a
>>>>>> map task fails? Thanks.
>> 
>> 
