OK, thanks. Are there any alternatives to that? May I use accumulators for
that?
On 7 Sep 2015 17:47, "Fabian Hueske" <fhue...@gmail.com> wrote:

> If the loop count of 3 is fixed (or not significantly larger), union
> should be fine.
>
> 2015-09-07 17:07 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>
>> Sorry, the program has a union at:
>>   accumulated = accumulated.union(x.filter(t.f1 == 0))
>>
>> On Mon, Sep 7, 2015 at 4:58 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>
>>> Hi Flavio,
>>>
>>> your example does not contain a union.
>>>
>>> Union itself basically comes for free. However, if you have a lot of
>>> small DataSets that you want to union, the plan can become very complex and
>>> might cause overhead due to scheduling many small tasks. For example, it is
>>> usually better to have one data source and input format that reads multiple
>>> small files instead of adding one data source for each tiny file and applying
>>> union to all data sources to get all data.
>>>
>>> TL;DR: if your iteration count is only 3, as your example suggests, you
>>> should be fine. If it exceeds, say, 32, it might be worth rethinking your
>>> program.
>>>
>>> Cheers, Fabian
>>>
>>>
>>>
>>> 2015-09-07 16:29 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>
>>>> Hi Stephan,
>>>> thanks for the answer. Unfortunately I didn't understand whether there's an
>>>> alternative to union right now.
>>>> My process is basically like this:
>>>>
>>>> DataSet x = ...
>>>> while (loopCnt < 3) {
>>>>    x = x.join(y).where(0).equalTo(0).with(...);
>>>>    accumulated = x.filter(t.f1 == 0);
>>>>    x = x.filter(t.f1 != 0);
>>>>    loopCnt++;
>>>> }
>>>>
>>>> Best,
>>>> Flavio
>>>>
>>>>
>>>> On Mon, Sep 7, 2015 at 3:15 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>
>>>>> Union, like all operators, is lazy. When you call union, it only
>>>>> builds a "union stream" that performs the union when you execute the task. So nothing
>>>>> is added before you call "env.execute()".
>>>>>
>>>>> After you call "env.execute()" and then union again, you will
>>>>> re-execute the entire history of computation to compute the data set that
>>>>> you union with. Hence, for incremental computations, union() is probably
>>>>> not a good choice, unless you persist intermediate data (seamless support
>>>>> for that is WIP).
>>>>>
>>>>> Stephan
>>>>>
>>>>>
>>>>> On Mon, Sep 7, 2015 at 2:56 PM, Flavio Pompermaier <
>>>>> pomperma...@okkam.it> wrote:
>>>>>
>>>>>> Hi to all,
>>>>>> I have a job where I have to incrementally add Tuples to a dataset
>>>>>> (in a while loop).
>>>>>> Is union() the best operator for this task or is there a more
>>>>>> performant operator for this task?
>>>>>> Does union affect the read of already existing elements or does it just
>>>>>> append the new ones somewhere?
>>>>>>
>>>>>> Best,
>>>>>> Flavio
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>
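
To make the lazy-union point from the thread concrete, here is a minimal
sketch in plain Java. It is not Flink code and does not use the Flink API:
Plan, source, union and sourceReads are hypothetical names invented for the
illustration, and the sketch models neither accumulators nor anything an
optimizer might do. It only shows that building a union does no work until
the plan is executed, and that executing again after a further union reads
all earlier inputs once more.

```java
import java.util.ArrayList;
import java.util.List;

public class LazyUnionSketch {

    /** Hypothetical stand-in for a lazily evaluated data set. */
    public interface Plan {
        List<Integer> execute();
    }

    /** Counts how often any source is actually read. */
    public static int sourceReads = 0;

    /** A source holding a single value; it is read only when executed. */
    public static Plan source(int value) {
        return () -> {
            sourceReads++;
            List<Integer> out = new ArrayList<>();
            out.add(value);
            return out;
        };
    }

    /** Union only records its two inputs; no data moves until execute(). */
    public static Plan union(Plan left, Plan right) {
        return () -> {
            List<Integer> out = left.execute();
            out.addAll(right.execute());
            return out;
        };
    }

    /** Build the whole plan in the loop, then execute once at the end. */
    public static List<Integer> unionThenExecuteOnce() {
        Plan accumulated = source(0);
        for (int loopCnt = 1; loopCnt <= 3; loopCnt++) {
            accumulated = union(accumulated, source(loopCnt));
        }
        return accumulated.execute();
    }

    /** Execute after every union: each run recomputes the full history. */
    public static List<Integer> executeAfterEveryUnion() {
        Plan accumulated = source(0);
        List<Integer> result = accumulated.execute();
        for (int loopCnt = 1; loopCnt <= 3; loopCnt++) {
            accumulated = union(accumulated, source(loopCnt));
            result = accumulated.execute();
        }
        return result;
    }

    public static void main(String[] args) {
        sourceReads = 0;
        System.out.println(unionThenExecuteOnce() + " reads=" + sourceReads);
        sourceReads = 0;
        System.out.println(executeAfterEveryUnion() + " reads=" + sourceReads);
    }
}
```

Both variants produce the same elements, but in this sketch the first one
reads each of the four sources once (4 reads), while the second reads them
1 + 2 + 3 + 4 = 10 times in total. With a fixed loop count of 3 the
difference is small, which matches the advice in the thread.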
