Hi Flavio,

your example does not contain a union.
Union itself basically comes for free. However, if you have many small DataSets that you want to union, the plan can become very complex and might cause overhead due to scheduling many small tasks.
For example, it is usually better to have one data source and input format that reads multiple small files than to add one data source for each tiny file and apply union to all of them to get all the data.

TL;DR: if your iteration count is only 3, as your example suggests, you should be fine. If it exceeds, say, 32, it might be worth rethinking your program.

Cheers, Fabian

2015-09-07 16:29 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:

> Hi Stephan,
> thanks for the answer. Unfortunately I didn't understand if there's an
> alternative to union right now..
> My process is basically like this:
>
> DataSet x = ...
> while (loopCnt < 3) {
>     x = x.join(y).where(0).equalTo(0).with(...);
>     accumulated = x.filter(t -> t.f1 == 0);
>     x = x.filter(t -> t.f1 != 0);
>     loopCnt++;
> }
>
> Best,
> Flavio
>
>
> On Mon, Sep 7, 2015 at 3:15 PM, Stephan Ewen <se...@apache.org> wrote:
>
>> Union, like all operators, is lazy. When you call union, it only builds a
>> "union stream", that unions when you execute the task. So nothing is added
>> before you call "env.execute()"
>>
>> After you call "env.execute()" and then union again, you will re-execute
>> the entire history of computation to compute the data set that you union
>> with. Hence, for incremental computations, union() is probably not a good
>> choice, unless you persist intermediate data (seamless support for that is
>> WIP).
>>
>> Stephan
>>
>>
>> On Mon, Sep 7, 2015 at 2:56 PM, Flavio Pompermaier <pomperma...@okkam.it>
>> wrote:
>>
>>> Hi to all,
>>> I have a job where I have to incrementally add Tuples to a dataset (in a
>>> while loop).
>>> Is union() the best operator for this task or is there a more performant
>>> operator for this task?
>>> Does union affect the read of already existing elements or it just
>>> appends the new ones somewhere?
>>>
>>> Best,
>>> Flavio
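
P.S. To make the point about plan size concrete, here is a rough sketch in plain Java. It is a toy model and does not use the Flink API: the Node class and the helper methods are hypothetical names invented for illustration. It only counts how many plan nodes result from unioning one source per file versus reading all files through a single source.

```java
import java.util.ArrayList;
import java.util.List;

public class PlanSizeSketch {

    // Hypothetical stand-in for one operator in a lazily built plan (not a Flink class).
    static final class Node {
        final String name;
        final List<Node> inputs = new ArrayList<>();

        Node(String name, Node... ins) {
            this.name = name;
            for (Node in : ins) {
                inputs.add(in);
            }
        }
    }

    // Counts leaf nodes, i.e. the data sources that would have to be scheduled.
    static int countSources(Node n) {
        if (n.inputs.isEmpty()) {
            return 1;
        }
        int count = 0;
        for (Node in : n.inputs) {
            count += countSources(in);
        }
        return count;
    }

    // Counts every node in the plan (sources plus union operators).
    static int countNodes(Node n) {
        int count = 1;
        for (Node in : n.inputs) {
            count += countNodes(in);
        }
        return count;
    }

    // One source per file, chained together with binary unions.
    static Node unionOfManySources(List<String> files) {
        Node plan = null;
        for (String f : files) {
            Node src = new Node("source:" + f);
            plan = (plan == null) ? src : new Node("union", plan, src);
        }
        return plan;
    }

    // A single source whose input format is assumed to read all files itself.
    static Node singleMultiFileSource(List<String> files) {
        return new Node("source:" + files.size() + " files");
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            files.add("part-" + i);
        }
        Node unioned = unionOfManySources(files);
        Node single = singleMultiFileSource(files);
        System.out.println(countSources(unioned) + " sources, " + countNodes(unioned) + " nodes"); // 100 sources, 199 nodes
        System.out.println(countSources(single) + " sources, " + countNodes(single) + " nodes");   // 1 sources, 1 nodes
    }
}
```

Nothing in the model does any work when a union node is built, which mirrors the "comes for free" part. The cost only shows up as the number of nodes the scheduler would have to handle.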