In that case you should go with union.

2015-09-07 19:06 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
> Usually 3 or 4.
>
> On 7 Sep 2015 18:39, "Fabian Hueske" <fhue...@gmail.com> wrote:
>
>> And how many unions would your program use if you followed the
>> union-in-loop approach?
>>
>> 2015-09-07 18:31 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>
>>> On the order of 10 GB.
>>>
>>> On Mon, Sep 7, 2015 at 6:14 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>>
>>>> Accumulators can be used to collect records, but they are not designed
>>>> to hold large amounts of data.
>>>> It might work up to a certain point (~10 MB) and fail beyond that.
>>>>
>>>> How many unions do you plan to use in your program?
>>>>
>>>> 2015-09-07 17:58 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>>
>>>>> OK, thanks. Are there any alternatives to that? May I use accumulators
>>>>> for that?
>>>>>
>>>>> On 7 Sep 2015 17:47, "Fabian Hueske" <fhue...@gmail.com> wrote:
>>>>>
>>>>>> If the loop count of 3 is fixed (or not significantly larger), union
>>>>>> should be fine.
>>>>>>
>>>>>> 2015-09-07 17:07 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>>>>
>>>>>>> Sorry, the program has a union at
>>>>>>> accumulated = accumulated.union(x.filter(t.f1 == 0))
>>>>>>>
>>>>>>> On Mon, Sep 7, 2015 at 4:58 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Flavio,
>>>>>>>>
>>>>>>>> your example does not contain a union.
>>>>>>>>
>>>>>>>> Union itself basically comes for free. However, if you have a lot of
>>>>>>>> small DataSets that you want to union, the plan can become very complex
>>>>>>>> and might cause overhead due to scheduling many small tasks. For example,
>>>>>>>> it is usually better to have one data source and input format that reads
>>>>>>>> multiple small files than to add one data source for each tiny file and
>>>>>>>> apply union to all data sources to get all the data.
>>>>>>>>
>>>>>>>> TL;DR: if your iteration count is only 3, as your example suggests,
>>>>>>>> you should be fine. If it exceeds, say, 32, it might be worth
>>>>>>>> rethinking your program.
>>>>>>>>
>>>>>>>> Cheers, Fabian
>>>>>>>>
>>>>>>>> 2015-09-07 16:29 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>>>>>>
>>>>>>>>> Hi Stephan,
>>>>>>>>> thanks for the answer. Unfortunately I didn't understand whether
>>>>>>>>> there's an alternative to union right now.
>>>>>>>>> My process is basically like this:
>>>>>>>>>
>>>>>>>>>   DataSet x = ...
>>>>>>>>>   while (loopCnt < 3) {
>>>>>>>>>     x = x.join(y).where(0).equalTo(0).with(...);
>>>>>>>>>     accumulated = x.filter(t.f1 == 0);
>>>>>>>>>     x = x.filter(t.f1 != 0);
>>>>>>>>>     loopCnt++;
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Flavio
>>>>>>>>>
>>>>>>>>> On Mon, Sep 7, 2015 at 3:15 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Union, like all operators, is lazy. When you call union, it only
>>>>>>>>>> builds a "union stream" that unions when you execute the task. So
>>>>>>>>>> nothing is added before you call "env.execute()".
>>>>>>>>>>
>>>>>>>>>> After you call "env.execute()" and then union again, you will
>>>>>>>>>> re-execute the entire history of computation to compute the data set
>>>>>>>>>> that you union with. Hence, for incremental computations, union() is
>>>>>>>>>> probably not a good choice, unless you persist intermediate data
>>>>>>>>>> (seamless support for that is WIP).
>>>>>>>>>>
>>>>>>>>>> Stephan
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 7, 2015 at 2:56 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi to all,
>>>>>>>>>>> I have a job where I have to incrementally add Tuples to a
>>>>>>>>>>> dataset (in a while loop).
>>>>>>>>>>> Is union() the best operator for this task, or is there a more
>>>>>>>>>>> performant operator for it?
>>>>>>>>>>> Does union affect the read of already existing elements, or does it
>>>>>>>>>>> just append the new ones somewhere?
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Flavio
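The two points made in this thread (union() only extends a lazy plan, and running the plan again after env.execute() recomputes everything the unioned data set depends on) can be illustrated with a toy model in plain Java. This is a sketch under loudly stated assumptions, not Flink code: Node, source, map, filter, union and execute are hypothetical stand-ins, map replaces the join with y from Flavio's pseudocode, and the per-execution cache only roughly approximates how a real plan computes a shared data set once per job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

public class LazyUnionSketch {
    static int sourceReads = 0; // how many times the source has been scanned
    static int unionNodes = 0;  // how many union operators the plan contains
    static int epoch = 0;       // id of the current execution

    // Hypothetical lazy "data set" (NOT Flink's DataSet): a recipe plus a per-execution
    // cache, so a node consumed by several downstream nodes is computed once per execution.
    static final class Node {
        private final Function<Integer, List<Integer>> recipe;
        private int cachedEpoch = -1;
        private List<Integer> cached;

        Node(Function<Integer, List<Integer>> recipe) { this.recipe = recipe; }

        List<Integer> eval(int e) {
            if (cachedEpoch != e) {
                cached = recipe.apply(e);
                cachedEpoch = e;
            }
            return cached;
        }
    }

    static Node source(List<Integer> data) {
        return new Node(e -> { sourceReads++; return new ArrayList<>(data); });
    }

    static Node map(Node in, Function<Integer, Integer> f) {
        return new Node(e -> {
            List<Integer> out = new ArrayList<>();
            for (int v : in.eval(e)) out.add(f.apply(v));
            return out;
        });
    }

    static Node filter(Node in, Predicate<Integer> p) {
        return new Node(e -> {
            List<Integer> out = new ArrayList<>();
            for (int v : in.eval(e)) if (p.test(v)) out.add(v);
            return out;
        });
    }

    // Building a union only adds a node to the plan; no records move here.
    static Node union(Node a, Node b) {
        unionNodes++;
        return new Node(e -> {
            List<Integer> out = new ArrayList<>(a.eval(e));
            out.addAll(b.eval(e));
            return out;
        });
    }

    // Stand-in for env.execute(): each execution evaluates the plan from the source again.
    static List<Integer> execute(Node sink) {
        epoch++;
        return sink.eval(epoch);
    }

    public static void main(String[] args) {
        // Each record is a "remaining steps" counter; it is done once it reaches 0.
        Node x = source(List.of(1, 2, 3, 3, 5));
        Node accumulated = null; // seeded with the first batch instead of an empty data set
        for (int loopCnt = 0; loopCnt < 3; loopCnt++) {
            x = map(x, v -> v - 1);               // stands in for x.join(y)...with(...)
            Node done = filter(x, v -> v == 0);   // t.f1 == 0
            accumulated = (accumulated == null) ? done : union(accumulated, done);
            x = filter(x, v -> v != 0);           // t.f1 != 0
        }
        System.out.println("source reads before execute: " + sourceReads);
        System.out.println("result: " + execute(accumulated));
        System.out.println("union nodes: " + unionNodes + ", source reads: " + sourceReads);

        // Unioning onto the old result after an execution re-runs the whole lineage.
        Node more = union(accumulated, filter(x, v -> v == 2));
        System.out.println("second result: " + execute(more) + ", source reads: " + sourceReads);
    }
}
```

With the three iterations from Flavio's example, the loop builds two union nodes and reads nothing; the first execute reads the source once and yields [0, 0, 0, 0], and the later union followed by a second execute reads the source again, which is the recomputation Stephan describes. Seeding accumulated with the first batch is just a choice for this sketch; starting from an empty data set would add one more union per run.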