Union, like all operators, is lazy. When you call union, it only builds a
"union stream", that unions when you execute the task. So nothing is added
before you call "env.execute()"

After you call "env.execute()" and then union again, you will re-execute
the entire history of computation to compute the data set that you union
with. Hence, for incremental computations, union() is probably not a good
choice, unless you persist intermediate data (seamless support for that is
WIP).

Stephan


On Mon, Sep 7, 2015 at 2:56 PM, Flavio Pompermaier <pomperma...@okkam.it>
wrote:

> Hi to all,
> I have a job where I have to incrementally add Tuples to a dataset (in a
> while loop).
> Is union() the best operator for this task or is there a more performant
> operator for this task?
> Does union affect the read of already existing elements or it just appends
> the new ones somewhere?
>
> Best,
> Flavio
>
>
>

Reply via email to