First, the public API cannot be changed except when there is a major version change, and there is no way that we are going to do Spark 3.0.0 just for this change.
Second, the change would be a mistake, since the two union methods are actually quite different. The method in RDD only ever works on two RDDs at a time, whereas the method in SparkContext can work on many RDDs in a single call. That means the method in SparkContext is much preferred when unioning many RDDs, because it avoids building a lengthy lineage chain. A quick sketch of the difference is below the quoted message.

On Mon, Feb 5, 2018 at 8:04 AM, Suchith J N <suchithj...@gmail.com> wrote:
> Hi,
>
> Seems like simple clean up - Why do we have union() on RDDs in
> SparkContext? Shouldn't it reside in RDD? There is one in RDD, but it seems
> like a wrapper around this.
>
> Regards,
> Suchith
>
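To make the lineage point concrete, here is a minimal sketch (the local-mode setup, the RDD contents, and the use of reduce to stand in for any pairwise-union loop are all made up for illustration):

    import org.apache.spark.{SparkConf, SparkContext}

    object UnionExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("union-example").setMaster("local[*]"))
        val rdds = (1 to 50).map(i => sc.parallelize(Seq(i)))

        // Chaining RDD.union pairwise: each call nests the previous result
        // inside another union, so the lineage grows one level per call.
        val chained = rdds.reduce(_ union _)

        // SparkContext.union: all RDDs are passed in a single call and
        // combined into one flat union, keeping the lineage shallow.
        val flat = sc.union(rdds)

        println(chained.toDebugString)  // deep, nested lineage
        println(flat.toDebugString)     // single union over all parents
        sc.stop()
      }
    }

Comparing the two toDebugString outputs shows why sc.union is the right tool once you have more than a handful of RDDs to combine.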