Re: Union in Spark context

Suchith J N Mon, 05 Feb 2018 09:03:41 -0800

Thank you very much. I had overlooked the differences between the two.

The public API part is understandable.


Coming to the second part: I see that it creates a single UnionRDD instance
with all the RDDs as its parents, thereby preventing a long lineage chain.
Is my understanding correct?
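To check that understanding, here is a toy model in Python of the two call patterns described in this thread. It is not Spark code: `ToyRDD`, `union_pair`, `union_many` and `lineage_depth` are hypothetical names used only for illustration.

The model compares two approaches:
- Folding pairwise unions, as repeated `rdd.union(other)` calls would.
- A single n-ary union, as `SparkContext.union(rdds)` does.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import List


@dataclass
class ToyRDD:
    """Hypothetical stand-in for an RDD: only a name and its parent nodes."""
    name: str
    parents: List["ToyRDD"] = field(default_factory=list)


def union_pair(a: ToyRDD, b: ToyRDD) -> ToyRDD:
    # Models RDD.union(other): each call adds one union node with two parents.
    return ToyRDD("union", [a, b])


def union_many(rdds: List[ToyRDD]) -> ToyRDD:
    # Models SparkContext.union(rdds): one union node whose parents are all inputs.
    return ToyRDD("union", list(rdds))


def lineage_depth(rdd: ToyRDD) -> int:
    # Length of the longest path from this node down to a node with no parents.
    if not rdd.parents:
        return 0
    return 1 + max(lineage_depth(p) for p in rdd.parents)


inputs = [ToyRDD(f"rdd{i}") for i in range(100)]

chained = reduce(union_pair, inputs)  # (((rdd0 + rdd1) + rdd2) + ...) nested 99 times
flat = union_many(inputs)             # one node with 100 parents

print(lineage_depth(chained))  # 99
print(lineage_depth(flat))     # 1
```

In PySpark, the corresponding real calls would be `functools.reduce(lambda a, b: a.union(b), rdds)` versus `sc.union(rdds)`. The toy model only captures the shape of the resulting lineage graph, not how Spark actually computes or schedules it.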

On 5 February 2018 at 22:17, Mark Hamstra <[email protected]> wrote:

> First, the public API cannot be changed except when there is a major
> version change, and there is no way that we are going to do Spark 3.0.0
> just for this change.
>
> Second, the change would be a mistake since the two different union
> methods are quite different. The method in RDD only ever works on two RDDs
> at a time, whereas the method in SparkContext can work on many RDDs in a
> single call. That means that the method in SparkContext is much preferred
> when unioning many RDDs to prevent a lengthy lineage chain.
>
> On Mon, Feb 5, 2018 at 8:04 AM, Suchith J N <[email protected]> wrote:
>
>> Hi,
>>
>> This seems like a simple cleanup: why do we have union() on RDDs in
>> SparkContext? Shouldn't it reside in RDD? There is one in RDD, but it seems
>> to be just a wrapper around this one.
>>
>> Regards,
>> Suchith
>>
>
>
