Thank you very much. I had overlooked the differences between the two union methods. The point about the public API is clear.
Coming to the second part: I see that it creates an instance of UnionRDD with all the RDDs as parents, thereby preventing a long lineage chain. Is my understanding correct?

On 5 February 2018 at 22:17, Mark Hamstra <m...@clearstorydata.com> wrote:

> First, the public API cannot be changed except when there is a major
> version change, and there is no way that we are going to do Spark 3.0.0
> just for this change.
>
> Second, the change would be a mistake since the two different union
> methods are quite different. The method in RDD only ever works on two RDDs
> at a time, whereas the method in SparkContext can work on many RDDs in a
> single call. That means that the method in SparkContext is much preferred
> when unioning many RDDs to prevent a lengthy lineage chain.
>
> On Mon, Feb 5, 2018 at 8:04 AM, Suchith J N <suchithj...@gmail.com> wrote:
>
>> Hi,
>>
>> Seems like simple clean up - Why do we have union() on RDDs in
>> SparkContext? Shouldn't it reside in RDD? There is one in RDD, but it seems
>> like a wrapper around this.
>>
>> Regards,
>> Suchith
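
To make the lineage point concrete, here is a toy Python sketch. It is not Spark code and does not call any Spark API: `ToyRDD`, `union_all`, and `lineage_depth` are hypothetical names invented for illustration. The model assumes only what the thread states, namely that the method in RDD combines two datasets per call while the method in SparkContext accepts many in a single call.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyRDD:
    """Hypothetical stand-in for an RDD: a name plus its direct parents."""
    name: str
    parents: List["ToyRDD"] = field(default_factory=list)

    def union(self, other: "ToyRDD") -> "ToyRDD":
        # Models the two-at-a-time method: the result has exactly two parents.
        return ToyRDD(f"union({self.name}, {other.name})", [self, other])


def union_all(rdds: List[ToyRDD]) -> ToyRDD:
    # Models the many-at-once method: one node with every input as a direct parent.
    return ToyRDD("union_all", list(rdds))


def lineage_depth(rdd: ToyRDD) -> int:
    # Number of edges on the longest path from this node back to a source.
    if not rdd.parents:
        return 0
    return 1 + max(lineage_depth(p) for p in rdd.parents)


sources = [ToyRDD(f"rdd{i}") for i in range(5)]

# Pairwise chaining: every call wraps the previous result in a new node.
chained = sources[0]
for nxt in sources[1:]:
    chained = chained.union(nxt)

# Single call: all inputs sit directly under one node.
flat = union_all(sources)

chained_depth = lineage_depth(chained)
flat_depth = lineage_depth(flat)
print(chained_depth, flat_depth)
```

With five inputs, the chained version ends up four levels deep (one level per call), while the single call stays one level deep no matter how many inputs it receives. Whether the real UnionRDD behaves exactly this way is the question asked above, so treat this only as an illustration of the idea.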