First, the public API cannot be changed except when there is a major version change, and there is no way that we are going to do Spark 3.0.0 just for this change.
Second, the change would be a mistake, since the two union methods are actually quite different. The method in RDD only ever works on two RDDs at a time, whereas the method in SparkContext can work on many RDDs in a single call. That means the method in SparkContext is much preferred when unioning many RDDs, because it avoids building a lengthy lineage chain. A quick sketch of the difference is below the quoted message.

On Mon, Feb 5, 2018 at 8:04 AM, Suchith J N <suchithj...@gmail.com> wrote:
> Hi,
>
> Seems like simple clean up - Why do we have union() on RDDs in
> SparkContext? Shouldn't it reside in RDD? There is one in RDD, but it seems
> like a wrapper around this.
>
> Regards,
> Suchith
>
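To make the lineage point concrete, here is a minimal sketch (the local-mode setup, the RDD contents, and the use of reduce to stand in for any pairwise-union loop are all made up for illustration):

    import org.apache.spark.{SparkConf, SparkContext}

    object UnionExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("union-example").setMaster("local[*]"))
        val rdds = (1 to 50).map(i => sc.parallelize(Seq(i)))

        // Chaining RDD.union pairwise: each call nests the previous result
        // inside another union, so the lineage grows one level per call.
        val chained = rdds.reduce(_ union _)

        // SparkContext.union: all RDDs are passed in a single call and
        // combined into one flat union, keeping the lineage shallow.
        val flat = sc.union(rdds)

        println(chained.toDebugString)  // deep, nested lineage
        println(flat.toDebugString)     // single union over all parents
        sc.stop()
      }
    }

Comparing the two toDebugString outputs shows why sc.union is the right tool once you have more than a handful of RDDs to combine.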