I do not think so. As I understand it, Spark will still use Catalyst to plan the join. A DataFrame always has an RDD underneath, but that does not mean that calling an action on that RDD forces a less optimal path.
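
One way to check this yourself is to print the plans and the RDD lineage. The sketch below is only an illustration, assuming Spark 2.x with a local master; the object name, the toy rows and the columns valA/valB are made up for the example, and only the join itself comes from your snippet. As far as I know, df.rdd is derived from the DataFrame's executed (already optimized) physical plan, so the lineage printed by toDebugString should show the stages of the planned join rather than a hand-written RDD join.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical stand-alone example; names and data are assumptions for illustration.
object RddFromDfPlan {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-from-df-plan")
      .master("local[*]") // assumption: local test run
      .getOrCreate()
    import spark.implicits._

    // Tiny stand-ins for bigTableA and bigTableB.
    val bigTableA = Seq((1, "a1"), (2, "a2"), (3, "a3")).toDF("A", "valA")
    val bigTableB = Seq((1, "b1"), (3, "b3")).toDF("A", "valB")

    val df = bigTableA.join(bigTableB, bigTableA("A") === bigTableB("A"), "left")

    // Prints the parsed, analyzed and optimized logical plans plus the physical plan.
    df.explain(true)

    // Convert to an RDD and inspect its lineage before running the action.
    val rddFromDF = df.rdd
    println(rddFromDF.toDebugString)

    // The action runs on the RDD; a left join keeps every row of bigTableA.
    println(rddFromDF.count)

    spark.stop()
  }
}
```

If the physical plan from explain(true) and the lineage from toDebugString describe the same join strategy, that suggests the optimizer was involved even though the action is called on the RDD.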
On Sun, Aug 14, 2016 at 3:04 PM, mayur bhole <mayur.bhol...@gmail.com> wrote:

> Hi All,
>
> Let's say we have
>
> val df = bigTableA.join(bigTableB,bigTableA("A")===bigTableB("A"),"left")
> val rddFromDF = df.rdd
> println(rddFromDF.count)
>
> My understanding is that Spark will convert all DataFrame operations
> before "rddFromDF.count" into equivalent RDD operations, as we are not
> performing any action on the DataFrame directly. In that case, Spark will
> not be using the optimization engine. Is my assumption right? Please point
> me to the right resources.
>
> [ Note: I have posted the same question on SO:
> http://stackoverflow.com/questions/38889812/how-spark-dataframe-optimization-engine-works-with-dag ]
>
> Thanks

--
Best Regards,
Ayan Guha