Re: How Spark sql query optimisation work if we are using .rdd action ?

ayan guha Sun, 14 Aug 2016 01:57:06 -0700

I do not think so. As I understand it, Spark will still use Catalyst to plan
the join. A DataFrame always has an RDD underneath, but that does not mean
that calling an action on that RDD forces a less optimal path.
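
One way to check this yourself is to print the plans Catalyst produces for
the join and then look at the lineage of the RDD you get from .rdd. The
following is only a minimal sketch, assuming Spark 2.x with a local
SparkSession; the two tiny tables and the column names valA and valB are
made-up stand-ins for bigTableA and bigTableB, not your real data.

```scala
import org.apache.spark.sql.SparkSession

object RddAfterJoin {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for inspecting plans.
    val spark = SparkSession.builder()
      .appName("rdd-after-join")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-ins for bigTableA and bigTableB.
    val bigTableA = Seq((1, "a1"), (2, "a2")).toDF("A", "valA")
    val bigTableB = Seq((1, "b1"), (3, "b3")).toDF("A", "valB")

    val df = bigTableA.join(bigTableB, bigTableA("A") === bigTableB("A"), "left")

    // Prints the parsed, analyzed and optimized logical plans plus the
    // physical plan that Catalyst produced for the join.
    df.explain(true)

    // Convert to an RDD and print its lineage for comparison with the
    // physical plan printed above.
    val rddFromDF = df.rdd
    println(rddFromDF.toDebugString)

    // The action from the original question.
    println(rddFromDF.count)

    spark.stop()
  }
}
```

If the lineage printed by toDebugString reflects the same join strategy as
the physical plan from explain, that would suggest the optimizer was not
bypassed. Note that this only shows how the join is planned; it says nothing
about the extra cost of turning the result into Row objects for the RDD.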


On Sun, Aug 14, 2016 at 3:04 PM, mayur bhole <mayur.bhol...@gmail.com>
wrote:

> Hi all,
>
> Let's say we have
>
> val df = bigTableA.join(bigTableB,bigTableA("A")===bigTableB("A"),"left")
> val rddFromDF = df.rdd
> println(rddFromDF.count)
>
> My understanding is that Spark will convert all DataFrame operations
> before "rddFromDF.count" into equivalent RDD operations, since we are not
> performing any action on the DataFrame directly. In that case, Spark will
> not use the optimization engine. Is my assumption right? Please point me
> to the right resources.
>
> [ Note: I have posted the same question on SO:
> http://stackoverflow.com/questions/38889812/how-spark-dataframe-optimization-engine-works-with-dag ]
>
> Thanks
>



-- 
Best Regards,
Ayan Guha
