Hi Spark users,

I have come across a few sources that mention DataFrames can be more efficient than Datasets. I can see why this might be true: a Dataset allows functional transformations (lambdas) that Catalyst cannot look into, and hence cannot optimize well. But can a DataFrame be more efficient than a Dataset even if we use only relational transformations on the Dataset? If so, could anyone explain why? Are there any benchmarks comparing Dataset and DataFrame?

Thank you!
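For concreteness, here is a minimal sketch of the three variants I have in mind. It assumes the Spark 2.x/3.x Scala API; the `Person` case class, the sample rows and the `compare` helper are hypothetical, made up only for illustration. The typed `filter` takes a lambda that Catalyst treats as a black box, while the other two use a `Column` expression. Comparing the `explain()` output of the three is one way to see what Catalyst does with each.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type for this illustration only.
// Defined at the top level so Spark can derive an Encoder for it.
case class Person(name: String, age: Int)

object DatasetVsDataFrame {
  // Builds the same filter three ways, prints each plan and returns the row counts.
  def compare(spark: SparkSession): (Long, Long, Long) = {
    import spark.implicits._

    val ds = Seq(Person("Ann", 34), Person("Bob", 17), Person("Cid", 52)).toDS()

    // Functional (typed) transformation: the lambda is opaque to Catalyst,
    // so Spark must deserialize rows into Person objects to evaluate it.
    val typed = ds.filter(p => p.age > 21)

    // Relational transformation on the same Dataset: a Column expression
    // that Catalyst can analyze and optimize.
    val relational = ds.filter($"age" > 21)

    // The same relational filter on an untyped DataFrame.
    val untyped = ds.toDF().filter($"age" > 21)

    typed.explain()
    relational.explain()
    untyped.explain()

    (typed.count(), relational.count(), untyped.count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataset-vs-dataframe")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()
    try println(compare(spark)) // prints (2,2,2)
    finally spark.stop()
  }
}
```

All three return the same rows; the question is whether the relational Dataset variant can still be slower than the DataFrame variant.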
Shiyuan