Hi Arun,
This documentation may be helpful:
The 2.0-preview Scala doc for the Dataset class:
http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.sql.Dataset
Note that the Dataset API has completely changed from 1.6.
In 2.0, there is no separate DataFrame class. Rather, i
>
> 1) What does this really mean to an Application developer?
>
It means there are fewer concepts to learn.
> 2) Why this unification was needed in Spark 2.0?
>
To simplify the API and reduce the number of concepts that needed to be
learned. The only reason we didn't do it in 1.6 was that we didn't want t
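As a rough illustration of "fewer concepts", here is a minimal sketch assuming Spark 2.x and Scala; `Sale`, `OneApi` and `columnsAndCount` are hypothetical names invented for this example. A helper written once against `Dataset[T]` also accepts a DataFrame, because in 2.x a DataFrame is a `Dataset[Row]`. In 1.6, DataFrame and Dataset were separate classes, so the same helper could not be passed a DataFrame without converting it first.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Sale(item: String, amount: Double)

object OneApi {
  // A single helper written against Dataset[T]. Because DataFrame is
  // Dataset[Row] in Spark 2.x, it also accepts DataFrames with no conversion.
  def columnsAndCount[T](ds: Dataset[T]): (Seq[String], Long) =
    (ds.columns.toSeq, ds.count())

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("one-api").getOrCreate()
    import spark.implicits._

    val typed: Dataset[Sale] = Seq(Sale("tea", 3.5), Sale("cake", 4.0)).toDS()
    val untyped: DataFrame = typed.toDF()

    println(columnsAndCount(typed))   // same helper, typed view
    println(columnsAndCount(untyped)) // same helper, untyped (Row) view
    spark.stop()
  }
}
```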
Can anyone answer these questions, please?
On Mon, Jun 13, 2016 at 6:51 PM, Arun Patel wrote:
Thanks, Michael.
I went through these slides already and could not find answers to these
specific questions.
I created a Dataset and converted it to a DataFrame in both 1.6 and 2.0, and I
don't see any difference between the two versions. That is what confused me
and led me to ask these questions about the unification.
Appreciat
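A quick way to see the unification in code, as a minimal sketch assuming Spark 2.x and Scala; `Person`, `AliasCheck` and `classNames` are hypothetical names for this example. The source for creating a Dataset and calling `toDF()` looks the same in 1.6 and 2.0, which may be why the experiment described in the message showed no difference; the change is in the types. In 2.x the converted value can be assigned to a `Dataset[Row]`, and at runtime both values are instances of `org.apache.spark.sql.Dataset`. In 1.6, `toDF()` returned an instance of the separate class `org.apache.spark.sql.DataFrame`.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Long)

object AliasCheck {
  // Builds a typed Dataset, converts it with toDF(), and reports the
  // runtime class name of each value.
  def classNames(spark: SparkSession): (String, String) = {
    import spark.implicits._
    val ds: Dataset[Person] = Seq(Person("Ann", 30L), Person("Bob", 25L)).toDS()
    val df: DataFrame = ds.toDF()
    // Compiles only because DataFrame is a type alias for Dataset[Row].
    val rows: Dataset[Row] = df
    (ds.getClass.getName, rows.getClass.getName)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("alias-check").getOrCreate()
    // On Spark 2.x both names are org.apache.spark.sql.Dataset.
    println(classNames(spark))
    spark.stop()
  }
}
```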
Here's a talk I gave on the topic:
https://www.youtube.com/watch?v=i7l3JQRx7Qw
http://www.slideshare.net/SparkSummit/structuring-spark-dataframes-datasets-and-streaming-by-michael-armbrust
On Mon, Jun 13, 2016 at 4:01 AM, Arun Patel wrote:
In Spark 2.0, DataFrames and Datasets are unified. DataFrame is simply an
alias for a Dataset of type Row. I have a few questions.
1) What does this really mean to an Application developer?
2) Why this unification was needed in Spark 2.0?
3) What changes can be observed in Spark 2.0 vs Spark 1.6?
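For question 3, one concrete change that shows up in application code is calling `map` on a DataFrame. This is a minimal sketch assuming Spark 2.x and Scala; `MapOnDataFrame`, `namesOf` and `namesAsRdd` are hypothetical names. In 1.6, `DataFrame.map` returned an `RDD`; in 2.x it is `Dataset[Row].map`, which needs an `Encoder` for the result type and returns a `Dataset`. The entry point also moves from `SQLContext`/`HiveContext` to `SparkSession`.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object MapOnDataFrame {
  // In 2.x, DataFrame.map is Dataset[Row].map: it needs an implicit
  // Encoder[String] (supplied by spark.implicits._) and returns a Dataset.
  def namesOf(spark: SparkSession): Dataset[String] = {
    import spark.implicits._
    val df: DataFrame = Seq(("Ann", 30), ("Bob", 25)).toDF("name", "age")
    df.map(row => row.getAs[String]("name"))
  }

  // The 1.6-style result (an RDD) now has to be requested explicitly via .rdd.
  def namesAsRdd(spark: SparkSession): RDD[String] = {
    import spark.implicits._
    val df: DataFrame = Seq(("Ann", 30), ("Bob", 25)).toDF("name", "age")
    df.rdd.map(row => row.getAs[String]("name"))
  }

  def main(args: Array[String]): Unit = {
    // SparkSession is the 2.x entry point; 1.6 code used SQLContext or HiveContext.
    val spark = SparkSession.builder().master("local[*]").appName("df-map").getOrCreate()
    namesOf(spark).show()
    println(namesAsRdd(spark).collect().mkString(", "))
    spark.stop()
  }
}
```

The typed `Dataset` result keeps the data inside Spark SQL's optimized execution path, while `.rdd` drops back to plain RDD operations; which one to use depends on what the rest of the job expects.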