Re: Spark 2.0: Unify DataFrames and Datasets question

2016-06-14 Thread Xinh Huynh
Hi Arun,

This documentation may be helpful:

The 2.0-preview Scala doc for Dataset class:
http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.sql.Dataset

Note that the Dataset API has completely changed from 1.6. In 2.0, there is no separate DataFrame class. Rather, i

Re: Spark 2.0: Unify DataFrames and Datasets question

2016-06-14 Thread Michael Armbrust
> 1) What does this really mean to an application developer?

It means there are fewer concepts to learn.

> 2) Why was this unification needed in Spark 2.0?

To simplify the API and reduce the number of concepts that needed to be learned. We didn't do it in 1.6 only because we didn't want t

Re: Spark 2.0: Unify DataFrames and Datasets question

2016-06-14 Thread Arun Patel
Can anyone answer these questions, please?

On Mon, Jun 13, 2016 at 6:51 PM, Arun Patel wrote:
> Thanks Michael.
>
> I went thru these slides already and could not find answers for these
> specific questions.
>
> I created a Dataset and converted it to DataFrame in 1.6 and 2.0. I don't
> see an

Re: Spark 2.0: Unify DataFrames and Datasets question

2016-06-13 Thread Arun Patel
Thanks, Michael. I went through these slides already and could not find answers to these specific questions. I created a Dataset and converted it to a DataFrame in both 1.6 and 2.0, and I don't see any difference between the two versions. That confused me, which is why I asked these questions about unification. Appreciat
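The Dataset-to-DataFrame conversion described above can be sketched as follows. The conversion calls themselves (`toDF()` and `as[T]`) exist in both 1.6 and 2.0, which may be why the code looks unchanged; the difference in 2.0 is in the types, since the `DataFrame` returned by `toDF()` is itself a `Dataset[Row]`. This is a minimal sketch, assuming Spark 2.x with `spark-sql` on the classpath and a local master; the `Person` case class, the `RoundTripSketch` object and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type, defined at top level so Spark can derive an Encoder for it
case class Person(name: String, age: Long)

object RoundTripSketch {
  // Returns the DataFrame's column names and the records read back as Person
  def run(): (Seq[String], Seq[Person]) = {
    val spark = SparkSession.builder()
      .appName("round-trip-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // Encoders for case classes, toDS()/toDF() on local Seqs

    try {
      // Typed Dataset built from local data
      val ds: Dataset[Person] = Seq(Person("alice", 30L), Person("bob", 25L)).toDS()
      // Untyped view; in 2.x this value is a Dataset[Row]
      val df: DataFrame = ds.toDF()
      // Back to a typed view; columns are matched to fields by name
      val back: Dataset[Person] = df.as[Person]
      (df.columns.toSeq, back.collect().toSeq)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (cols, people) = run()
    println(cols.mkString(", "))
    people.foreach(println)
  }
}
```

In 1.6 the same calls would go through a `SQLContext` rather than a `SparkSession`, but the shape of the conversion is the same.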

Re: Spark 2.0: Unify DataFrames and Datasets question

2016-06-13 Thread Michael Armbrust
Here's a talk I gave on the topic:
https://www.youtube.com/watch?v=i7l3JQRx7Qw
http://www.slideshare.net/SparkSummit/structuring-spark-dataframes-datasets-and-streaming-by-michael-armbrust

On Mon, Jun 13, 2016 at 4:01 AM, Arun Patel wrote:
> In Spark 2.0, DataFrames and Datasets are unified. Da

Spark 2.0: Unify DataFrames and Datasets question

2016-06-13 Thread Arun Patel
In Spark 2.0, DataFrames and Datasets are unified. DataFrame is simply an alias for a Dataset of type Row. I have a few questions:
1) What does this really mean to an application developer?
2) Why was this unification needed in Spark 2.0?
3) What changes can be observed between Spark 1.6 and Spark 2.0?
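The alias can be seen directly in Scala code: a `DataFrame` value can be assigned to a `Dataset[Row]` without any conversion, and typed operations such as `map` apply to it. In 1.6, `DataFrame.map` returned an `RDD`; in 2.x it is `Dataset.map`, which requires an `Encoder` for the result type and returns a `Dataset`, so this is one concrete change for question 3. This is a minimal sketch, assuming Spark 2.x with `spark-sql` on the classpath and a local master; the `AliasSketch` object and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

object AliasSketch {
  // Returns the "name" column collected through a typed map over Dataset[Row]
  def run(): Seq[String] = {
    val spark = SparkSession.builder()
      .appName("alias-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // Encoders for tuples and String, toDF() on local Seqs

    try {
      val df: DataFrame = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")
      // Compiles with no conversion call: in Spark 2.x the org.apache.spark.sql
      // package object declares `type DataFrame = Dataset[Row]`
      val rows: Dataset[Row] = df
      // Dataset.map needs an implicit Encoder[String], supplied by spark.implicits._
      val names: Dataset[String] = rows.map(row => row.getString(0))
      names.collect().toSeq
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```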