Re: Spark 2.0: Unify DataFrames and Datasets question

2016-06-14 Thread Xinh Huynh
Hi Arun,
This documentation may be helpful:
The 2.0-preview Scala doc for the Dataset class: http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.sql.Dataset
Note that the Dataset API has completely changed from 1.6. In 2.0, there is no separate DataFrame class. Rather, i
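To make the "no separate DataFrame class" point concrete: in the Spark 2.0 Scala API, `DataFrame` is a type alias for `Dataset[Row]`, so a value of one type can be used wherever the other is expected. The following is a minimal sketch, assuming Spark 2.x on the classpath and a local master; the `Person` case class, the sample data, and the object name `UnifyDemo` are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type used only for this illustration.
case class Person(name: String, age: Long)

object UnifyDemo {
  def main(args: Array[String]): Unit = {
    // Assumption: running locally; adjust master/appName for a real cluster.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("unify-demo")
      .getOrCreate()
    import spark.implicits._

    // A typed Dataset built from an in-memory collection.
    val people: Dataset[Person] = Seq(Person("Ann", 30L), Person("Bob", 25L)).toDS()

    // toDF() drops the static element type and yields untyped rows.
    val df: DataFrame = people.toDF()

    // Compiles without any conversion because DataFrame = Dataset[Row] in 2.0.
    val rows: Dataset[Row] = df

    // as[T] goes back to a typed view when column names and types match T.
    val typedAgain: Dataset[Person] = rows.as[Person]

    typedAgain.show()
    spark.stop()
  }
}
```

In 1.6 the same `toDF()` and `as[T]` calls exist, which may be why converting between the two looks identical from user code; the difference in 2.0 is that both sides are the same underlying class.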

Re: Spark 2.0: Unify DataFrames and Datasets question

2016-06-14 Thread Michael Armbrust
> 1) What does this really mean to an Application developer?
It means there are fewer concepts to learn.
> 2) Why this unification was needed in Spark 2.0?
To simplify the API and reduce the number of concepts that needed to be learned. We only didn't do it in 1.6 because we didn't want t

Re: Spark 2.0: Unify DataFrames and Datasets question

2016-06-14 Thread Arun Patel
Can anyone answer these questions, please?
On Mon, Jun 13, 2016 at 6:51 PM, Arun Patel wrote:
> Thanks Michael.
>
> I went thru these slides already and could not find answers for these specific questions.
>
> I created a Dataset and converted it to DataFrame in 1.6 and 2.0. I don't see an

Re: Spark 2.0: Unify DataFrames and Datasets question

2016-06-13 Thread Arun Patel
Thanks Michael. I went thru these slides already and could not find answers for these specific questions. I created a Dataset and converted it to DataFrame in 1.6 and 2.0. I don't see any difference in 1.6 vs 2.0. So, I really got confused and asked these questions about unification. Appreciat

Re: Spark 2.0: Unify DataFrames and Datasets question

2016-06-13 Thread Michael Armbrust
Here's a talk I gave on the topic:
https://www.youtube.com/watch?v=i7l3JQRx7Qw
http://www.slideshare.net/SparkSummit/structuring-spark-dataframes-datasets-and-streaming-by-michael-armbrust
On Mon, Jun 13, 2016 at 4:01 AM, Arun Patel wrote:
> In Spark 2.0, DataFrames and Datasets are unified. Da