I would be open to working on Dataset documentation if no one else is already working on it. Thoughts?

On Fri, Jun 17, 2016 at 11:44 PM, Cheng Lian <lian.cs....@gmail.com> wrote:

> As mentioned in the PR description, this is just an initial PR to bring
> existing contents up to date, so that people can add more contents
> incrementally.
>
> We should definitely cover more about Dataset.
>
> Cheng
>
> On 6/17/16 10:28 PM, Pedro Rodriguez wrote:
>
> The updates look great!
>
> Looks like many places are updated to the new APIs, but there still isn't
> a section for working with Datasets (most of the docs work with
> Dataframes). Are you planning on adding more? I am thinking something that
> would address common questions like the one I posted on the user email list
> earlier today.
>
> Should I take discussion to your PR?
>
> Pedro
>
> On Fri, Jun 17, 2016 at 11:12 PM, Cheng Lian <lian.cs....@gmail.com>
> wrote:
>
>> Hey Pedro,
>>
>> SQL programming guide is being updated. Here's the PR, but not merged
>> yet: https://github.com/apache/spark/pull/13592
>>
>> Cheng
>>
>> On 6/17/16 9:13 PM, Pedro Rodriguez wrote:
>>
>> Hi All,
>>
>> At my workplace we are starting to use Datasets in 1.6.1 and even more
>> with Spark 2.0 in place of Dataframes. I looked at the 1.6.1 documentation
>> then the 2.0 documentation and it looks like not much time has been spent
>> writing a Dataset guide/tutorial.
>>
>> Preview Docs:
>> https://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-docs/sql-programming-guide.html#creating-datasets
>> Spark master docs:
>> https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md
>>
>> I would like to spend the time to contribute an improvement to those docs
>> with a more in depth examples of creating and using Datasets (eg using $ to
>> select columns). Is this of value, and if so what should my next step be to
>> get this going (create JIRA etc)?
>>
>> --
>> Pedro Rodriguez
>> PhD Student in Distributed Machine Learning | CU Boulder
>> R&D Data Science Intern at Oracle Data Cloud
>> UC Berkeley AMPLab Alumni
>>
>> ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
>> Github: github.com/EntilZha | LinkedIn:
>> https://www.linkedin.com/in/pedrorodriguezscience

--
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn: https://www.linkedin.com/in/pedrorodriguezscience