Please go for it!

On Friday, June 17, 2016, Pedro Rodriguez <ski.rodrig...@gmail.com> wrote:
> I would be open to working on Dataset documentation if no one else is
> already working on it. Thoughts?
>
> On Fri, Jun 17, 2016 at 11:44 PM, Cheng Lian <lian.cs....@gmail.com> wrote:
>
>> As mentioned in the PR description, this is just an initial PR to bring
>> existing contents up to date, so that people can add more contents
>> incrementally.
>>
>> We should definitely cover more about Dataset.
>>
>> Cheng
>>
>> On 6/17/16 10:28 PM, Pedro Rodriguez wrote:
>>
>> The updates look great!
>>
>> Looks like many places are updated to the new APIs, but there still isn't
>> a section for working with Datasets (most of the docs work with
>> DataFrames). Are you planning on adding more? I am thinking something that
>> would address common questions like the one I posted on the user email list
>> earlier today.
>>
>> Should I take the discussion to your PR?
>>
>> Pedro
>>
>> On Fri, Jun 17, 2016 at 11:12 PM, Cheng Lian <lian.cs....@gmail.com> wrote:
>>
>>> Hey Pedro,
>>>
>>> SQL programming guide is being updated. Here's the PR, but not merged
>>> yet: https://github.com/apache/spark/pull/13592
>>>
>>> Cheng
>>>
>>> On 6/17/16 9:13 PM, Pedro Rodriguez wrote:
>>>
>>> Hi All,
>>>
>>> At my workplace we are starting to use Datasets in 1.6.1 and even more
>>> with Spark 2.0 in place of DataFrames. I looked at the 1.6.1 documentation
>>> then the 2.0 documentation and it looks like not much time has been spent
>>> writing a Dataset guide/tutorial.
>>>
>>> Preview Docs:
>>> https://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-docs/sql-programming-guide.html#creating-datasets
>>> Spark master docs:
>>> https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md
>>>
>>> I would like to spend the time to contribute an improvement to those
>>> docs with more in-depth examples of creating and using Datasets (e.g. using
>>> $ to select columns). Is this of value, and if so what should my next step
>>> be to get this going (create JIRA etc)?
>>>
>>> --
>>> Pedro Rodriguez
>>> PhD Student in Distributed Machine Learning | CU Boulder
>>> R&D Data Science Intern at Oracle Data Cloud
>>> UC Berkeley AMPLab Alumni
>>>
>>> ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
>>> Github: github.com/EntilZha | LinkedIn:
>>> https://www.linkedin.com/in/pedrorodriguezscience