I would be open to working on the Dataset documentation if no one else is
already working on it. Thoughts?

On Fri, Jun 17, 2016 at 11:44 PM, Cheng Lian <lian.cs....@gmail.com> wrote:

> As mentioned in the PR description, this is just an initial PR to bring
> the existing content up to date, so that people can add more content
> incrementally.
>
> We should definitely cover more about Dataset.
>
>
> Cheng
>
> On 6/17/16 10:28 PM, Pedro Rodriguez wrote:
>
> The updates look great!
>
> It looks like many places are updated to the new APIs, but there still isn't
> a section for working with Datasets (most of the docs work with
> DataFrames). Are you planning on adding more? I am thinking of something that
> would address common questions like the one I posted on the user mailing list
> earlier today.
>
> Should I take the discussion to your PR?
>
> Pedro
>
> On Fri, Jun 17, 2016 at 11:12 PM, Cheng Lian <lian.cs....@gmail.com>
> wrote:
>
>> Hey Pedro,
>>
>> The SQL programming guide is being updated. Here's the PR, which is not
>> merged yet: https://github.com/apache/spark/pull/13592
>>
>> Cheng
>> On 6/17/16 9:13 PM, Pedro Rodriguez wrote:
>>
>> Hi All,
>>
>> At my workplace we are starting to use Datasets in 1.6.1, and even more so
>> with Spark 2.0, in place of DataFrames. I looked at the 1.6.1 documentation
>> and then the 2.0 documentation, and it looks like not much time has been
>> spent writing a Dataset guide/tutorial.
>>
>> Preview Docs:
>> https://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-docs/sql-programming-guide.html#creating-datasets
>> Spark master docs:
>> https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md
>>
>> I would like to spend the time to contribute an improvement to those docs
>> with more in-depth examples of creating and using Datasets (e.g., using $ to
>> select columns). Is this of value, and if so, what should my next step be to
>> get this going (create a JIRA, etc.)?
>>
>> --
>> Pedro Rodriguez
>> PhD Student in Distributed Machine Learning | CU Boulder
>> R&D Data Science Intern at Oracle Data Cloud
>> UC Berkeley AMPLab Alumni
>>
>> ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
>> Github: github.com/EntilZha | LinkedIn:
>> https://www.linkedin.com/in/pedrorodriguezscience
>>
>>
>>
>
>
> --
> Pedro Rodriguez
> PhD Student in Distributed Machine Learning | CU Boulder
> UC Berkeley AMPLab Alumni
>
> ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
> Github: github.com/EntilZha | LinkedIn:
> https://www.linkedin.com/in/pedrorodriguezscience
>
>
>


-- 
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn:
https://www.linkedin.com/in/pedrorodriguezscience
