Hi Anton,
Your example and documentation look great! I left some comments
suggesting a few additions, but the PR in its current state is a great
improvement!
Thanks,
Jim
On 12/18/2016 09:09 AM, Anton Okolnychyi wrote:
Any comments/suggestions are more than welcome.
Thanks,
Anton
2016-12-18 15:08 GMT+01:00 Anton Okolnychyi
<anton.okolnyc...@gmail.com>:
Here is the pull request:
https://github.com/apache/spark/pull/16329
2016-12-16 20:54 GMT+01:00 Jim Hughes <jn...@ccri.com>:
I'd be happy to review a PR. At the minute, I'm still
learning Spark SQL, so writing documentation might be a bit of
a stretch, but reviewing would be fine.
Thanks!
On 12/16/2016 08:39 AM, Thakrar, Jayesh wrote:
Yes - that sounds good Anton, I can work on documenting the
window functions.
*From: *Anton Okolnychyi <anton.okolnyc...@gmail.com>
*Date: *Thursday, December 15, 2016 at 4:34 PM
*To: *Conversant <jthak...@conversantmedia.com>
*Cc: *Michael Armbrust <mich...@databricks.com>, Jim Hughes <jn...@ccri.com>,
"dev@spark.apache.org" <dev@spark.apache.org>
*Subject: *Re: Expand the Spark SQL programming guide?
I think it will make sense to show a sample implementation of
UserDefinedAggregateFunction for DataFrames, and an example
of the Aggregator API for typed Datasets.
Jim, what if I submit a PR and you join the review process? I
also do not mind splitting this if you want, but that seems
like overkill for this part.
Jayesh, shall I skip the window functions part since you are
going to work on that?
2016-12-15 22:48 GMT+01:00 Thakrar, Jayesh
<jthak...@conversantmedia.com>:
I too am interested in expanding the documentation for
Spark SQL.
For my work I needed to get some info/examples/guidance
on window functions and have been using
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
.
How about divide and conquer?
*From: *Michael Armbrust <mich...@databricks.com>
*Date: *Thursday, December 15, 2016 at 3:21 PM
*To: *Jim Hughes <jn...@ccri.com>
*Cc: *"dev@spark.apache.org" <dev@spark.apache.org>
*Subject: *Re: Expand the Spark SQL programming guide?
Pull requests would be welcome for any major missing
features in the guide:
https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md
On Thu, Dec 15, 2016 at 11:48 AM, Jim Hughes
<jn...@ccri.com> wrote:
Hi Anton,
I'd like to see this as well. I've been working on
implementing geospatial user-defined types and
functions. Having examples of aggregations and window
functions would be awesome!
I did test out implementing a distributed convex hull
as a UserDefinedAggregateFunction, and that seemed to
work sensibly.
Cheers,
Jim
On 12/15/2016 03:28 AM, Anton Okolnychyi wrote:
Hi,
I am wondering whether it makes sense to expand
the Spark SQL programming guide with examples of
aggregations (including user-defined via the
Aggregator API) and window functions. For
instance, there might be a separate
subsection under "Getting Started" for each
functionality.
SPARK-16046 seems to be related, but there has been no
activity on it for more than 4 months.
Best regards,
Anton