Re: Aggregator support in DataFrame

2016-04-12 Thread Koert Kuipers
Still not sure how to use this with a DataFrame, assuming I cannot convert it to a specific Dataset with .as (because I have lots of columns, or because the types are simply not known at compile time). I cannot specify the columns these operate on. I can resort to Row transformations, like this:

Re: Aggregator support in DataFrame

2016-04-12 Thread Michael Armbrust
Did you see these? https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/scala/typed.scala#L70
On Tue, Apr 12, 2016 at 9:46 AM, Koert Kuipers wrote:
> I don't really see how Aggregator can be useful for DataFrame unless you
> can specify what column

Re: Aggregator support in DataFrame

2016-04-12 Thread Koert Kuipers
I don't really see how Aggregator can be useful for DataFrame unless you can specify what columns it works on. Having to code Aggregators to always use Row and then extract the values yourself breaks the abstraction and makes it not much better than UserDefinedAggregateFunction (well... maybe still
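
For readers unfamiliar with the pattern being criticized here, the following is a minimal sketch of a Row-based Aggregator. It is not code from the thread. It assumes the Spark 2.x shape of `org.apache.spark.sql.expressions.Aggregator` (with `bufferEncoder` and `outputEncoder`), which may differ from the interface as it stood when these messages were written. The class name `SumColumn` and the column name `amount` are hypothetical.

```scala
import org.apache.spark.sql.{Encoder, Encoders, Row}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical example: sums one Double column of an untyped DataFrame.
// The column name has to be passed in and looked up in every Row by hand,
// which is the manual extraction the message above objects to.
class SumColumn(column: String) extends Aggregator[Row, Double, Double] {
  // Initial value of the running sum.
  def zero: Double = 0.0

  // Fold one input Row into the buffer; the value is pulled out by field name.
  def reduce(acc: Double, row: Row): Double = acc + row.getAs[Double](column)

  // Combine two partial sums computed on different partitions.
  def merge(a: Double, b: Double): Double = a + b

  // Turn the final buffer into the output value.
  def finish(acc: Double): Double = acc

  // Encoders for the buffer and the result (assumes the Spark 2.x interface).
  def bufferEncoder: Encoder[Double] = Encoders.scalaDouble
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}
```

Under the same assumptions, something like `df.groupBy("key").agg(new SumColumn("amount").toColumn)` should apply it to a DataFrame that has a Double column named `amount`. The column is bound through a constructor argument rather than through the expression itself, which illustrates why this approach feels close to UserDefinedAggregateFunction.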

Re: Aggregator support in DataFrame

2016-04-11 Thread Koert Kuipers
Saw that, don't think it solves it. I basically want to add some children to the expression, I guess, to indicate what I am operating on? Not sure if that even makes sense.
On Mon, Apr 11, 2016 at 8:04 PM, Michael Armbrust wrote:
> I'll note this interface has changed recently:
> https://github.com/apac

Re: Aggregator support in DataFrame

2016-04-11 Thread Michael Armbrust
I'll note this interface has changed recently: https://github.com/apache/spark/commit/520dde48d0d52de1710a3275fdd5355dd69d
I'm not sure that solves your problem though...
On Mon, Apr 11, 2016 at 4:45 PM, Koert Kuipers wrote:
> i like the Aggregator a lot (org.apache.spark.sql.expressions.Ag