still not sure how to use this with a DataFrame, assuming i cannot convert
it to a specific Dataset with .as (because i have lots of columns, or
because the types are simply not known at compile time).
i cannot specify the columns these operate on. i can resort to Row
transformations, like this:
Did you see these?
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/scala/typed.scala#L70
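For context, the helpers in that file take a plain function instead of a column reference, so the function itself picks the field. A minimal sketch of how that looks against a DataFrame, assuming the released Spark 2.x package name (org.apache.spark.sql.expressions.scalalang.typed, which may differ from the path linked above) and a made-up column called "amount":

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.expressions.scalalang.typed

object TypedOnRows {
  // The lambda, not a column expression, decides which field is aggregated.
  // "amount" is a hypothetical column name used only for this sketch.
  def totalAmount(df: DataFrame): Double =
    df.select(typed.sum[Row](_.getAs[Double]("amount"))).head()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("typed-on-rows")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val df = Seq(("a", 1.0), ("a", 2.0), ("b", 5.0)).toDF("key", "amount")
    println(totalAmount(df))
    spark.stop()
  }
}
```

Note that this still goes through Row and extracts the value by hand, so it does not by itself answer the question of binding an Aggregator to named columns.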
On Tue, Apr 12, 2016 at 9:46 AM, Koert Kuipers wrote:
i don't really see how Aggregator can be useful for DataFrame unless you can
specify what columns it works on. Having to code Aggregators to always use
Row and then extract the values yourself breaks the abstraction and makes
it not much better than UserDefinedAggregateFunction (well... maybe still
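To make the Row-based workaround described in this message concrete, here is a rough sketch. It assumes the Aggregator interface as it ended up in released Spark 2.x (zero / reduce / merge / finish plus two encoders), which may not match the interface at the time of this thread. SumLongColumn and the column name "clicks" are hypothetical names for illustration, not part of Spark.

```scala
import org.apache.spark.sql.{Encoder, Encoders, Row}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical helper, not part of Spark: sums one Long column of a Row.
// The column name is passed to the constructor because the Aggregator
// itself has no way to be bound to a column.
class SumLongColumn(column: String) extends Aggregator[Row, Long, Long] {
  def zero: Long = 0L

  // The value has to be pulled out of the Row by hand; nulls are skipped.
  def reduce(acc: Long, row: Row): Long = {
    val i = row.fieldIndex(column)
    if (row.isNullAt(i)) acc else acc + row.getLong(i)
  }

  def merge(a: Long, b: Long): Long = a + b
  def finish(acc: Long): Long = acc

  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}
```

With a DataFrame df that has a Long column named "clicks", this would be used as df.select(new SumLongColumn("clicks").toColumn). The column choice lives inside the aggregator rather than in the query, which is the abstraction leak being complained about.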
saw that, don't think it solves it. i basically want to add some children to
the expression i guess, to indicate what i am operating on? not sure if
that even makes sense
On Mon, Apr 11, 2016 at 8:04 PM, Michael Armbrust wrote:
I'll note this interface has changed recently:
https://github.com/apache/spark/commit/520dde48d0d52de1710a3275fdd5355dd69d
I'm not sure that solves your problem though...
On Mon, Apr 11, 2016 at 4:45 PM, Koert Kuipers wrote:
i like the Aggregator a lot (org.apache.spark.sql.expressions.Aggregator),
but i find the way to use it somewhat confusing. I am supposed to simply
call aggregator.toColumn, but that doesn't allow me to specify which fields
it operates on in a DataFrame.
i would basically like to do something like