I don't really see how Aggregator can be useful for DataFrame unless you can specify which columns it works on. Having to code Aggregators to always take a Row and then extract the values yourself breaks the abstraction and makes it not much better than UserDefinedAggregateFunction (well... maybe still better, because I have encoders so I can use kryo).
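To make that concrete, here is a minimal sketch of the kind of Row-based Aggregator I mean, written against the current interface (post the commit Michael linked, with bufferEncoder/outputEncoder). The column names "v1"/"v2" and the sum-of-products logic are made up purely for illustration:

```scala
import org.apache.spark.sql.{Encoder, Encoders, Row}
import org.apache.spark.sql.expressions.Aggregator

// A hypothetical aggregator that has to take Row as its input type and
// pull out the columns it cares about by name itself, because there is
// no way to tell toColumn which columns to bind to.
object SumOfProducts extends Aggregator[Row, Double, Double] {
  def zero: Double = 0.0
  // the hard-coded field names are the abstraction leak:
  def reduce(buf: Double, row: Row): Double =
    buf + row.getAs[Double]("v1") * row.getAs[Double]("v2")
  def merge(b1: Double, b2: Double): Double = b1 + b2
  def finish(buf: Double): Double = buf
  def bufferEncoder: Encoder[Double] = Encoders.scalaDouble
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

// usage would then be something like (untested):
//   dataFrame.groupBy("k").agg(SumOfProducts.toColumn)
// which only works for one fixed pair of columns per aggregator object.
```

A `.on("v1", "v2")` style method would let the same aggregator be reused against different column pairs instead of baking the names in.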
On Mon, Apr 11, 2016 at 10:53 PM, Koert Kuipers <ko...@tresata.com> wrote:
> saw that, dont think it solves it. i basically want to add some children
> to the expression i guess, to indicate what i am operating on? not sure if
> even makes sense
>
> On Mon, Apr 11, 2016 at 8:04 PM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> I'll note this interface has changed recently:
>> https://github.com/apache/spark/commit/520dde48d0d52dbbbbe1710a3275fdd5355dd69d
>>
>> I'm not sure that solves your problem though...
>>
>> On Mon, Apr 11, 2016 at 4:45 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> i like the Aggregator a lot
>>> (org.apache.spark.sql.expressions.Aggregator), but i find the way to use it
>>> somewhat confusing. I am supposed to simply call aggregator.toColumn, but
>>> that doesn't allow me to specify which fields it operates on in a DataFrame.
>>>
>>> i would basically like to do something like
>>>
>>> dataFrame
>>>   .groupBy("k")
>>>   .agg(
>>>     myAggregator.on("v1", "v2").toColumn,
>>>     myOtherAggregator.on("v3", "v4").toColumn
>>>   )