Did you see these? https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/scala/typed.scala#L70
On Tue, Apr 12, 2016 at 9:46 AM, Koert Kuipers <ko...@tresata.com> wrote:

> I don't really see how Aggregator can be useful for DataFrame unless you
> can specify which columns it works on. Having to code Aggregators to always
> use Row and then extract the values yourself breaks the abstraction and
> makes it not much better than UserDefinedAggregateFunction (well... maybe
> still better, because I have encoders so I can use kryo).
>
> On Mon, Apr 11, 2016 at 10:53 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> Saw that; I don't think it solves it. I basically want to add some
>> children to the expression, I guess, to indicate what I am operating on?
>> Not sure if that even makes sense.
>>
>> On Mon, Apr 11, 2016 at 8:04 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>
>>> I'll note this interface has changed recently:
>>> https://github.com/apache/spark/commit/520dde48d0d52dbbbbe1710a3275fdd5355dd69d
>>>
>>> I'm not sure that solves your problem, though...
>>>
>>> On Mon, Apr 11, 2016 at 4:45 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>>> I like the Aggregator a lot
>>>> (org.apache.spark.sql.expressions.Aggregator), but I find the way to
>>>> use it somewhat confusing. I am supposed to simply call
>>>> aggregator.toColumn, but that doesn't allow me to specify which fields
>>>> it operates on in a DataFrame.
>>>>
>>>> I would basically like to do something like:
>>>>
>>>>     dataFrame
>>>>       .groupBy("k")
>>>>       .agg(
>>>>         myAggregator.on("v1", "v2").toColumn,
>>>>         myOtherAggregator.on("v3", "v4").toColumn
>>>>       )
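A sketch of one way around the missing `.on(...)`: narrow the grouped values to just the columns the Aggregator expects, so its input type doesn't have to be the whole Row. This assumes a Spark 2.x-style API (`SparkSession`, `KeyValueGroupedDataset.mapValues`, and the reworked `Aggregator` interface from the commit linked above), which postdates this thread; `SumOfProducts` and the column names are hypothetical, matching the example in the quoted mail.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical aggregator: sums the product of two Long columns.
// Uses the zero/reduce/merge/finish interface plus explicit buffer
// and output encoders.
object SumOfProducts extends Aggregator[(Long, Long), Long, Long] {
  def zero: Long = 0L
  def reduce(acc: Long, in: (Long, Long)): Long = acc + in._1 * in._2
  def merge(a: Long, b: Long): Long = a + b
  def finish(acc: Long): Long = acc
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("a", 1L, 2L), ("a", 3L, 4L), ("b", 5L, 6L)).toDF("k", "v1", "v2")

// Narrow the values to (v1, v2) before aggregating, so the Aggregator's
// input type can be (Long, Long) rather than the full row.
val result = df
  .as[(String, Long, Long)]
  .groupByKey(_._1)
  .mapValues(r => (r._2, r._3))
  .agg(SumOfProducts.toColumn)
// result: Dataset[(String, Long)], e.g. ("a", 14), ("b", 30)
```

In later releases (Spark 3.0+), `org.apache.spark.sql.functions.udaf` wraps an `Aggregator` as a `UserDefinedFunction` that can be applied to specific columns in an untyped `groupBy(...).agg(...)`, which is close to the `.on("v1", "v2")` proposed here.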