Did you see these? https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/scala/typed.scala#L70
On Tue, Apr 12, 2016 at 9:46 AM, Koert Kuipers <ko...@tresata.com> wrote:

> I don't really see how Aggregator can be useful for DataFrame unless you
> can specify which columns it works on. Having to code Aggregators to always
> use Row and then extract the values yourself breaks the abstraction and
> makes it not much better than UserDefinedAggregateFunction (well... maybe
> still better, because I have encoders so I can use kryo).
>
> On Mon, Apr 11, 2016 at 10:53 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> Saw that; I don't think it solves it. I basically want to add some
>> children to the expression, I guess, to indicate what I am operating on?
>> Not sure if that even makes sense.
>>
>> On Mon, Apr 11, 2016 at 8:04 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>
>>> I'll note this interface has changed recently:
>>> https://github.com/apache/spark/commit/520dde48d0d52dbbbbe1710a3275fdd5355dd69d
>>>
>>> I'm not sure that solves your problem, though...
>>>
>>> On Mon, Apr 11, 2016 at 4:45 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>>> I like the Aggregator a lot
>>>> (org.apache.spark.sql.expressions.Aggregator), but I find the way to
>>>> use it somewhat confusing. I am supposed to simply call
>>>> aggregator.toColumn, but that doesn't allow me to specify which fields
>>>> it operates on in a DataFrame.
>>>>
>>>> I would basically like to do something like:
>>>>
>>>>     dataFrame
>>>>       .groupBy("k")
>>>>       .agg(
>>>>         myAggregator.on("v1", "v2").toColumn,
>>>>         myOtherAggregator.on("v3", "v4").toColumn
>>>>       )
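A sketch of one way around the missing `.on(...)`: narrow the grouped values to just the columns the Aggregator expects, so its input type doesn't have to be the whole Row. This assumes a Spark 2.x-style API (`SparkSession`, `KeyValueGroupedDataset.mapValues`, and the reworked `Aggregator` interface from the commit linked above), which postdates this thread; `SumOfProducts` and the column names are hypothetical, matching the example in the quoted mail.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical aggregator: sums the product of two Long columns.
// Uses the zero/reduce/merge/finish interface plus explicit buffer
// and output encoders.
object SumOfProducts extends Aggregator[(Long, Long), Long, Long] {
  def zero: Long = 0L
  def reduce(acc: Long, in: (Long, Long)): Long = acc + in._1 * in._2
  def merge(a: Long, b: Long): Long = a + b
  def finish(acc: Long): Long = acc
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("a", 1L, 2L), ("a", 3L, 4L), ("b", 5L, 6L)).toDF("k", "v1", "v2")

// Narrow the values to (v1, v2) before aggregating, so the Aggregator's
// input type can be (Long, Long) rather than the full row.
val result = df
  .as[(String, Long, Long)]
  .groupByKey(_._1)
  .mapValues(r => (r._2, r._3))
  .agg(SumOfProducts.toColumn)
// result: Dataset[(String, Long)], e.g. ("a", 14), ("b", 30)
```

In later releases (Spark 3.0+), `org.apache.spark.sql.functions.udaf` wraps an `Aggregator` as a `UserDefinedFunction` that can be applied to specific columns in an untyped `groupBy(...).agg(...)`, which is close to the `.on("v1", "v2")` proposed here.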