I don't really see how Aggregator can be useful for DataFrame unless you can specify which columns it works on. Having to code Aggregators to always take a Row and then extract the values yourself breaks the abstraction and makes it not much better than UserDefinedAggregateFunction (well... maybe still better, because I have encoders so I can use kryo).
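To make that concrete, here is a minimal sketch of the kind of Row-based Aggregator I mean, written against the current interface (post the commit Michael linked, with bufferEncoder/outputEncoder). The column names "v1"/"v2" and the sum-of-products logic are made up purely for illustration:

```scala
import org.apache.spark.sql.{Encoder, Encoders, Row}
import org.apache.spark.sql.expressions.Aggregator

// A hypothetical aggregator that has to take Row as its input type and
// pull out the columns it cares about by name itself, because there is
// no way to tell toColumn which columns to bind to.
object SumOfProducts extends Aggregator[Row, Double, Double] {
  def zero: Double = 0.0
  // the hard-coded field names are the abstraction leak:
  def reduce(buf: Double, row: Row): Double =
    buf + row.getAs[Double]("v1") * row.getAs[Double]("v2")
  def merge(b1: Double, b2: Double): Double = b1 + b2
  def finish(buf: Double): Double = buf
  def bufferEncoder: Encoder[Double] = Encoders.scalaDouble
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

// usage would then be something like (untested):
//   dataFrame.groupBy("k").agg(SumOfProducts.toColumn)
// which only works for one fixed pair of columns per aggregator object.
```

A `.on("v1", "v2")` style method would let the same aggregator be reused against different column pairs instead of baking the names in.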
On Mon, Apr 11, 2016 at 10:53 PM, Koert Kuipers <ko...@tresata.com> wrote:
> saw that, dont think it solves it. i basically want to add some children
> to the expression i guess, to indicate what i am operating on? not sure if
> even makes sense
>
> On Mon, Apr 11, 2016 at 8:04 PM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> I'll note this interface has changed recently:
>> https://github.com/apache/spark/commit/520dde48d0d52dbbbbe1710a3275fdd5355dd69d
>>
>> I'm not sure that solves your problem though...
>>
>> On Mon, Apr 11, 2016 at 4:45 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> i like the Aggregator a lot
>>> (org.apache.spark.sql.expressions.Aggregator), but i find the way to use it
>>> somewhat confusing. I am supposed to simply call aggregator.toColumn, but
>>> that doesn't allow me to specify which fields it operates on in a DataFrame.
>>>
>>> i would basically like to do something like
>>>
>>> dataFrame
>>>   .groupBy("k")
>>>   .agg(
>>>     myAggregator.on("v1", "v2").toColumn,
>>>     myOtherAggregator.on("v3", "v4").toColumn
>>>   )