That's correct. It is a Hive UDAF.
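A minimal end-to-end sketch of the approach discussed in the thread below, assuming Spark 1.5.x started with a HiveContext (needed so that collect_set resolves through the Hive function registry; it is not a native Spark SQL function in this version). The sample data mirrors the table in the original question:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.functions.callUDF

    // In spark-shell these are predefined; created here so the sketch is self-contained.
    val sc = new SparkContext(new SparkConf().setAppName("collect-set-sketch").setMaster("local[*]"))
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    // Sample data mirroring the table in the original question below.
    val df = Seq(("A", 1), ("A", 2), ("B", 3), ("B", 4), ("C", 5)).toDF("category", "id")

    // callUDF looks collect_set up in the HiveContext's function registry and
    // aggregates each category's ids into a deduplicated array.
    val grouped = df.groupBy("category")
      .agg(callUDF("collect_set", df("id")).as("id_list"))

    grouped.show()
    // e.g. (row and element order are not guaranteed):
    // +--------+-------+
    // |category|id_list|
    // +--------+-------+
    // |       A| [1, 2]|
    // |       B| [3, 4]|
    // |       C|    [5]|
    // +--------+-------+

From Spark 1.6 onward, collect_set and collect_list are exposed directly in org.apache.spark.sql.functions, so the callUDF indirection is no longer needed.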
On Wed, Oct 14, 2015 at 6:45 AM, java8964 <java8...@hotmail.com> wrote:

> My guess is that it is the same as the collect_set UDAF in Hive:
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inAggregateFunctions(UDAF)
>
> Yong
>
> ------------------------------
> From: sliznmail...@gmail.com
> Date: Wed, 14 Oct 2015 02:45:48 +0000
> Subject: Re: Spark DataFrame GroupBy into List
> To: mich...@databricks.com
> CC: user@spark.apache.org
>
> Hi Michael,
>
> Can you be more specific on `collect_set`? Is it a built-in function or,
> if it is a UDF, how is it defined?
>
> BR,
> Todd Leo
>
> On Wed, Oct 14, 2015 at 2:12 AM Michael Armbrust <mich...@databricks.com> wrote:
>
> import org.apache.spark.sql.functions._
>
> df.groupBy("category")
>   .agg(callUDF("collect_set", df("id")).as("id_list"))
>
> On Mon, Oct 12, 2015 at 11:08 PM, SLiZn Liu <sliznmail...@gmail.com> wrote:
>
> Hey Spark users,
>
> I'm trying to group a DataFrame by a column, appending the occurrences in
> each group into a list instead of counting them.
>
> Let's say we have a DataFrame as shown below:
>
> | category | id |
> | -------- |:--:|
> | A        | 1  |
> | A        | 2  |
> | B        | 3  |
> | B        | 4  |
> | C        | 5  |
>
> Ideally, after some magic group by (reverse explode?):
>
> | category | id_list |
> | -------- | ------- |
> | A        | 1,2     |
> | B        | 3,4     |
> | C        | 5       |
>
> Any tricks to achieve that? Scala Spark API is preferred. =D
>
> BR,
> Todd Leo
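Since collect_set is resolved through Hive, the same aggregation can also be written as plain HiveQL against a registered temporary table; a sketch under the same assumptions as the example above (the table name "events" is arbitrary, used only for illustration):

    // Continues from the sketch above: same df and HiveContext.
    df.registerTempTable("events") // hypothetical name, used only in this sketch

    val viaSql = sqlContext.sql(
      """SELECT category, collect_set(id) AS id_list
        |FROM events
        |GROUP BY category""".stripMargin)

    viaSql.show() // same result as the DataFrame-API version above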