Thanks, Michael and java8964! Does HiveContext also provide a UDF for combining existing lists into a flattened (not nested) list? (list -> list of lists -[flatten]-> list).
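As a side note, one way to get the flattening step is an ordinary Scala `flatten` wrapped in a UDF; the core operation below is plain Scala, and the Spark wiring shown in comments is a sketch with illustrative column names (`id_lists`, `flat_ids`). Later Spark releases (2.4+) also expose a built-in `flatten` in `org.apache.spark.sql.functions`, which did not exist at the time of this thread.

```scala
// Core operation: collapse a list of lists into a single list.
// Scala's Seq#flatten does exactly this; no Hive UDF is needed for the logic.
def flattenLists(xs: Seq[Seq[Int]]): Seq[Int] = xs.flatten

// In a Spark session this could be registered and applied roughly as
// (column names are illustrative, not from the original thread):
//   import org.apache.spark.sql.functions.udf
//   val flattenUdf = udf(flattenLists _)
//   df.withColumn("flat_ids", flattenUdf(df("id_lists")))

println(flattenLists(Seq(Seq(1, 2), Seq(3, 4), Seq(5))))  // List(1, 2, 3, 4, 5)
```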
On Thu, Oct 15, 2015 at 1:16 AM Michael Armbrust <mich...@databricks.com> wrote:

> That's correct. It is a Hive UDAF.
>
> On Wed, Oct 14, 2015 at 6:45 AM, java8964 <java8...@hotmail.com> wrote:
>
>> My guess is the same as the UDAF collect_set in Hive.
>>
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inAggregateFunctions(UDAF)
>>
>> Yong
>>
>> ------------------------------
>> From: sliznmail...@gmail.com
>> Date: Wed, 14 Oct 2015 02:45:48 +0000
>> Subject: Re: Spark DataFrame GroupBy into List
>> To: mich...@databricks.com
>> CC: user@spark.apache.org
>>
>> Hi Michael,
>>
>> Can you be more specific on `collect_set`? Is it a built-in function or,
>> if it is a UDF, how is it defined?
>>
>> BR,
>> Todd Leo
>>
>> On Wed, Oct 14, 2015 at 2:12 AM Michael Armbrust <mich...@databricks.com> wrote:
>>
>> import org.apache.spark.sql.functions._
>>
>> df.groupBy("category")
>>   .agg(callUDF("collect_set", df("id")).as("id_list"))
>>
>> On Mon, Oct 12, 2015 at 11:08 PM, SLiZn Liu <sliznmail...@gmail.com> wrote:
>>
>> Hey Spark users,
>>
>> I'm trying to group a dataframe by a column, appending occurrences into a list
>> instead of counting them.
>>
>> Let's say we have a dataframe as shown below:
>>
>> | category | id |
>> | -------- |:--:|
>> | A        | 1  |
>> | A        | 2  |
>> | B        | 3  |
>> | B        | 4  |
>> | C        | 5  |
>>
>> ideally, after some magic group by (reverse explode?):
>>
>> | category | id_list |
>> | -------- | ------- |
>> | A        | 1,2     |
>> | B        | 3,4     |
>> | C        | 5       |
>>
>> any tricks to achieve that? Scala Spark API is preferred. =D
>>
>> BR,
>> Todd Leo
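For readers following the thread, the "group into list" shape requested above can be illustrated with plain Scala collections; the equivalent DataFrame call is sketched in a comment (in later Spark releases `collect_list` is exposed directly in `org.apache.spark.sql.functions`, so `callUDF` is no longer needed). The data below mirrors the example table from the thread.

```scala
// Plain-Scala illustration of grouping ids into lists per category.
// With the DataFrame API this corresponds roughly to:
//   df.groupBy("category").agg(collect_list("id").as("id_list"))
// (collect_set would additionally deduplicate the ids.)
val rows = Seq(("A", 1), ("A", 2), ("B", 3), ("B", 4), ("C", 5))

val grouped: Map[String, Seq[Int]] =
  rows.groupBy(_._1).map { case (cat, pairs) => cat -> pairs.map(_._2) }

println(grouped("A"))  // List(1, 2)
println(grouped("C"))  // List(5)
```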