I saw that answer before, but as the response mentions, it's quite expensive. I was able to do this with a UDAF (rough sketch below), but was curious whether I was just missing something.
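For reference, my reading of the linked answer is that it boils down to exploding and re-collecting, roughly like this (column names from my example, untested, and assuming spark.implicits._ is in scope):

  import org.apache.spark.sql.functions.{explode, collect_list}

  df.select('id, explode('words).as("word"))
    .groupBy('id)
    .agg(collect_list('word))  // collect_set for the distinct version

which materializes one row per array element before regrouping, and that is where the cost comes from.

The UDAF I ended up writing looks roughly like the sketch below (not my exact code; MergeArrays is my own name, and the element type is hardcoded to StringType, which is exactly the lack of generics I mean):

  import org.apache.spark.sql.Row
  import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
  import org.apache.spark.sql.types._

  // Concatenates all values of an Array[String] column within each group.
  class MergeArrays extends UserDefinedAggregateFunction {
    // One input column holding Array[String]
    def inputSchema: StructType = StructType(StructField("words", ArrayType(StringType)) :: Nil)
    // The buffer holds the merged array accumulated so far
    def bufferSchema: StructType = StructType(StructField("merged", ArrayType(StringType)) :: Nil)
    def dataType: DataType = ArrayType(StringType)
    def deterministic: Boolean = true

    def initialize(buffer: MutableAggregationBuffer): Unit = {
      buffer(0) = Seq.empty[String]
    }

    // Append each incoming array to the buffer
    def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
      if (!input.isNullAt(0)) {
        buffer(0) = buffer.getSeq[String](0) ++ input.getSeq[String](0)
      }
    }

    // Combine partial buffers from different partitions
    def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
      buffer1(0) = buffer1.getSeq[String](0) ++ buffer2.getSeq[String](0)
    }

    def evaluate(buffer: Row): Any = buffer.getSeq[String](0)
    // A merge_sets variant would return buffer.getSeq[String](0).distinct here
  }

Usage would be something like:

  val mergeArrays = new MergeArrays
  df.groupBy('id).agg(mergeArrays('words))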
A more general question: what are the requirements for deciding that a new Spark SQL function should be added? Being able to write UDAFs is great, but they don't benefit from native code generation and don't support generics.

Pedro

On Mon, Jul 11, 2016 at 11:52 PM, Yash Sharma <yash...@gmail.com> wrote:
> This answers exactly what you are looking for -
>
> http://stackoverflow.com/a/34204640/1562474
>
> On Tue, Jul 12, 2016 at 6:40 AM, Pedro Rodriguez <ski.rodrig...@gmail.com>
> wrote:
>
>> Is it possible with Spark SQL to merge columns whose types are Arrays or
>> Sets?
>>
>> My use case would be something like this:
>>
>> DF types
>> id: String
>> words: Array[String]
>>
>> I would want to do something like
>>
>> df.groupBy('id).agg(merge_arrays('words)) -> list of all words
>> df.groupBy('id).agg(merge_sets('words)) -> list of distinct words
>>
>> Thanks,
>> --
>> Pedro Rodriguez
>> PhD Student in Distributed Machine Learning | CU Boulder
>> UC Berkeley AMPLab Alumni
>>
>> ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
>> Github: github.com/EntilZha | LinkedIn:
>> https://www.linkedin.com/in/pedrorodriguezscience
>>
>

--
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn: https://www.linkedin.com/in/pedrorodriguezscience