This Stack Overflow answer covers exactly what you are looking for: http://stackoverflow.com/a/34204640/1562474
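
For reference, here is a minimal sketch of one way to get both results with built-in functions only. It is not necessarily the approach the linked answer takes. It assumes Spark 2.0+ (for `SparkSession`) and the `id` / `words` schema from your message; `MergeArrays`, `mergeWords`, and the output column names `all_words` and `distinct_words` are made up for illustration. The idea is to `explode` each array into one row per word, then re-aggregate per id with `collect_list` (keeps duplicates) and `collect_set` (distinct).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, collect_set, explode}

// Hypothetical helper object; the names here are illustrative only.
object MergeArrays {

  // Expects columns `id: String` and `words: Array[String]`.
  // explode() emits one row per array element, so the per-id aggregation
  // sees individual words rather than whole arrays.
  def mergeWords(df: DataFrame): DataFrame =
    df.select(col("id"), explode(col("words")).as("word"))
      .groupBy("id")
      .agg(
        collect_list("word").as("all_words"),     // every word, duplicates kept
        collect_set("word").as("distinct_words")  // each word at most once
      )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("merge-arrays-sketch")
      .getOrCreate()
    import spark.implicits._

    // Small made-up dataset matching the schema in the question.
    val df = Seq(
      ("a", Seq("x", "y")),
      ("a", Seq("y", "z")),
      ("b", Seq("x"))
    ).toDF("id", "words")

    mergeWords(df).show(false)
    spark.stop()
  }
}
```

Two caveats with this sketch: `explode` drops rows whose array is empty or null, so an id that only ever has empty arrays will not appear in the output, and neither `collect_list` nor `collect_set` guarantees the order of the elements it returns.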

On Tue, Jul 12, 2016 at 6:40 AM, Pedro Rodriguez <ski.rodrig...@gmail.com> wrote:

> Is it possible with Spark SQL to merge columns whose types are Arrays or
> Sets?
>
> My use case would be something like this:
>
> DF types
> id: String
> words: Array[String]
>
> I would want to do something like
>
> df.groupBy('id).agg(merge_arrays('words)) -> list of all words
> df.groupBy('id).agg(merge_sets('words)) -> list of distinct words
>
> Thanks,
> --
> Pedro Rodriguez
> PhD Student in Distributed Machine Learning | CU Boulder
> UC Berkeley AMPLab Alumni
>
> ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
> Github: github.com/EntilZha | LinkedIn:
> https://www.linkedin.com/in/pedrorodriguezscience