I saw that answer before, but as the response mentions, it's quite expensive.
I was able to do so with a UDAF, but was curious if I was just missing
something.
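
For reference, here is a rough sketch of the kind of UDAF I mean (the
names are my own, and it assumes the UserDefinedAggregateFunction API;
note the element type has to be hard-coded to StringType, which is the
generics limitation I'm referring to):

import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// Concatenates Array[String] values within each group. The element type
// is fixed to StringType since UDAFs can't be written generically.
class MergeArrays extends UserDefinedAggregateFunction {
  def inputSchema: StructType = StructType(StructField("words", ArrayType(StringType)) :: Nil)
  def bufferSchema: StructType = StructType(StructField("merged", ArrayType(StringType)) :: Nil)
  def dataType: DataType = ArrayType(StringType)
  def deterministic: Boolean = true

  def initialize(buffer: MutableAggregationBuffer): Unit =
    buffer(0) = Seq.empty[String]

  // Append the incoming array to the accumulated buffer
  def update(buffer: MutableAggregationBuffer, input: Row): Unit =
    buffer(0) = buffer.getSeq[String](0) ++ input.getSeq[String](0)

  // Combine two partial buffers (e.g. across partitions)
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
    buffer1(0) = buffer1.getSeq[String](0) ++ buffer2.getSeq[String](0)

  def evaluate(buffer: Row): Any = buffer.getSeq[String](0)
}

Used roughly like:

val mergeArrays = new MergeArrays
df.groupBy("id").agg(mergeArrays(col("words")))

and a "merge_sets" variant would just call .distinct in evaluate.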

A more general question: what are the requirements for deciding that a new
Spark SQL function should be added? Being able to write UDAFs is great, but
they don't benefit from native code generation and don't support generics.

Pedro

On Mon, Jul 11, 2016 at 11:52 PM, Yash Sharma <yash...@gmail.com> wrote:

> This answers exactly what you are looking for -
>
> http://stackoverflow.com/a/34204640/1562474
>
> On Tue, Jul 12, 2016 at 6:40 AM, Pedro Rodriguez <ski.rodrig...@gmail.com>
> wrote:
>
>> Is it possible with Spark SQL to merge columns whose types are Arrays or
>> Sets?
>>
>> My use case would be something like this:
>>
>> DF types
>> id: String
>> words: Array[String]
>>
>> I would want to do something like
>>
>> df.groupBy('id).agg(merge_arrays('words)) -> list of all words
>> df.groupBy('id).agg(merge_sets('words)) -> list of distinct words
>>
>> Thanks,
>> --
>> Pedro Rodriguez
>> PhD Student in Distributed Machine Learning | CU Boulder
>> UC Berkeley AMPLab Alumni
>>
>> ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
>> Github: github.com/EntilZha | LinkedIn:
>> https://www.linkedin.com/in/pedrorodriguezscience
>>
>>
>


-- 
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn:
https://www.linkedin.com/in/pedrorodriguezscience
