Is collect_set what you are looking for? I haven't used it myself, but it
seems to collect the grouped values into an array and remove the duplicates.

http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF#Built-in_Aggregate_Functions_.28UDAF.29
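
For the table in your example, a minimal sketch might look like the query below. I have not run it myself; it assumes t1 has the columns id and name as in your mail, and that your Hive version includes collect_set.

```sql
-- Sketch only: group the rows of t1 by name and gather the ids of each
-- group into an array. collect_set drops duplicate values, which does not
-- matter here as long as id is unique.
SELECT name, collect_set(id) AS ids
FROM t1
GROUP BY name;
```

The ids column comes back as an array rather than a comma-separated string, and as far as I know the order of the elements inside the array is not guaranteed.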

Thanks and Regards,
Sonal
Connect Hadoop with databases, Salesforce, FTP servers and others <https://github.com/sonalgoyal/hiho>
Nube Technologies <http://www.nubetech.co>
<http://in.linkedin.com/in/sonalgoyal>

On Fri, Feb 11, 2011 at 9:43 AM, Tim Robertson <timrobertson...@gmail.com> wrote:

> Hi all,
>
> Sorry if I am missing something obvious, but is there an inverse of
> explode?
>
> E.g. given t1
>
> ID Name
> 1  Tim
> 2  Tim
> 3  Tom
> 4  Frank
> 5  Tim
>
> Can you create t2:
>
> Name   ID
> Tim    1,2,5
> Tom    3
> Frank  4
>
> In Oracle it would be:
>  select name, collect(id) from t1 group by name
>
> I suspect in Hive it is related to an Array, but I can't find the syntax.
>
> Thanks for any pointers,
> Tim
>
