Re: Group By Concatenation

2012-11-01 Thread Edward Capriolo
collect_set() is built into Hive. If you want a version that does not de-duplicate, look here: https://github.com/edwardcapriolo/hive-collect

Caution: both of these functions can run out of memory if the results are larger than a mapper can hold in memory.

On Thu, Nov 1, 2012 at 2:27 PM, Ratner
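To make the difference between the two functions concrete, here is a rough in-memory sketch in Python (not Hive code) run over the sample rows from the original question. The helper names collect_set_by_key, collect_all_by_key and concat_groups are made up for this illustration, and the sketch keeps first-seen order only for readability; Hive makes no such ordering promise. In Hive itself the array returned by collect_set() would typically be joined into one string with concat_ws().

```python
from collections import defaultdict

# Sample (Col1, Col2) rows taken from the original question.
ROWS = [
    ("K1", "V1"), ("K1", "V1"), ("K2", "V1"), ("K3", "V1"),
    ("K1", "V2"), ("K1", "V3"), ("K2", "V2"),
]

def collect_set_by_key(rows):
    """Hypothetical helper: group Col2 values by Col1, dropping duplicates."""
    groups = defaultdict(list)
    for key, value in rows:
        if value not in groups[key]:
            groups[key].append(value)
    return dict(groups)

def collect_all_by_key(rows):
    """Hypothetical helper: group Col2 values by Col1, keeping duplicates."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)

def concat_groups(groups, sep=","):
    """Join each group's values into a single delimited string."""
    return {key: sep.join(values) for key, values in groups.items()}

if __name__ == "__main__":
    print(concat_groups(collect_set_by_key(ROWS)))
    print(concat_groups(collect_all_by_key(ROWS)))
```

Both helpers hold every value for a key in memory at once, which is the same reason the caution above applies: a key with a very large number of values can exhaust the memory of the task doing the grouping.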

Group By Concatenation

2012-11-01 Thread Ratner, Alan S (IS)
Sorry to ask what is probably a very naïve Hive question, but here goes. I have a table as follows:

Col1  Col2
K1    V1
K1    V1
K2    V1
K3    V1
K1    V2
K1    V3
K2    V2

Now I have managed to

SELECT Col1,COUNT(DISTINCT Col2) FROM ... BY COL1;

to obtain K1 3 K2