I downloaded the data to a file and checked. Like you said, there was a space 
after the '3'. Removing that space gives the exact, expected result. In fact, I 
do have these artifacts in the 'real' data and will have to treat them first.

Thanks a lot, Matt! I've been trying to crack this problem for hours.

From: matt.tuc...@disney.com
To: user@hive.apache.org
Date: Wed, 28 Mar 2012 14:46:59 -0400
Subject: RE: Help in aggregating comma separated values



Are you sure that the values in the ID column are the same? If it’s a string 
datatype, do you have any spaces after the ‘3’ but before the tab? Matt Tucker 
From: Saurabh S [mailto:saurab...@live.com] 
Sent: Wednesday, March 28, 2012 2:45 PM
To: user@hive.apache.org
Subject: RE: Help in aggregating comma separated values Thanks for the reply, 
Matt. This is exactly what I'm looking for. I'll look into the explode function 
more deeply.

Another question: For that example table, your query generates a slightly 
different different result. The values for id 3 are not aggregated. So the 
result is as follows. Any idea what's happening?id   value cnt
1    10    1
1    20    1
1    30    1
2    20    2
2    30    1
3    20    1
3    30    1
3    40    1
3    20    1
3    30    1
3    40    1 > From: matt.tuc...@disney.com
> To: user@hive.apache.org
> Date: Wed, 28 Mar 2012 14:25:43 -0400
> Subject: RE: Help in aggregating comma separated values
> 
> Hi,
> 
> The values column needs to be split across multiple rows, so you can then 
> aggregate them. Try this:
> 
> SELECT id, value, COUNT(*) cnt
> FROM table1 LATERAL VIEW explode(split(values, ",")) values_tbl as value
> GROUP BY id, value
> 
> 
> 
> Matt Tucker
> 
> -----Original Message-----
> From: Saurabh S [mailto:saurab...@live.com] 
> Sent: Wednesday, March 28, 2012 2:21 PM
> To: user@hive.apache.org
> Subject: Help in aggregating comma separated values
> 
> 
> Hello,
> 
> How do I get count from a list of comma separated values? For the lack of 
> better wording, here is an example:
> 
> Suppose there is a table with two columns, id (integers) and values (string) 
> in the following fashion.
> 
> id   values
> 1    10,20,30
> 2    20,30
> 2    20
> 3    20,30,40
> 3    40,20,30
> 
> I want an output like this:
> 
> id   value count
> 1    10    1
> 1    20    1
> 1    30    1
> 2    20    2
> 2    30    1
> 3    20    2
> 3    30    2
> 3    40    2
> 
> So, it's a list containing the values those appear against each id, along 
> with their count.
> 
> I understand this is more of a SQL kind of question rather than one specific 
> to Hive, but I'm at a roadblock here.
> 
> Thanks,
> Saurabh
>                                         

Reply via email to