I downloaded the data to a file and checked. Like you said, there was a space after the '3'. Removing that space gives the exact, expected result. In fact, I do have these artifacts in the 'real' data and will have to treat them first.
Thanks a lot, Matt! I've been trying to crack this problem for hours. From: matt.tuc...@disney.com To: user@hive.apache.org Date: Wed, 28 Mar 2012 14:46:59 -0400 Subject: RE: Help in aggregating comma separated values Are you sure that the values in the ID column are the same? If it’s a string datatype, do you have any spaces after the ‘3’ but before the tab? Matt Tucker From: Saurabh S [mailto:saurab...@live.com] Sent: Wednesday, March 28, 2012 2:45 PM To: user@hive.apache.org Subject: RE: Help in aggregating comma separated values Thanks for the reply, Matt. This is exactly what I'm looking for. I'll look into the explode function more deeply. Another question: For that example table, your query generates a slightly different different result. The values for id 3 are not aggregated. So the result is as follows. Any idea what's happening?id value cnt 1 10 1 1 20 1 1 30 1 2 20 2 2 30 1 3 20 1 3 30 1 3 40 1 3 20 1 3 30 1 3 40 1 > From: matt.tuc...@disney.com > To: user@hive.apache.org > Date: Wed, 28 Mar 2012 14:25:43 -0400 > Subject: RE: Help in aggregating comma separated values > > Hi, > > The values column needs to be split across multiple rows, so you can then > aggregate them. Try this: > > SELECT id, value, COUNT(*) cnt > FROM table1 LATERAL VIEW explode(split(values, ",")) values_tbl as value > GROUP BY id, value > > > > Matt Tucker > > -----Original Message----- > From: Saurabh S [mailto:saurab...@live.com] > Sent: Wednesday, March 28, 2012 2:21 PM > To: user@hive.apache.org > Subject: Help in aggregating comma separated values > > > Hello, > > How do I get count from a list of comma separated values? For the lack of > better wording, here is an example: > > Suppose there is a table with two columns, id (integers) and values (string) > in the following fashion. > > id values > 1 10,20,30 > 2 20,30 > 2 20 > 3 20,30,40 > 3 40,20,30 > > I want an output like this: > > id value count > 1 10 1 > 1 20 1 > 1 30 1 > 2 20 2 > 2 30 1 > 3 20 2 > 3 30 2 > 3 40 2 > > So, it's a list containing the values those appear against each id, along > with their count. > > I understand this is more of a SQL kind of question rather than one specific > to Hive, but I'm at a roadblock here. > > Thanks, > Saurabh >