I'm not very familiar with list (array) columns in Hive, but one might let you define a table with a schema of "page-type, page-name, items-displayed" and then query for a count of individual items (see http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL and http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF). A Map type might turn out to be a better fit, but I'm not sure.
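To make the counting step concrete, here is a rough sketch in plain Python (not Hive) of what "query for a count of individual items" means for the sample log lines quoted below. It assumes each line is whitespace-separated into page-type, page-name and a comma-separated item list, with no spaces inside any field; the function name `count_impressions` is made up for illustration. In Hive, the built-in `split` and `explode` functions cover the analogous "one row per item" step, though I have not tested a query against this data.

```python
from collections import Counter


def count_impressions(lines):
    """Count how often each item id appears across all logged pages."""
    counts = Counter()
    for line in lines:
        # Assumed layout: page-type, page-name, comma-separated item ids.
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        _page_type, _page_name, items = fields
        for item in items.split(","):
            item = item.strip()
            if item:
                counts[item] += 1  # one impression per listing
    return counts


# Sample lines taken from the original question.
sample = [
    "CATPAGE CAT1 1,2,3,4,5",
    "CATPAGE CAT2 6,7,8,9,10",
    "SEARCH keyword 1,6",
]
counts = count_impressions(sample)
for item_id in sorted(counts, key=int):
    print(item_id, counts[item_id])
```

Items 1 and 6 each come out with 2 impressions, matching the "1, 2" and "6, 2" rows in the question. Storing the daily result as (date, item id, count) rows would then support the "where sid = xx" timeline lookup.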
HTH

Dave Viner

On Tue, Mar 1, 2011 at 4:33 AM, Cam Bazz <camb...@gmail.com> wrote:

> Hello,
>
> Now I would like to count impressions per item. To achieve this, I
> made a logger, for instance when the user goes in a category or search
> page, and some items are listed, I am logging:
>
> CATPAGE CAT1 1,2,3,4,5
> CATPAGE CAT2 6,7,8,9,10
> SEARCH keyword 1,6
>
>
> basically I am logging all the displayed items in a comma seperated list.
>
> I need to calculate and store daily impressions from this such as:
>
> 1, 2
> 6, 2
>
> (the first line is item sid, the second number is impressions, in
> total from different impression types)
>
> Now I have couple of questions:
>
> considering that the system will produce at least 1 line per item per
> day, what kind of table i must store this? previously, I have been
> using text files for everything, I never had any requirement to query
> hive, but rather export results from it. now I will probably need to
> make queries like "select * from myimpression table where sid = xx"
> giving me a timeline of impressions per item.
>
> Second question:
>
> what kind of query I need in order to count impressions like above?
>
> Thank you very much,
> C.B.
>