I am not very familiar with list (array) columns in Hive, but one might
let you define a table with a schema of "page-type, page-name,
items-displayed" and then query for a count of individual items (see
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL and
http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF). A Map type might
possibly work better; I am not sure.
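
To make the intended count concrete, here is a minimal sketch of the logic
in plain Python rather than Hive. It assumes each log line has three
whitespace-separated fields (page-type, page-name, items-displayed) and that
the page name contains no spaces; the names LOG_LINES and count_impressions
are hypothetical. In Hive, the same shape would probably be built from the
split() and explode() functions listed on the UDF page, but that is untested
here.

```python
from collections import Counter

# Hypothetical sample, copied from the log format in the quoted message:
# page-type, page-name, items-displayed (comma-separated item sids).
LOG_LINES = [
    "CATPAGE   CAT1    1,2,3,4,5",
    "CATPAGE   CAT2    6,7,8,9,10",
    "SEARCH     keyword 1,6",
]


def count_impressions(lines):
    """Count how many times each item sid appears across all log lines."""
    counts = Counter()
    for line in lines:
        # Split into at most three fields on runs of whitespace.
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # skip malformed lines
        _page_type, _page_name, items = fields
        # One impression per listed item, regardless of page type.
        for sid in items.split(","):
            sid = sid.strip()
            if sid:
                counts[sid] += 1
    return counts


impressions = count_impressions(LOG_LINES)
for sid in sorted(impressions, key=int):
    print(f"{sid}, {impressions[sid]}")
```

Each output line is "sid, impressions", matching the daily summary format
asked for in the quoted message. Storing one such row per item per day
(with a date column) would then support the per-item timeline query.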

HTH
Dave Viner


On Tue, Mar 1, 2011 at 4:33 AM, Cam Bazz <camb...@gmail.com> wrote:

> Hello,
>
> Now I would like to count impressions per item. To achieve this, I
> made a logger. For instance, when the user opens a category or search
> page and some items are listed, I log:
>
> CATPAGE   CAT1    1,2,3,4,5
> CATPAGE   CAT2    6,7,8,9,10
> SEARCH     keyword 1,6
>
>
> Basically, I am logging all the displayed items in a comma-separated list.
>
> I need to calculate and store daily impressions from this such as:
>
> 1, 2
> 6, 2
>
> (the first number is the item sid, the second number is the total
> impressions across the different impression types)
>
> Now I have a couple of questions:
>
> Considering that the system will produce at least 1 line per item per
> day, what kind of table should I store this in? Previously, I have been
> using text files for everything; I never had any requirement to query
> Hive, only to export results from it. Now I will probably need to
> make queries like "select * from myimpression table where sid = xx",
> giving me a timeline of impressions per item.
>
> Second question:
>
> What kind of query do I need in order to count impressions as above?
>
> Thank you very much,
> C.B.
>
