Hello,

I am calculating certain statistics on an hourly basis. At the end of
the day, I would like to calculate a daily log.

Each hour, the log file in downloaded from the server, and processed
using an external java program. Here is some sample data from
item_hourly.

hive> select * from item_hourly;
OK
131     1       2011021002
151     1       2011021002
133     1       2011021007
137     1       2011021007
151     1       2011021007
46      1       2011021010
75      1       2011021010
136     7       2011021011
137     7       2011021011
140     1       2011021011

Each time the batch runs, it creates a partition with an hour key,
such has 2011021002 - and everything is partitioned in similar
fashion.

Now, I would like to calculate daily stats, for a group of partitions,
and insert into another table, with a different partition that denotes
day, instead of day_hour

so basically 2011021002 would be shortened to 20110210 and so on.

I am falling apart in two places:

a. i can not query like where partition_name = 2011021002*, I would
have to do like partition_name = '2011021002' or partion_name =
'2011021003'

which would require me knowing the partitions that I processed that day.

b. I dont know how to calculate the name of the new partiton, like:
20110210, without resorting to an external program.

Any ideas greatly appreciated,
Best Regards,
-Cam Bazz

Reply via email to