Hello, I am calculating certain statistics on an hourly basis. At the end of the day, I would like to calculate a daily log.
Each hour, the log file in downloaded from the server, and processed using an external java program. Here is some sample data from item_hourly. hive> select * from item_hourly; OK 131 1 2011021002 151 1 2011021002 133 1 2011021007 137 1 2011021007 151 1 2011021007 46 1 2011021010 75 1 2011021010 136 7 2011021011 137 7 2011021011 140 1 2011021011 Each time the batch runs, it creates a partition with an hour key, such has 2011021002 - and everything is partitioned in similar fashion. Now, I would like to calculate daily stats, for a group of partitions, and insert into another table, with a different partition that denotes day, instead of day_hour so basically 2011021002 would be shortened to 20110210 and so on. I am falling apart in two places: a. i can not query like where partition_name = 2011021002*, I would have to do like partition_name = '2011021002' or partion_name = '2011021003' which would require me knowing the partitions that I processed that day. b. I dont know how to calculate the name of the new partiton, like: 20110210, without resorting to an external program. Any ideas greatly appreciated, Best Regards, -Cam Bazz