Hi,
We use Hive's INSERT OVERWRITE DIRECTORY to copy our hourly logs to HDFS, so we
end up with many directories like these:
/my/logs/2013-03-08/01/000000_0
/my/logs/2013-03-08/02/000000_0
/my/logs/2013-03-08/03/000000_0
...
Now we want to create an external table to query the log data, so we add each
hourly directory with ADD PARTITION:
CREATE EXTERNAL TABLE testpart (logline string) PARTITIONED BY (dt string);
ALTER TABLE testpart ADD PARTITION (dt='2013-03-08-01')
  LOCATION '/my/logs/2013-03-08/01';
This works fine. However, if we want, say, one week's worth of logs, we need to
repeat ADD PARTITION 24*7 = 168 times. Is there a way to avoid issuing so many
ADD PARTITION statements, maybe something like a wildcard such as
"2013-03-08/*"? If not, what is the general practice for handling hourly logs
like these?
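To make the repetition concrete, one stopgap would be to generate the
statements with a script instead of typing them. Below is a minimal Python
sketch, not something we are committed to. It assumes the table name and path
layout shown above and that the hour directories run from 00 to 23; the
function name add_partition_statements is made up for illustration.

```python
from datetime import datetime, timedelta


def add_partition_statements(start_day, days, table="testpart", root="/my/logs"):
    """Yield one ALTER TABLE ... ADD PARTITION statement per hour.

    Assumes directories are laid out as <root>/<YYYY-MM-DD>/<HH>,
    with HH running from 00 to 23 (an assumption, not confirmed).
    """
    start = datetime.strptime(start_day, "%Y-%m-%d")
    for offset in range(days * 24):
        ts = start + timedelta(hours=offset)
        day = ts.strftime("%Y-%m-%d")
        hour = ts.strftime("%H")
        yield (
            "ALTER TABLE {t} ADD PARTITION(dt='{d}-{h}') "
            "LOCATION '{r}/{d}/{h}';"
        ).format(t=table, d=day, h=hour, r=root)


if __name__ == "__main__":
    # One week starting 2013-03-08: prints 24 * 7 = 168 statements.
    for stmt in add_partition_statements("2013-03-08", 7):
        print(stmt)
```

The output could be redirected to a file and run with "hive -f <file>", but
that still registers every partition one by one, which is what we would like
to avoid.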
Thanks