You will need to define a partition column, like date or hour. Then configure Flume to roll over files/directories based on that partition column. Finally, you will need some kind of cron job that checks for new data arriving in a directory or file and then adds it as a partition to the table.
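As a rough sketch of that first approach (agent, table, and path names here are made up for illustration, not from your setup), the Flume HDFS sink can write into date/hour directories and the cron job can register each new directory as a partition:

```
# Flume HDFS sink: escape sequences put each hour's data in its own directory
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = /user/flume/logs/dt=%Y-%m-%d/hr=%H
# roll on time only, so a file closes when the hour is done
agent.sinks.hdfs-sink.hdfs.rollInterval = 3600
agent.sinks.hdfs-sink.hdfs.rollSize = 0
agent.sinks.hdfs-sink.hdfs.rollCount = 0

-- HiveQL the cron job would run once it sees a new hour's directory:
ALTER TABLE logs ADD IF NOT EXISTS
  PARTITION (dt='2013-09-13', hr='07')
  LOCATION '/user/flume/logs/dt=2013-09-13/hr=07';
```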
(Looks easy, but it is fairly complex.)

The other approach: write into a single file backing an unpartitioned base table. Then create a partitioned table, enable dynamic partitions, and insert into the new table with a select from the base table. (This is somewhat wasteful, as you will either reprocess all the data every time, or limit it with a WHERE clause and add to one particular partition only.)

On Fri, Sep 13, 2013 at 7:25 AM, ch huang <justlo...@gmail.com> wrote:
> hi, all:
> I use Flume to collect log data and put it in HDFS. I want to use Hive
> to do some calculation and query based on a time range, so I want to use a
> partitioned table, but the data file in HDFS is one big file. How can I put
> it into a partitioned table in Hive?

--
Nitin Pawar
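For the second approach above, the dynamic-partition insert would look roughly like this (table and column names are invented for illustration; adjust to your schema):

```sql
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- logs_raw is the unpartitioned table over the big Flume file;
-- logs_part is the same schema partitioned by dt. Hive routes each
-- row to a partition based on the last column of the SELECT.
INSERT OVERWRITE TABLE logs_part PARTITION (dt)
SELECT msg, host, to_date(ts) AS dt
FROM logs_raw;
```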