That's a drawback of external tables, but it's actually not as difficult as it sounds. It's easy to write a nightly "cron" job that creates the partition for the next day (or a job per month...), if someone on your team has some bash experience. Other job scheduling tools should support this too. Here's an example. First, a hive script that uses parameters for the date (Hive v0.8 or newer):
-- addlogpartition.hql ALTER TABLE log ADD IF NOT EXISTS PARTITION (year = ${YEAR}, month = ${MONTH}, day = ${DAY}); Then, run this bash script AFTER MIDNIGHT: #!/bin/bash YEAR=$(date +%Y) # returns the string "2013" today. MONTH=$(date +%m) # returns the string "03" today, with the leading zero. DAY=$(date +%d) # returns the string "29" today. Will prefix with 0 for dates < 10. # Assumes /path/to/2013/03/29 is the correct directory name: /path/to/hive -f /path/to/addlogpartition.hql -d YEAR=$YEAR -d MON=$MONTH -d DAY=$DAY (Of course, all the /path/to will be different...) So, be careful of how how "03" vs. "3" is handled in both the ALTER TABLE statement and the path. Off hand, I don't know if Hive will complain if you use 03 as an integer value in the ALTER TABLE statement. On Fri, Mar 29, 2013 at 1:16 PM, Mark <static.void....@gmail.com> wrote: > Thanks > > Does this mean I need to create a partition for each day manually? There > is no way to have infer that from my directory structure? > > On Mar 29, 2013, at 10:40 AM, Sanjay Subramanian < > sanjay.subraman...@wizecommerce.com> wrote: > > > Hi > > > > CREATE EXTERNAL TABLE IF NOT EXISTS log_data(col1 datatype1, col2 > > datatype2, . . . colN datatypeN) PARTITIONED BY (YEAR INT, MONTH INT, DAY > > INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'; > > > > > > ALTER table log_data ADD PARTITION (YEAR=2013 , MONTH=2, DAY=27) LOCATION > > '/path/to/YEAR/MONTH/DAY/directory/ON/HDFS';" > > > > Hive will read gzip and bz2 files out of the box.(so suppose you had > > hourly log files in gzip format in your /YEAR/MONTH/DAY directory then it > > will be read) > > Snappy and LZO will need some jar installs and configs > > https://github.com/toddlipcon/hadoop-lzo > > > > https://code.google.com/p/snappy/ > > > > > > Note that for example - gzip format is not splittable..so huge gzip files > > without splits are not recommended as input to maps > > > > Hope this helps > > > > sanjay > > > > > > On 3/29/13 10:19 AM, "Mark" <static.void....@gmail.com> wrote: > > > >> We have existing log data in directories in the format of > YEAR/MONTH/DAY. > >> > >> - How can we create a table over this table without hive modifying > and/or > >> moving it? > >> - How can we tell Hive to partition this data so it knows about each day > >> of logs? > >> - Does hive out of the box work with reading compressed files? > >> > >> Thanks > > > > > > CONFIDENTIALITY NOTICE > > ====================== > > This email message and any attachments are for the exclusive use of the > intended recipient(s) and may contain confidential and privileged > information. Any unauthorized review, use, disclosure or distribution is > prohibited. If you are not the intended recipient, please contact the > sender by reply email and destroy all copies of the original message along > with any attachments, from your computer system. If you are the intended > recipient, please be advised that the content of this message is subject to > access, review and disclosure by the sender's Email System Administrator. > > > > -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330