Thanks

Does this mean I need to create a partition for each day manually? Is there no 
way to have Hive infer that from my directory structure?
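
For reference, the per-day statements do not have to be typed by hand. As far as I know, Hive only discovers partitions on its own when the directories are named in key=value style (for example year=2013/month=3/day=29), which is not the layout here. A minimal sketch, assuming the day directories end in numeric YEAR/MONTH/DAY and that their paths have already been collected (for example from an HDFS directory listing), could generate one ALTER TABLE statement per day. The helper name partition_statements and the /logs base path are hypothetical; log_data is the table name from the quoted reply.

```python
import re

# Matches paths that end in numeric YEAR/MONTH/DAY, e.g. /logs/2013/03/29
DAY_DIR = re.compile(r"/(\d{4})/(\d{1,2})/(\d{1,2})/?$")


def partition_statements(day_dirs, table="log_data"):
    """Build one ALTER TABLE ... ADD PARTITION statement per day directory.

    Hypothetical helper for illustration; it only builds strings and does
    not talk to Hive or HDFS.
    """
    statements = []
    for path in sorted(day_dirs):
        match = DAY_DIR.search(path)
        if match is None:
            continue  # skip anything that does not end in YEAR/MONTH/DAY
        year, month, day = (int(part) for part in match.groups())
        location = path.rstrip("/")
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (YEAR={year}, MONTH={month}, DAY={day}) "
            f"LOCATION '{location}';"
        )
    return statements


if __name__ == "__main__":
    # Assumed example paths; replace with the real directory listing.
    for statement in partition_statements(["/logs/2013/03/28", "/logs/2013/03/29"]):
        print(statement)
```

The printed statements can be saved to a file and run through the Hive CLI. IF NOT EXISTS keeps the script safe to re-run after new days are added.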

On Mar 29, 2013, at 10:40 AM, Sanjay Subramanian 
<sanjay.subraman...@wizecommerce.com> wrote:

> Hi
> 
> CREATE EXTERNAL TABLE IF NOT EXISTS log_data(col1 datatype1, col2
> datatype2, . . . colN datatypeN) PARTITIONED BY (YEAR INT, MONTH INT, DAY
> INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
> 
> 
> ALTER TABLE log_data ADD PARTITION (YEAR=2013, MONTH=2, DAY=27) LOCATION
> '/path/to/YEAR/MONTH/DAY/directory/ON/HDFS';
> 
> Hive will read gzip and bz2 files out of the box (so if you had hourly
> log files in gzip format in your /YEAR/MONTH/DAY directory, they will be
> read).
> Snappy and LZO will need some jar installs and configs:
> https://github.com/toddlipcon/hadoop-lzo
> 
> https://code.google.com/p/snappy/
> 
> 
> Note that gzip, for example, is not a splittable format, so each gzip file
> is processed by a single map task. Huge gzip files are therefore not
> recommended as input.
> 
> Hope this helps
> 
> sanjay
> 
> 
> On 3/29/13 10:19 AM, "Mark" <static.void....@gmail.com> wrote:
> 
>> We have existing log data in directories in the format of YEAR/MONTH/DAY.
>> 
>> - How can we create a table over this data without Hive modifying and/or
>> moving it?
>> - How can we tell Hive to partition this data so it knows about each day
>> of logs?
>> - Does Hive read compressed files out of the box?
>> 
>> Thanks
> 
> 
> 
