Hi

INPUT
=====
Hive can handle gz files out of the box with NO additional configurations
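
As a rough sketch of the external-table case: the table name, column, and bucket path below are made up for illustration, and the s3n:// scheme is an assumption about your setup (use whatever S3 scheme your cluster is configured for).

```sql
-- Hypothetical external table over a directory of .gz text files on S3.
-- No compression-related settings are needed for reading.
CREATE EXTERNAL TABLE logs_gz (
  line STRING
)
STORED AS TEXTFILE
LOCATION 's3n://my-example-bucket/logs/';

-- Queries read the gzip files transparently.
SELECT COUNT(*) FROM logs_gz;
```

Note that LOCATION should point to the directory holding the .gz files rather than to a single file. The codec is picked from the file extension, so the files need to keep their .gz suffix.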

OUTPUT
======
If you want Hive to write compressed output files (say gz), add the following
at the beginning of your Hive SQL:
SET hive.exec.compress.output=true;
-- this will create at most 16 gzip files as the output of your Hive query
SET mapred.reduce.tasks=16;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
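
Put together, a query that writes gzip output might look like the sketch below. The source table name and the output directory are placeholders I made up; only the SET lines come from the settings above.

```sql
-- Turn on compressed output and choose gzip as the codec.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

-- Hypothetical source table and output path, for illustration only.
INSERT OVERWRITE DIRECTORY '/tmp/hive_gz_output'
SELECT *
FROM my_source_table;
```

The files written under the output directory should then carry a .gz extension.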


Side note (may or may not be relevant to you, but nevertheless)
You may know that GZIP is not splittable, so unless you have a definite reason
to use GZIP (like multiple lines in a log file actually constituting one
logical object or record), I would recommend LZO.
A little bit of plumbing is required since LZO no longer ships with Hadoop
out of the box, but it's pretty straightforward. Remember to use the LZO
indexer to create an index for your output so that the LZO files can be split
going forward.
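
For what it's worth, the Hive side of an LZO setup could look roughly like this. It assumes the third-party hadoop-lzo library is already installed on the cluster and its codecs are registered in io.compression.codecs; the table name and path are made up for illustration.

```sql
-- Assumption: hadoop-lzo is installed and its codecs are registered.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

-- Hypothetical external table over a directory of .lzo text files.
-- The LZO-aware input format is what lets indexed files be split.
CREATE EXTERNAL TABLE logs_lzo (
  line STRING
)
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/data/logs_lzo';
```

The index itself is built outside Hive, by running the indexer class from the hadoop-lzo jar (com.hadoop.compression.lzo.DistributedLzoIndexer) over the output directory after the files are written.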


Thanks

sanjay

From: Panshul Whisper <[email protected]>
Reply-To: [email protected]
Date: Thursday, May 2, 2013 6:00 AM
To: [email protected]
Subject: external table or gz compressed file

Hello,

Can somebody please explain to me, or point me in the right direction on,
how Hive handles gz compressed files if I create an external table pointing to
a .gz compressed file stored on AWS S3?
Does Hive copy the file to HDFS and decompress it before it uses the file,
or does it use the file directly?
If we use an uncompressed file stored on S3, does Hive still copy the file to
HDFS, or does it read records directly from S3?

Please help me understand the working.

Thanking You,

--
Regards,
Ouch Whisper
010101010101
