1. We use LZO compression in our MR jobs, which write LZO files (these are NOT 
sequence files) that serve as the feeder files for Hive.
2. We then run aggregation reports in Hive over that data (the LZO files).

Hope this helps
Good luck
sanjay


From: "Ravi Mummulla (BIG DATA)" <rav...@microsoft.com>
Reply-To: user@hive.apache.org
Date: Monday, June 10, 2013 6:14 AM
To: user@hive.apache.org
Subject: RE: Compression in Hive

Documentation is here: 
https://cwiki.apache.org/confluence/display/Hive/CompressedStorage

The performance overhead of compression is negligible for large amounts of 
data but becomes more noticeable as the data size shrinks. The gains typically 
come from cheaper data transfers between nodes and fewer disk reads/writes; 
the larger the data, the greater the gain.
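
To make the size effect concrete, here is a minimal sketch in Python. It uses 
zlib from the standard library as a stand-in for LZO (an assumption, since LZO 
is not bundled with Python), and the sample record is made up. It only compares 
compressed size to original size for one row versus many rows; it does not 
measure CPU time or anything Hive-specific.

```python
import zlib


def compression_ratio(payload: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(payload)) / len(payload)


# Hypothetical log-like row, used only for illustration.
record = b"2013-06-10\tuser42\tclick\t/home\n"

small = record            # a single short row
large = record * 100_000  # roughly 3 MB of similar rows

small_ratio = compression_ratio(small)
large_ratio = compression_ratio(large)

print(f"small: {len(small)} bytes, ratio {small_ratio:.2f}")
print(f"large: {len(large)} bytes, ratio {large_ratio:.4f}")
```

The single row carries the codec's fixed framing cost with little redundancy 
to exploit, while the large input shrinks to a small fraction of its size, 
which is where the savings on network transfer and disk I/O come from.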

Thanks.

From: Sachin Sudarshana [mailto:sachin.had...@gmail.com]
Sent: Sunday, June 9, 2013 11:04 PM
To: user@hive.apache.org
Subject: Compression in Hive

Hi,

I have been testing the usefulness of compression in Hive, and I have a 
general question.

Are there any particular cases where compression in Hive actually proves 
useful when running MR jobs?

Any pointers/examples would really be useful!

Thank you,
Sachin

