1. We use LZO compression in our MR jobs, which create LZO files (these are NOT sequence files) that are the feeder files for Hive.
2. Then we use the Hive data (LZO files) and run aggregation reports.
Hope this helps.

Good luck,
sanjay

From: "Ravi Mummulla (BIG DATA)" <rav...@microsoft.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Monday, June 10, 2013 6:14 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: RE: Compression in Hive

Documentation is here: https://cwiki.apache.org/confluence/display/Hive/CompressedStorage

The performance overhead of compression is trivial for larger amounts of data but may be magnified as the data size gets smaller. Typically, where you gain is in data transfers between nodes and in disk reads/writes. Again, the larger the data size, the more the gain.

Thanks.

From: Sachin Sudarshana [mailto:sachin.had...@gmail.com]
Sent: Sunday, June 9, 2013 11:04 PM
To: user@hive.apache.org
Subject: Compression in Hive

Hi,

I have been testing the usefulness of compression in Hive. I have a general question: are there any particular cases where compression in Hive actually proves useful while running MR jobs?

Any pointers/examples would really be useful!

Thank you,
Sachin
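
For anyone reading this thread later, the point made above (that the gain grows with data size, while the overhead is magnified on small inputs) can be checked with a rough sketch like the one below. It is only an illustration under stated assumptions: Python's standard-library zlib stands in for LZO, since LZO bindings are not in the standard library; the sample row is a made-up log-like record; and it measures bytes saved only, not CPU time or anything specific to Hive or MapReduce.

```python
import zlib


def compression_summary(payload: bytes) -> dict:
    """Return raw size, compressed size, and their ratio for one payload."""
    compressed = zlib.compress(payload)
    return {
        "raw_bytes": len(payload),
        "compressed_bytes": len(compressed),
        "ratio": len(compressed) / len(payload),
    }


# Hypothetical tab-separated row, similar in shape to a text-file Hive record.
row = b"2013-06-10\tuser42\tpage_view\t/products/123\n"

tiny = row[:8]          # a few bytes: fixed codec overhead dominates
large = row * 100_000   # a few megabytes of repetitive rows

for name, payload in (("tiny", tiny), ("large", large)):
    summary = compression_summary(payload)
    print(
        f"{name}: {summary['raw_bytes']} -> {summary['compressed_bytes']} bytes "
        f"(ratio {summary['ratio']:.4f})"
    )
```

On the tiny payload the compressed form comes out larger than the input, because the header and checksum cost more than the codec can save. On the large payload the output is a small fraction of the input, which is where the savings in disk I/O and node-to-node transfer come from. Real data is less repetitive than this sample, so actual ratios will be less dramatic.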