We have a MapReduce job that writes CSV files ranging from 10 MB to 200 MB. To make
Hive queries on these files efficient, we are planning to combine the small files.
However, it looks like Hive itself can combine small files [1]. I noticed that
when these CSV files are compressed using Snappy, Hive is not able to combine
the small files. I am assuming that because Snappy is not splittable for text data,
Hive is unable to combine these files, and I am hoping that when we use a splittable
compression codec, Hive will be able to combine the small files. Can someone please
confirm that CombineFileInputFormat does not work with compression codecs that are
not splittable?
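The per-file splittability rule that the assumption above depends on can be sketched as follows. This is a minimal, dependency-free model, assuming that Hadoop's TextInputFormat treats a file as splittable only when no compression codec matches it or its codec implements SplittableCompressionCodec. The class name SplittabilityCheck and the suffix table (which covers only a few built-in codecs) are illustrative assumptions, not Hadoop code. It models only whether a single file can be split. It says nothing about how CombineFileInputFormat or Hive packs whole files into combined splits.

```java
import java.util.Map;

/**
 * Hypothetical, simplified model of the check Hadoop's TextInputFormat
 * performs in isSplitable(): uncompressed files are splittable, and
 * compressed files are splittable only if their codec implements
 * SplittableCompressionCodec. The suffix table is an illustrative assumption.
 */
public class SplittabilityCheck {

    // File suffix -> whether the corresponding Hadoop codec is splittable.
    private static final Map<String, Boolean> CODEC_SPLITTABLE = Map.of(
            ".bz2", true,       // BZip2Codec implements SplittableCompressionCodec
            ".gz", false,       // GzipCodec
            ".snappy", false,   // SnappyCodec
            ".deflate", false,  // DefaultCodec
            ".lz4", false       // Lz4Codec
    );

    public static boolean isSplittable(String fileName) {
        for (Map.Entry<String, Boolean> entry : CODEC_SPLITTABLE.entrySet()) {
            if (fileName.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        // No known compression suffix: treat as uncompressed text, which is splittable.
        return true;
    }

    public static void main(String[] args) {
        String[] files = {"part-00000.csv", "part-00000.csv.snappy", "part-00000.csv.bz2"};
        for (String file : files) {
            System.out.println(file + " splittable=" + isSplittable(file));
        }
    }
}
```

Running main reports the plain CSV and the bzip2 file as splittable and the Snappy file as not splittable, which is the distinction the question hinges on.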

[1] https://issues.apache.org/jira/browse/HIVE-74

Surbhi Mungre
Software Engineer
www.cerner.com
