It could be the SerDe that is slow and not the compression. If your input XML 
is in multiline records, then you may want to write a bit of RecordReader code 
to process the multiline XML yourself, just to see if it makes any difference 
to the processing speed:
https://github.com/sanjaysubramanian/big_data_latte/tree/master/src/main/java/org/medicalsidefx/bdo/logparsers/multiline
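
One rough way to separate the two costs outside of Hive is sketched below in 
plain Java (no Hadoop dependencies). It times decompression alone, then 
decompression plus a start-tag/end-tag scan of the kind a multiline 
RecordReader would typically do. The class name MultilineXmlProbe, the <record> 
tag and the generated sample data are made up for illustration; this is not the 
code from the repository linked above and not a real RecordReader. It also 
assumes the .deflate data is a zlib stream that java.util.zip.InflaterInputStream 
can read.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, for illustration only.
class MultilineXmlProbe {

    // Returns every span from startTag through endTag, across line breaks.
    static List<String> readRecords(InputStream in, String startTag, String endTag)
            throws IOException {
        byte[] start = startTag.getBytes(StandardCharsets.UTF_8);
        byte[] end = endTag.getBytes(StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        while (readUntilMatch(in, start, null)) {      // skip to the next record
            buf.reset();
            buf.write(start, 0, start.length);
            if (!readUntilMatch(in, end, buf)) break;  // truncated record at EOF
            records.add(new String(buf.toByteArray(), StandardCharsets.UTF_8));
        }
        return records;
    }

    // Reads until `match` has been seen; copies bytes to copyTo when non-null.
    // Simple matcher: adequate for tags whose first byte ('<') does not
    // repeat inside the tag, not a general substring search.
    static boolean readUntilMatch(InputStream in, byte[] match,
                                  ByteArrayOutputStream copyTo) throws IOException {
        int matched = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (copyTo != null) copyTo.write(b);
            if (b == match[matched]) {
                if (++matched == match.length) return true;
            } else {
                matched = (b == match[0]) ? 1 : 0;
            }
        }
        return false;
    }

    // Compresses bytes into a zlib stream (assumed equivalent of a .deflate file).
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out)) {
            dos.write(raw);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data standing in for the real XML input.
        StringBuilder xml = new StringBuilder("<log>\n");
        for (int i = 0; i < 100_000; i++) {
            xml.append("<record>\n  <id>").append(i).append("</id>\n</record>\n");
        }
        xml.append("</log>\n");
        byte[] compressed = deflate(xml.toString().getBytes(StandardCharsets.UTF_8));

        // Pass 1: decompression only.
        long t0 = System.nanoTime();
        long bytes = 0;
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) bytes += n;
        }
        long t1 = System.nanoTime();

        // Pass 2: decompression plus record splitting. Buffered so that
        // single-byte reads do not go to the inflater one byte at a time.
        List<String> records;
        try (InputStream in = new BufferedInputStream(
                new InflaterInputStream(new ByteArrayInputStream(compressed)))) {
            records = readRecords(in, "<record>", "</record>");
        }
        long t2 = System.nanoTime();

        System.out.printf("inflate only: %d bytes in %d ms%n", bytes, (t1 - t0) / 1_000_000);
        System.out.printf("inflate + split: %d records in %d ms%n",
                records.size(), (t2 - t1) / 1_000_000);
    }
}
```

If the first pass is fast on a copy of your real file and the job is still 
slow, that would point more towards the record reading or the SerDe than 
towards the compression itself.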

From: Yan Fang <yanfang...@gmail.com>
To: user@hive.apache.org
Sent: Wednesday, October 22, 2014 8:06 AM
Subject: It's extremely slow when hive reads compression files
   
Hi guys,
Not sure if you have run into this problem. We have a 6-node cluster with CDH5. 
It takes about 8 hours to process 80MB of compressed files (in .deflate 
format), while it is much faster (less than 1 hour) to process the 
uncompressed files. I think there must be something wrong with my settings. 
Any help? Thank you.
We are using an XMLInputFormat as the InputFormat and a customized SerDe to 
read the XML records.
Cheers,
Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108

   
