Could it be the SerDe that is slow, and not the compression?

If your input XML is in multiline records, you may want to write a bit of RecordReader code to process the multiline XML yourself, just to see whether it makes any difference to the processing speed:

https://github.com/sanjaysubramanian/big_data_latte/tree/master/src/main/java/org/medicalsidefx/bdo/logparsers/multiline
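
To make the suggestion concrete, here is a minimal sketch of the record-splitting logic such a reader might use. It is plain Java with no Hadoop dependency, so it is not the code in the repository linked above and not a drop-in RecordReader. The class name `MultilineXmlRecords`, the method `readRecords`, and the `<record>` tag are all hypothetical, chosen only for illustration. It also assumes the start and end tags sit on their own lines and that the start tag carries no attributes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups the lines of a multiline XML stream into records.
class MultilineXmlRecords {

    // Returns one string per record, built from the line containing startTag
    // through the line containing endTag. Lines outside a record are skipped.
    static List<String> readRecords(Reader in, String startTag, String endTag)
            throws IOException {
        List<String> records = new ArrayList<>();
        StringBuilder current = null; // null means "not inside a record"
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            if (current == null && line.contains(startTag)) {
                current = new StringBuilder();
            }
            if (current != null) {
                current.append(line.trim());
                if (line.contains(endTag)) {
                    records.add(current.toString());
                    current = null;
                }
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Tiny made-up input, just to time the splitting step on its own.
        String xml = "<logs>\n  <record>\n    <id>1</id>\n  </record>\n"
                + "  <record>\n    <id>2</id>\n  </record>\n</logs>\n";
        long start = System.nanoTime();
        List<String> records =
                readRecords(new StringReader(xml), "<record>", "</record>");
        long micros = (System.nanoTime() - start) / 1000;
        System.out.println(records.size() + " records in " + micros + " us");
    }
}
```

Timing this step separately from the SerDe, on the same uncompressed input, should show which of the two dominates before you look at the compression settings.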

From: Yan Fang <yanfang...@gmail.com>
To: user@hive.apache.org
Sent: Wednesday, October 22, 2014 8:06 AM
Subject: It's extremely slow when hive reads compression files

Hi guys,

Not sure if you have run into this problem. We have a 6-node cluster running CDH5. It takes about 8 hours to process 80 MB compressed files (in .deflate format), while processing the uncompressed files is much faster (less than 1 hour). I think there must be something wrong with my settings. Any help?

Thank you.

We are using an XMLInputFormat as the InputFormat and a customized SerDe to read the XML records.

Cheers,

Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108