Hi guys, not sure if you have run into this problem. We have a 6-node cluster running CDH5. It takes about 8 hours to process 80MB compressed files (in .deflate format), while processing the same files uncompressed is much faster (less than 1 hour). I think there must be something wrong with my settings. Any help? Thank you.
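To rule out decompression itself as the bottleneck, a quick local check could look like the Python sketch below. It assumes the .deflate files are plain zlib streams (which is what Hadoop's DefaultCodec normally writes); the function name `time_decompress` and the file name in the usage line are made up for illustration.

```python
import time
import zlib


def time_decompress(path, chunk_size=1 << 20):
    """Stream-decompress a zlib (.deflate) file.

    Returns (uncompressed_bytes, elapsed_seconds).
    Assumption: the file is a single zlib stream.
    """
    decomp = zlib.decompressobj()
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Count the output size only; the data itself is discarded.
            total += len(decomp.decompress(chunk))
    total += len(decomp.flush())
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical file name; replace with a file copied out of HDFS.
    size, seconds = time_decompress("part-00000.deflate")
    print(f"{size} bytes uncompressed in {seconds:.2f} s")
```

If a file decompresses in seconds on one machine, the 8 hours are probably not spent on decompression, and the cause is more likely somewhere else, possibly in how the compressed input gets split across mappers.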
We are using an XMLInputFormat as the InputFormat and a custom SerDe to read the XML records.

Cheers,
Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108