I understand that. I have CompressMapOutput set, and it works: the map outputs are compressed. But on the reduce end, the reducer downloads x files and then merges those x files into one intermediate file, to keep the number of files at or below io.sort.factor.

My problem is that the output from merging the intermediate map output files is not compressed. Because the merged files are no longer compressed, I lose all the disk-space benefit of compressing the map output.

Note that there are two different types of intermediate files: the map outputs themselves, and the files the reducer creates when it merges those map outputs to stay within the configured io.sort.factor.
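The toy model below makes the two kinds of intermediate files concrete. It is not Hadoop's merge code. It assumes map outputs are sorted, newline-separated records. It uses `java.util.zip` GZIP streams as a stand-in for a Hadoop compression codec, and it sorts a concatenation where a real merger would stream a k-way merge. The class `MergeModel` and its methods are hypothetical. The model repeatedly merges groups of `factor` segments, playing the role of io.sort.factor, and writes each merged segment back with the same compression setting as its inputs. That is the behaviour expected above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical toy model of a reduce-side merge; NOT Hadoop's implementation.
public class MergeModel {

    // Serializes records one per line, gzip-compressed when requested.
    static byte[] encode(List<String> records, boolean compress) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(
                compress ? new GZIPOutputStream(bytes) : bytes, StandardCharsets.UTF_8)) {
            for (String r : records) {
                w.write(r);
                w.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads back the records of one segment, decompressing if needed.
    static List<String> decode(byte[] segment, boolean compress) {
        List<String> records = new ArrayList<>();
        try {
            InputStream in = new ByteArrayInputStream(segment);
            if (compress) {
                in = new GZIPInputStream(in);
            }
            try (BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    records.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    // One merge pass: combine sorted segments into one sorted segment and write it
    // back with the SAME compression setting as its inputs.
    static byte[] merge(List<byte[]> segments, boolean compress) {
        List<String> all = new ArrayList<>();
        for (byte[] s : segments) {
            all.addAll(decode(s, compress));
        }
        Collections.sort(all); // simplification: a real merger streams a k-way merge
        return encode(all, compress);
    }

    // Merges groups of `factor` segments until no more than `factor` remain.
    static List<byte[]> mergeDownTo(List<byte[]> segments, int factor, boolean compress) {
        if (factor < 2) {
            throw new IllegalArgumentException("factor must be at least 2");
        }
        List<byte[]> current = new ArrayList<>(segments);
        while (current.size() > factor) {
            List<byte[]> group = current.subList(0, factor);
            List<byte[]> next = new ArrayList<>(current.subList(factor, current.size()));
            next.add(merge(group, compress)); // the intermediate merged file
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        for (boolean compress : new boolean[] {false, true}) {
            List<byte[]> segments = new ArrayList<>();
            for (int m = 0; m < 25; m++) { // 25 hypothetical map outputs
                List<String> records = new ArrayList<>();
                for (int i = 0; i < 200; i++) {
                    records.add(String.format("key-%05d-map-%02d", i, m));
                }
                segments.add(encode(records, compress));
            }
            List<byte[]> merged = mergeDownTo(segments, 10, compress);
            long bytes = 0;
            for (byte[] s : merged) {
                bytes += s.length;
            }
            System.out.println("compress=" + compress + " segments=" + merged.size() + " bytes=" + bytes);
        }
    }
}
```

With 25 inputs and a factor of 10, both runs end with 7 segments. Only the compressed run keeps the merged segments compressed, so it reports a smaller byte total.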

Billy



----- Original Message ----- From: "Chris Douglas" <chrisdo-ZXvpkYn067l8UrSeD/g...@public.gmane.org>
Newsgroups: gmane.comp.jakarta.lucene.hadoop.user
To: <core-user-7ArZoLwFLBtd/SJB6HiN2Ni2O/jbr...@public.gmane.org>
Sent: Tuesday, March 17, 2009 12:33 AM
Subject: Re: intermediate results not getting compressed


> I am running 0.19.1-dev, r744282. I have searched the issues but found nothing about the compression.

AFAIK, there are no open issues that prevent intermediate compression from working. The following might be useful:

http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Data+Compression

> Shouldn't the intermediate results also be compressed if the map output files are set to be compressed?

These are controlled by separate options.

FileOutputFormat::setCompressOutput enables/disables compression on the final output.
JobConf::setCompressMapOutput enables/disables compression of the intermediate output.
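The sketch below shows how the two switches might be set on an old-style `org.apache.hadoop.mapred` job. It assumes the 0.19-era API this thread refers to and requires the Hadoop core jar on the classpath. The class name `CompressionSettings`, the codec choices (`GzipCodec`, `DefaultCodec`) and the io.sort.factor value are illustrative only and were not recommended anywhere in the thread.

```java
import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical helper: turns on both compression switches for one job.
public class CompressionSettings {

    static JobConf configure() {
        JobConf conf = new JobConf();

        // Final job output written by the reducers.
        FileOutputFormat.setCompressOutput(conf, true);
        FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);

        // Intermediate map output, spilled by the maps and shuffled to the reducers.
        conf.setCompressMapOutput(true);
        conf.setMapOutputCompressorClass(DefaultCodec.class);

        // Merge fan-in discussed above; 10 is only an illustrative value.
        conf.setInt("io.sort.factor", 10);
        return conf;
    }

    public static void main(String[] args) {
        JobConf conf = configure();
        System.out.println("map output compressed: " + conf.getCompressMapOutput());
        System.out.println("final output compressed: " + FileOutputFormat.getCompressOutput(conf));
    }
}
```

The two settings are independent, so a job can compress its intermediate data while still writing plain-text final output, or the reverse.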

> If not, then is the map output compression option there just to save network traffic?

That's part of it. It also saves disk bandwidth and intermediate space.

-C


