I am running 0.19.1-dev, r744282. I have searched the issues but found nothing about this compression problem. Shouldn't the intermediate results also be compressed if the map output files are set to be compressed? If not, why do we have the map compression option at all: just to save network traffic?
I am running a large streaming job that processes about 3 TB of data. I
am seeing large jumps in hard drive space usage in the reduce part of the
job, and I have tracked the problem down. The job is set to compress map
outputs, but looking at the intermediate files on the local drives, they
are not getting compressed during or after merges. I am going from having,
say, 2 GB of mapfile.out files to having one intermediate.X file that is
100-350% larger than the map files. I have looked at one of the files and
confirmed that it is not compressed, as I can read the data in it. If it
were only one merge it would not be a problem, but when you are merging
70-100 of these you use a huge amount of disk space, and my tasks are
starting to die as they run out of hard drive space, which in the end
kills the job.
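
To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It is only an estimate under stated assumptions: it takes the figures quoted above (about 2 GB of map files per merge, an intermediate file 100-350% larger, 70-100 merges) and assumes the worst case in which all intermediate files sit on disk at the same time. The function names are made up for illustration and nothing here comes from Hadoop itself.

```python
def intermediate_size_gb(map_output_gb, growth_pct):
    """Size of one merged file that is growth_pct percent larger than its inputs."""
    return map_output_gb * (1 + growth_pct / 100.0)


def total_range_gb(map_output_gb, growth_pcts, merge_counts):
    """Best and worst case total, assuming every intermediate file is on disk at once.

    This worst-case coexistence is an assumption, not a measured behaviour.
    """
    low = intermediate_size_gb(map_output_gb, min(growth_pcts)) * min(merge_counts)
    high = intermediate_size_gb(map_output_gb, max(growth_pcts)) * max(merge_counts)
    return low, high


if __name__ == "__main__":
    # Figures taken from the message: 2 GB inputs, 100-350% growth, 70-100 merges.
    low, high = total_range_gb(2.0, (100, 350), (70, 100))
    print("estimated intermediate data: %.0f-%.0f GB" % (low, high))
```

Under those assumptions the uncompressed intermediate files alone would come to somewhere between a few hundred GB and close to 1 TB, which would be consistent with tasks running out of local disk.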
- intermediate results not getting compressed Billy Pearson