What is mapred.child.ulimit set to? This configuration option specifies how much memory child processes are allowed to use. You may want to raise this limit and see what happens.
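For example, in 0.19 streaming the setting can be passed on the command line with -jobconf (the value is in kilobytes, and the number below is only a placeholder, not a recommendation):

```shell
# Hypothetical invocation: raise the child-process memory limit for the
# streaming job described below. mapred.child.ulimit is expressed in
# kilobytes; 2097152 (~2 GB) is only a placeholder value.
./hadoop jar ../contrib/streaming/hadoop-0.19.1-streaming.jar \
  -jobconf mapred.child.ulimit=2097152 \
  -mapper "/usr/bin/perl /home/hadoop/scripts/map_parse_log_r2.pl" \
  -reducer "/usr/bin/perl /home/hadoop/scripts/reduce_parse_log.pl" \
  -input '/logs/*.log' -output test9
```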
Let me know if that doesn't get you anywhere.

Alex

On Wed, Jun 10, 2009 at 9:40 AM, Scott <[email protected]> wrote:
> Complete newbie map/reduce question here. I am using Hadoop streaming, as
> I come from a Perl background, and am trying to prototype/test a process
> to load/clean up ad-server log lines from multiple input files into one
> large file on HDFS that can then be used as the source of a Hive db table.
>
> I have a Perl map script that reads an input line from stdin, does the
> needed cleanup/manipulation, and writes back to stdout. I don't really
> need a reduce step, as I don't care what order the lines are written in,
> and there is no summary data to produce. When I run the job with
> -reducer NONE I get valid output, but I get multiple part-xxxxx files
> rather than one big file. So I wrote a trivial 'reduce' script that reads
> from stdin, simply splits the key/value, and writes the value back to
> stdout.
>
> I am executing the code as follows:
>
> ./hadoop jar ../contrib/streaming/hadoop-0.19.1-streaming.jar \
>   -mapper "/usr/bin/perl /home/hadoop/scripts/map_parse_log_r2.pl" \
>   -reducer "/usr/bin/perl /home/hadoop/scripts/reduce_parse_log.pl" \
>   -input /logs/*.log -output test9
>
> The code works when given a small set of input files. However, I get the
> following error when attempting to run it on a large set of input files:
>
> hadoop-hadoop-jobtracker-testdw0b00.log.2009-06-09:2009-06-09 15:43:00,905
> WARN org.apache.hadoop.mapred.JobInProgress: No room for reduce task. Node
> tracker_testdw0b00:localhost.localdomain/127.0.0.1:53245 has 2004049920
> bytes free; but we expect reduce input to take 22138478392
>
> I assume this is because all the map output is being buffered in memory
> prior to running the reduce step? If so, what can I change to stop the
> buffering? I just need the map output to go directly to one large file.
>
> Thanks,
> Scott
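For reference, the trivial identity reducer Scott describes (split off the key the streaming framework prepends, emit only the value) can be sketched in a few lines. Shown here in Python for illustration; Scott's actual script is in Perl:

```python
#!/usr/bin/env python
# Sketch of an identity reducer for Hadoop streaming: input arrives as
# "key<TAB>value" lines on stdin; we drop the key and print the value.
import sys

def strip_key(line):
    """Return the value part of a "key\tvalue" streaming record.

    Splits at the first tab only, so values containing tabs survive
    intact; a line with no tab is passed through unchanged.
    """
    key, sep, value = line.rstrip("\n").partition("\t")
    return value if sep else key

if __name__ == "__main__":
    for line in sys.stdin:
        print(strip_key(line))
```

This preserves each cleaned log line exactly as the mapper emitted it, while forcing a single reduce stage so the output lands in one file.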
