The difference in the command is where the shell script comes from. If you
use -mapper ~/mapper.sh then each task will look in your home directory on
the node it runs on. If you have a small cluster with your home directory
mounted on all of the nodes then it is not that big of a deal. If you have a
large cluster, though, you want to ship the script with the job using -file
so every task gets its own local copy.
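A sketch of the two variants, reusing the job from later in this thread (the
paths are illustrative):

# Relies on ~/mapper.sh being present on every node (e.g. an NFS home):
hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar \
  -mapper ~/mapper.sh -input ../foo.txt -output output

# Ships the script with the job; each task runs its local copy:
hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar \
  -file ~/mapper.sh -mapper ./mapper.sh -input ../foo.txt -output output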
Robert,
My mapper job fails. I am basically trying to run a crawler on hadoop, and
hadoop kills the crawler (mapper) if it has not heard from it for a certain
timeout period. But I already have a timeout set in my mapper (500 seconds),
which is less than hadoop's timeout (900 seconds). The mapper just gets
killed anyway.
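A common workaround (a sketch, not something from this thread: it assumes
the standard streaming reporter protocol and the mapred.task.timeout
property) is to have the mapper periodically report status on stderr, or to
raise the framework timeout when submitting the job:

# Inside mapper.sh: streaming treats stderr lines of the form
# reporter:status:<message> as progress, which resets the task timeout.
echo "reporter:status:crawler still fetching" >&2

# Or raise the timeout (in milliseconds); the -D must precede the
# streaming options:
hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar \
  -D mapred.task.timeout=1800000 \
  -file ~/mapper.sh -mapper ./mapper.sh -input ../foo.txt -output output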
Aishwarya,
Are you running in local mode? If not, you probably want to run
hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar -file ~/mapper.sh \
  -mapper ./mapper.sh -input ../foo.txt -output output
You may also want to run hadoop fs -ls output/* to see what files were
produced. If you see part files there, you can cat them to check what the
mapper actually wrote.
I ran the following (I am using IdentityReducer):
./hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar -file
~/mapper.sh -mapper ~/mapper.sh -input ../foo.txt -output output
When I do
./hadoop dfs -cat output/*
I do not see any output on screen. Is this how I view the output of the
mapper?
A streaming job's stderr is logged for the task, but its stdout is what is
sent to the reducer. The simplest way to get at it is to turn off the
reducers and then look at the output in HDFS.
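A sketch of that, assuming the job from earlier in the thread (streaming's
-numReduceTasks 0 would do the same thing; the -D has to come before the
streaming options):

# Zero reducers: the mapper output is written straight to HDFS.
hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar \
  -D mapred.reduce.tasks=0 \
  -file ~/mapper.sh -mapper ./mapper.sh -input ../foo.txt -output output

# Then inspect it:
hadoop fs -cat output/part-*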
--Bobby Evans
On 10/6/11 1:16 PM, "Aishwarya Venkataraman" wrote:
Hello,
I want to view the mapper output for a given hadoop streaming job (one that
runs a shell script). However, I am not able to find it in any log files.
Where should I look for this?
Thanks,
Aishwarya