Robert,

My mapper job fails. I am basically trying to run a crawler on Hadoop, and Hadoop kills the crawler (mapper) if it has not heard from it for a certain timeout period. But I already have a timeout set in my mapper (500 seconds), which is less than Hadoop's timeout (900 seconds). The mapper just stalls for some reason. My mapper code is as follows:

while read line;do
result="`wget -O - --timeout=500 http://$line 2>&1`"
echo $result
done

Any idea why my mapper is getting stalled? I don't see the difference between the command you have given and the one I ran. I am not running in local mode. Is there some way by which I can get intermediate mapper outputs? I would like to see for which site the mapper is getting stalled.

Thanks,
Aishwarya

On Thu, Oct 6, 2011 at 1:41 PM, Robert Evans <ev...@yahoo-inc.com> wrote:

> Aishwarya,
>
> Are you running in local mode? If not you probably want to run
>
> hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar -file ~/mapper.sh -mapper ./mapper.sh -input ../foo.txt -output output
>
> You may also want to run hadoop fs -ls output/* to see what files were
> produced. If your mappers failed for some reason then there will be no
> files in the output directory. And you may want to look at the stderr logs
> for your processes through the web UI.
>
> --Bobby Evans
>
> On 10/6/11 3:30 PM, "Aishwarya Venkataraman" <avenk...@cs.ucsd.edu> wrote:
>
> I ran the following (I am using IdentityReducer):
>
> ./hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar -file ~/mapper.sh -mapper ~/mapper.sh -input ../foo.txt -output output
>
> When I do
> ./hadoop dfs -cat output/*
> I do not see any output on screen. Is this how I view the output of the mapper?
>
> Thanks,
> Aishwarya
>
> On Thu, Oct 6, 2011 at 12:37 PM, Robert Evans <ev...@yahoo-inc.com> wrote:
>
> > A streaming job's stderr is logged for the task, but its stdout is what is
> > sent to the reducer. The simplest way to get it is to turn off the
> > reducers, and then look at the output in HDFS.
> >
> > --Bobby Evans
> >
> > On 10/6/11 1:16 PM, "Aishwarya Venkataraman" <avenk...@cs.ucsd.edu> wrote:
> >
> > Hello,
> >
> > I want to view the mapper output for a given Hadoop streaming job (that
> > runs a shell script). However, I am not able to find this in any log files.
> > Where should I look for this?
> >
> > Thanks,
> > Aishwarya
>
> --
> Thanks,
> Aishwarya Venkataraman
> avenk...@cs.ucsd.edu
> Graduate Student | Department of Computer Science
> University of California, San Diego

--
Thanks,
Aishwarya Venkataraman
avenk...@cs.ucsd.edu
Graduate Student | Department of Computer Science
University of California, San Diego
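
One way to see which site stalls the mapper is to print each URL to stderr before fetching it, since a streaming task's stderr is kept in the task logs. The sketch below is an illustration under stated assumptions, not a tested fix: FETCH_LIMIT, fetch_one, and run_mapper are hypothetical names, and it assumes the GNU coreutils `timeout` command is installed on the task nodes. It also leans on two wget behaviours: `--timeout` bounds each DNS, connect, and read wait separately rather than the whole download, and wget retries some failures (timeouts included) up to 20 times by default, so one URL can take far longer than 500 seconds unless `--tries=1` and an overall cap are used.

```shell
#!/bin/sh
# Sketch of a streaming mapper that names each site on stderr before fetching it.
# FETCH_LIMIT, fetch_one and run_mapper are hypothetical names, not from the thread.

FETCH_LIMIT="${FETCH_LIMIT:-500}"   # overall cap, in seconds, for one site

fetch_one() {
    # --tries=1 turns off wget's retries. --timeout only bounds each
    # DNS/connect/read wait, so `timeout` (GNU coreutils, assumed to be
    # installed) enforces the cap on the whole fetch.
    timeout "$FETCH_LIMIT" wget -O - --tries=1 --timeout="$FETCH_LIMIT" "http://$1" 2>&1
}

run_mapper() {
    local line result
    while IFS= read -r line; do
        # stderr ends up in the task log; the "reporter:status:" prefix is the
        # streaming convention for setting the task's status message.
        echo "reporter:status:fetching $line" >&2
        result="$(fetch_one "$line")"
        echo "$result"
    done
}
```

To use this as mapper.sh, add a final line that calls run_mapper. The last "fetching" line in a stalled task's stderr log (or the task status in the web UI) should then name the site being fetched when the task stopped responding.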