I have a Hadoop cluster of 5 nodes:
(1) NameNode
(2) JobTracker
(3) First slave
(4) Second slave
(5) Client from where I submit jobs

I brought node no. 4 (the second slave) down by running:

bin/hadoop-daemon.sh stop datanode
bin/hadoop-daemon.sh stop tasktracker
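(As a sanity check, I believe the NameNode's view of the datanodes can be inspected from the client with dfsadmin; a sketch, and the exact report format may differ by Hadoop version:)

```shell
# Ask the NameNode for its current datanode report;
# a stopped datanode should eventually show up as dead.
bin/hadoop dfsadmin -report
```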

After this I tried running my word count job again and I got this error:

foss...@hadoop-client:~/mcr-wordcount$ hadoop jar dist/mcr-wordcount-0.1.jar com.fossist.examples.WordCountJob /fossist/inputs /fossist/output7

09/04/05 15:13:03 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/04/05 15:13:03 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.1.5:50010
09/04/05 15:13:03 INFO hdfs.DFSClient: Abandoning block blk_-6478273736277251749_1034
09/04/05 15:13:09 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused
09/04/05 15:13:09 INFO hdfs.DFSClient: Abandoning block blk_-7054779688981181941_1034
09/04/05 15:13:15 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused
09/04/05 15:13:15 INFO hdfs.DFSClient: Abandoning block blk_-6231549606860519001_1034
09/04/05 15:13:21 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.1.5:50010
09/04/05 15:13:21 INFO hdfs.DFSClient: Abandoning block blk_-7060117896593271410_1034
09/04/05 15:13:27 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2722)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

09/04/05 15:13:27 WARN hdfs.DFSClient: Error Recovery for block blk_-7060117896593271410_1034 bad datanode[1] nodes == null
09/04/05 15:13:27 WARN hdfs.DFSClient: Could not get block locations. Source file "/tmp/hadoop-hadoop/mapred/system/job_200904042051_0011/job.jar" - Aborting...
java.io.IOException: Bad connect ack with firstBadLink 192.168.1.5:50010
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2780)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2703)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

Note that 192.168.1.5 is the Hadoop slave on which I stopped the datanode and tasktracker. This is a serious concern for me: if I cannot run jobs after one node goes down, the purpose of having a cluster is defeated.

Could someone please help me understand whether this is an error on my part or a problem in Hadoop? Is there any way to avoid it?
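One approach I have read about for taking a datanode out of service cleanly is decommissioning it through an exclude file rather than just stopping the daemon. My understanding of the configuration is sketched below (the file path is only an example):

```xml
<!-- hdfs-site.xml on the NameNode: dfs.hosts.exclude points at a file
     listing the nodes to decommission (path here is hypothetical) -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/home/hadoop/conf/excludes</value>
</property>
```

After adding 192.168.1.5 to that file, I believe one runs `bin/hadoop dfsadmin -refreshNodes` so the NameNode stops selecting that node for write pipelines. Is that the intended procedure?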

Please note that I can still read all the data in the 'inputs' directory using commands like:

foss...@hadoop-client:~/mcr-wordcount$ hadoop dfs -cat /fossist/inputs/input1.txt

Please help.
