It may be that the Sun grid is similar to EC2, where the machines have an internal IP address/name that MUST be used for inter-machine communication and an external IP address/name that is only for internet access.
The above overly complex sentence basically states that there may be some firewall rules/tools in the Sun grid that you need to be aware of and use.

On Sun, Apr 26, 2009 at 6:31 AM, Jasmine (Xuanjing) Huang <[email protected]> wrote:

> Hi, Jason,
>
> Thanks for your advice. After inserting the port into the
> "hadoop-site.xml" file, I can start the namenode and run jobs now.
> But my system works only when I set localhost in the masters file and add
> localhost (as well as some other nodes) to the slaves file. And all the
> tasks are data-local map tasks. I wonder whether I have entered fully
> distributed mode, or am still in pseudo mode.
>
> As for the SGE, I am only a user and know little about it. This is the
> user manual of our cluster:
> http://www.cs.umass.edu/~swarm/index.php?n=Main.UserDoc
>
> Best,
> Jasmine
>
> ----- Original Message ----- From: "jason hadoop" <[email protected]>
> To: <[email protected]>
> Sent: Sunday, April 26, 2009 12:06 AM
> Subject: Re: Can't start fully-distributed operation of Hadoop in Sun Grid
> Engine
>
>> The parameter you specify for fs.default.name should be of the form
>> hdfs://host:port, and the parameter you specify for mapred.job.tracker
>> MUST be host:port. I haven't looked at 18.3, but it appears that the
>> :port is mandatory.
>>
>> In your case, the piece of code parsing the fs.default.name variable is
>> not able to tokenize it into protocol, host, and port correctly.
>>
>> Recap:
>> fs.default.name    hdfs://namenodeHost:port
>> mapred.job.tracker jobtrackerHost:port
>> Specify all the parts above and try again.
>>
>> Can you please point me at information on using the Sun grid? I want to
>> include a paragraph or two about it in my book.
>>
>> On Sat, Apr 25, 2009 at 4:28 PM, Jasmine (Xuanjing) Huang <
>> [email protected]> wrote:
>>
>>> Hi, there,
>>>
>>> My hadoop system (version: 0.18.3) works well under standalone and
>>> pseudo-distributed operation.
>>> But if I try to run hadoop in fully-distributed mode in Sun Grid
>>> Engine, Hadoop always fails -- in fact, the JobTracker and TaskTracker
>>> can be started, but the namenode and secondary namenode cannot be
>>> started. Could anyone help me with it?
>>>
>>> My SGE script looks like:
>>>
>>> #!/bin/bash
>>> #$ -cwd
>>> #$ -S /bin/bash
>>> #$ -l long=TRUE
>>> #$ -v JAVA_HOME=/usr/java/latest
>>> #$ -v HADOOP_HOME=*********
>>> #$ -pe hadoop 6
>>> PATH="$HADOOP_HOME/bin:$PATH"
>>> hadoop fs -put ********
>>> hadoop jar *****
>>> hadoop fs -get *********
>>>
>>> Then the output looks like:
>>>
>>> Exception in thread "main" java.lang.NumberFormatException: For input string: ""
>>>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>>     at java.lang.Integer.parseInt(Integer.java:468)
>>>     at java.lang.Integer.parseInt(Integer.java:497)
>>>     at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:144)
>>>     at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)
>>>     at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:66)
>>>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1339)
>>>     at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
>>>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1351)
>>>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:213)
>>>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:118)
>>>     at org.apache.hadoop.fs.FsShell.init(FsShell.java:88)
>>>     at org.apache.hadoop.fs.FsShell.run(FsShell.java:1703)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>     at org.apache.hadoop.fs.FsShell.main(FsShell.java:1852)
>>>
>>> And the log of the NameNode looks like:
>>>
>>> 2009-04-25 17:27:17,032 INFO org.apache.hadoop.dfs.NameNode: STARTUP_MSG:
>>> /************************************************************
>>> STARTUP_MSG: Starting NameNode
>>> STARTUP_MSG:   host = ************
>>> STARTUP_MSG:   args = []
>>> STARTUP_MSG:   version = 0.18.3
>>> ************************************************************/
>>> 2009-04-25 17:27:17,147 ERROR org.apache.hadoop.dfs.NameNode:
>>> java.lang.NumberFormatException: For input string: ""
>>>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>>     at java.lang.Integer.parseInt(Integer.java:468)
>>>     at java.lang.Integer.parseInt(Integer.java:497)
>>>     at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:144)
>>>     at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)
>>>     at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:136)
>>>     at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
>>>     at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
>>>     at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
>>>     at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
>>>
>>> 2009-04-25 17:27:17,149 INFO org.apache.hadoop.dfs.NameNode: SHUTDOWN_MSG:
>>> /************************************************************
>>> SHUTDOWN_MSG: Shutting down NameNode at ***************
>>>
>>> Best,
>>> Jasmine
>>
>> --
>> Alpha Chapters of my book on Hadoop are available
>> http://www.apress.com/book/view/9781430219422

--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
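[Editor's note: the fix Jason describes amounts to a hadoop-site.xml fragment along these lines. The hostnames and port numbers below are illustrative placeholders, not the poster's redacted values; any host:port pair works as long as the port is present, since the NumberFormatException comes from parsing an empty port string.]

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- full URI: scheme, host, AND port are all required in 0.18.x -->
    <value>hdfs://namenodeHost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <!-- plain host:port, no hdfs:// scheme -->
    <value>jobtrackerHost:9001</value>
  </property>
</configuration>
```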
