It may be that the Sun grid is similar to EC2, where the machines have an internal IP address/name that MUST be used for inter-machine communication and an external IP address/name that is only for internet access.
The above overly complex sentence basically states that there may be some firewall rules/tools in the Sun grid that you need to be aware of and use.

On Sun, Apr 26, 2009 at 6:31 AM, Jasmine (Xuanjing) Huang <[email protected]> wrote:

> Hi, Jason,
>
> Thanks for your advice. After inserting the port into the
> "hadoop-site.xml" file, I can start the namenode and run jobs now.
> But my system works only when I set localhost in the masters file and add
> localhost (as well as some other nodes) to the slaves file. And all the
> tasks are data-local map tasks. I wonder whether I have entered fully
> distributed mode, or am still in pseudo mode.
>
> As for the SGE, I am only a user and know little about it. This is the
> user manual of our cluster:
> http://www.cs.umass.edu/~swarm/index.php?n=Main.UserDoc
>
> Best,
> Jasmine
>
> ----- Original Message ----- From: "jason hadoop" <[email protected]>
> To: <[email protected]>
> Sent: Sunday, April 26, 2009 12:06 AM
> Subject: Re: Can't start fully-distributed operation of Hadoop in Sun Grid
> Engine
>
>> The parameter you specify for fs.default.name should be of the form
>> hdfs://host:port, and the parameter you specify for mapred.job.tracker
>> MUST be host:port. I haven't looked at 18.3, but it appears that the
>> :port is mandatory.
>>
>> In your case, the piece of code parsing the fs.default.name variable is
>> not able to tokenize it into protocol, host, and port correctly.
>>
>> Recap:
>> fs.default.name    hdfs://namenodeHost:port
>> mapred.job.tracker jobtrackerHost:port
>> Specify all the parts above and try again.
>>
>> Can you please point me at information on using the Sun grid? I want to
>> include a paragraph or two about it in my book.
>>
>> On Sat, Apr 25, 2009 at 4:28 PM, Jasmine (Xuanjing) Huang <
>> [email protected]> wrote:
>>
>>> Hi, there,
>>>
>>> My hadoop system (version: 0.18.3) works well under standalone and
>>> pseudo-distributed operation.
>>> But if I try to run hadoop in fully-distributed mode in Sun Grid
>>> Engine, Hadoop always fails -- in fact, the JobTracker and TaskTracker
>>> can be started, but the namenode and secondary namenode cannot be
>>> started. Could anyone help me with it?
>>>
>>> My SGE script looks like:
>>>
>>> #!/bin/bash
>>> #$ -cwd
>>> #$ -S /bin/bash
>>> #$ -l long=TRUE
>>> #$ -v JAVA_HOME=/usr/java/latest
>>> #$ -v HADOOP_HOME=*********
>>> #$ -pe hadoop 6
>>> PATH="$HADOOP_HOME/bin:$PATH"
>>> hadoop fs -put ********
>>> hadoop jar *****
>>> hadoop fs -get *********
>>>
>>> Then the output looks like:
>>>
>>> Exception in thread "main" java.lang.NumberFormatException: For input string: ""
>>>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>>     at java.lang.Integer.parseInt(Integer.java:468)
>>>     at java.lang.Integer.parseInt(Integer.java:497)
>>>     at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:144)
>>>     at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)
>>>     at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:66)
>>>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1339)
>>>     at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
>>>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1351)
>>>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:213)
>>>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:118)
>>>     at org.apache.hadoop.fs.FsShell.init(FsShell.java:88)
>>>     at org.apache.hadoop.fs.FsShell.run(FsShell.java:1703)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>     at org.apache.hadoop.fs.FsShell.main(FsShell.java:1852)
>>>
>>> And the log of the NameNode looks like:
>>>
>>> 2009-04-25 17:27:17,032 INFO org.apache.hadoop.dfs.NameNode: STARTUP_MSG:
>>> /************************************************************
>>> STARTUP_MSG: Starting NameNode
>>> STARTUP_MSG:   host = ************
>>> STARTUP_MSG:   args = []
>>> STARTUP_MSG:   version = 0.18.3
>>> ************************************************************/
>>> 2009-04-25 17:27:17,147 ERROR org.apache.hadoop.dfs.NameNode:
>>> java.lang.NumberFormatException: For input string: ""
>>>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>>     at java.lang.Integer.parseInt(Integer.java:468)
>>>     at java.lang.Integer.parseInt(Integer.java:497)
>>>     at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:144)
>>>     at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)
>>>     at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:136)
>>>     at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
>>>     at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
>>>     at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
>>>     at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
>>>
>>> 2009-04-25 17:27:17,149 INFO org.apache.hadoop.dfs.NameNode: SHUTDOWN_MSG:
>>> /************************************************************
>>> SHUTDOWN_MSG: Shutting down NameNode at ***************
>>>
>>> Best,
>>> Jasmine
>>
>> --
>> Alpha Chapters of my book on Hadoop are available
>> http://www.apress.com/book/view/9781430219422

--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
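[Editor's note: the fix Jason describes amounts to a hadoop-site.xml fragment along these lines. The hostnames and port numbers below are illustrative placeholders, not the poster's redacted values; any host:port pair works as long as the port is present, since the NumberFormatException comes from parsing an empty port string.]

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- full URI: scheme, host, AND port are all required in 0.18.x -->
    <value>hdfs://namenodeHost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <!-- plain host:port, no hdfs:// scheme -->
    <value>jobtrackerHost:9001</value>
  </property>
</configuration>
```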
