It works fine for me with the script that comes with the 1.2.0 release. Here are a few things you can try:
- Add your S3 credentials to core-site.xml:

    <property>
      <name>fs.s3.awsAccessKeyId</name>
      <value>ID</value>
    </property>
    <property>
      <name>fs.s3.awsSecretAccessKey</name>
      <value>SECRET</value>
    </property>

- Run jps and check that all services are up and running (Namenode, SecondaryNamenode, Datanodes, etc.). A sketch of these checks follows below the quoted message.

I think the default HDFS port that comes with spark-ec2 is 9000. You can check that in your core-site.xml file.

Thanks
Best Regards

On Fri, Mar 6, 2015 at 7:14 AM, roni <roni.epi...@gmail.com> wrote:

> Hi,
> I used the spark-ec2 script to create an EC2 cluster.
>
> Now I am trying to copy data from S3 into HDFS.
> I am doing this:
>
> root@ip-172-31-21-160 ephemeral-hdfs]$ bin/hadoop distcp
> s3://<xxx>/home/mydata/small.sam
> hdfs://ec2-52-11-148-31.us-west-2.compute.amazonaws.com:9010/data1
>
> and I get the following error:
>
> 2015-03-06 01:39:27,299 INFO tools.DistCp (DistCp.java:run(109)) - Input
> Options: DistCpOptions{atomicCommit=false, syncFolder=false,
> deleteMissing=false, ignoreFailures=false, maxMaps=20,
> sslConfigurationFile='null', copyStrategy='uniformsize',
> sourceFileListing=null, sourcePaths=[s3://<xxx>/home/mydata/small.sam],
> targetPath=hdfs://ec2-52-11-148-31.us-west-2.compute.amazonaws.com:9010/data1}
> 2015-03-06 01:39:27,585 INFO mapreduce.Cluster
> (Cluster.java:initialize(114)) - Failed to use
> org.apache.hadoop.mapred.LocalClientProtocolProvider due to error: Invalid
> "mapreduce.jobtracker.address" configuration value for LocalJobRunner : "
> ec2-52-11-148-31.us-west-2.compute.amazonaws.com:9001"
> 2015-03-06 01:39:27,585 ERROR tools.DistCp (DistCp.java:run(126)) -
> Exception encountered
> java.io.IOException: Cannot initialize Cluster. Please check your
> configuration for mapreduce.framework.name and the correspond server
> addresses.
>     at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
>     at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
>     at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
>
> I tried doing start-all.sh, start-dfs.sh and start-yarn.sh.
>
> What should I do?
> Thanks
> -roni
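P.S. As a rough sketch of the checks above (assuming the ephemeral-hdfs layout that spark-ec2 sets up under /root/ephemeral-hdfs; the bucket name, namenode host, and port are placeholders you'd replace with your own):

    # Confirm the HDFS daemons are running on the master
    jps

    # Check which namenode address/port is configured
    grep -A1 -E 'fs.default.name|fs.defaultFS' /root/ephemeral-hdfs/conf/core-site.xml

    # Retry the copy; with the keys in core-site.xml the S3 URI needs no credentials
    /root/ephemeral-hdfs/bin/hadoop distcp \
        s3n://your-bucket/home/mydata/small.sam \
        hdfs://<namenode-host>:9000/data1

Whether you need s3:// or s3n:// depends on how the data was written to the bucket, so treat the scheme above as an assumption to verify against your setup.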