Re: distcp on ec2 standalone spark cluster

2015-03-08 Thread Akhil Das
roblem
> I am having a problem where distcp with an s3 URI says incorrect folder path
> and
> s3n:// hangs.
> stuck for 2 days :(
> Thanks
> -R

Re: distcp on ec2 standalone spark cluster

2015-03-07 Thread roni

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Ye Xianjin
Well, this means you didn't start a compute cluster, most likely because a wrong value of mapreduce.jobtracker.address prevents the slave node from starting the node manager. (I am not familiar with the ec2 script, so I don't know whether the slave node has a node manager installed or not.) Can yo

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Tomer Benyamini
No tasktracker or nodemanager. This is what I see:
On the master:
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
org.apache.hadoop.hdfs.server.namenode.NameNode
On the data node (slave):
org.apache.hadoop.hdfs.server.datano

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Ye Xianjin
What did you see in the log? Was there anything related to mapreduce? Can you log into your hdfs (data) node, use jps to list all Java processes, and confirm whether there is a tasktracker process (or nodemanager) running alongside the datanode process?
-- Ye Xianjin
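The jps check suggested here can be scripted. The following is a minimal sketch, not something posted in the thread: the helper names (running_daemons, has_compute_daemon) and the PIDs in the sample are made up for illustration. It assumes `jps -l` output of the form "<pid> <fully qualified class name>" and treats either a TaskTracker (MR1) or a NodeManager (MR2/YARN) as evidence that a compute daemon is running.

```python
# Short class names of the MapReduce worker daemons: TaskTracker for MR1,
# NodeManager for MR2/YARN.
COMPUTE_DAEMONS = {"TaskTracker", "NodeManager"}

# Class names as reported for the master in this thread; the PIDs are made up.
MASTER_SAMPLE = """\
2211 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
2054 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
1897 org.apache.hadoop.hdfs.server.namenode.NameNode
"""


def running_daemons(jps_output):
    """Return the set of short class names found in `jps` or `jps -l` output."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and entries without a class name
        # Keep only the part after the last dot of the class name.
        names.add(parts[1].rsplit(".", 1)[-1])
    return names


def has_compute_daemon(jps_output):
    """True if a TaskTracker or NodeManager appears in the jps output."""
    return bool(running_daemons(jps_output) & COMPUTE_DAEMONS)


print(has_compute_daemon(MASTER_SAMPLE))
```

On a real node, the text could be obtained with subprocess.run(["jps", "-l"], capture_output=True, text=True).stdout and passed to has_compute_daemon. A result of False on every slave would match the situation described in this thread: HDFS is up, but nothing is available to run the distcp job.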

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Tomer Benyamini
Still no luck, even when running stop-all.sh followed by start-all.sh.
On Mon, Sep 8, 2014 at 5:57 PM, Nicholas Chammas wrote:
> Tomer,
>
> Did you try start-all.sh? It worked for me the last time I tried using
> distcp, and it worked for this guy too.
>
> Nick
>
> On Mon, Sep 8, 2014 at 3:28 A

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Nicholas Chammas
Tomer,
Did you try start-all.sh? It worked for me the last time I tried using distcp, and it worked for this guy too.
Nick
On Mon, Sep 8, 2014 at 3:28 AM, Tomer Benyamini wrote:
> ~/ephemeral-hdfs/sbin/start-mapred.sh does not exist on spark-1.0.2

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Frank Austin Nothaft
Tomer, To use distcp, you need to have a Hadoop compute cluster up. start-dfs just restarts HDFS. I don’t have a Spark 1.0.2 cluster up right now, but there should be a start-mapred*.sh or start-all.sh script that will launch the Hadoop MapReduce cluster that you will need for distcp. Regards,

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Tomer Benyamini
~/ephemeral-hdfs/sbin/start-mapred.sh does not exist on spark-1.0.2; I restarted hdfs using ~/ephemeral-hdfs/sbin/stop-dfs.sh and ~/ephemeral-hdfs/sbin/start-dfs.sh, but still getting the same error when trying to run distcp: ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered java

Re: distcp on ec2 standalone spark cluster

2014-09-07 Thread Josh Rosen
If I recall, you should be able to start Hadoop MapReduce using ~/ephemeral-hdfs/sbin/start-mapred.sh.
On Sun, Sep 7, 2014 at 6:42 AM, Tomer Benyamini wrote:
> Hi,
>
> I would like to copy log files from s3 to the cluster's
> ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
> run

Re: distcp on ec2 standalone spark cluster

2014-09-07 Thread Nicholas Chammas
I think you need to run start-all.sh or something similar on the EC2 cluster. MR is installed but is not running by default on EC2 clusters spun up by spark-ec2.
On Sun, Sep 7, 2014 at 12:33 PM, Tomer Benyamini wrote:
> I've installed a spark standalone cluster on ec2 as defined here -
> https

Re: distcp on ec2 standalone spark cluster

2014-09-07 Thread Tomer Benyamini
I've installed a spark standalone cluster on ec2 as defined here - https://spark.apache.org/docs/latest/ec2-scripts.html. I'm not sure if mr1/2 is part of this installation.
On Sun, Sep 7, 2014 at 7:25 PM, Ye Xianjin wrote:
> Distcp requires a mr1(or mr2) cluster to start. Do you have a mapreduc

Re: distcp on ec2 standalone spark cluster

2014-09-07 Thread Ye Xianjin
Distcp requires an MR1 (or MR2) cluster to start. Do you have a mapreduce cluster on your hdfs? And from the error message, it seems that you didn't specify your jobtracker address.
-- Ye Xianjin
On Sunday, September 7, 2014 at 9:42 PM, T

distcp on ec2 standalone spark cluster

2014-09-07 Thread Tomer Benyamini
Hi, I would like to copy log files from s3 to the cluster's ephemeral-hdfs. I tried to use distcp, but I guess mapred is not running on the cluster - I'm getting the exception below. Is there a way to activate it, or is there a spark alternative to distcp? Thanks, Tomer mapreduce.Cluster (Clust
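On the question of a Spark alternative to distcp: for line-oriented files such as logs, one option is to read them with Spark and write them back out to HDFS. The following is a minimal sketch, not a method proposed in the thread. It assumes an existing SparkContext `sc` (as in the pyspark shell) and that S3 credentials are already configured for the s3n:// scheme; the function name and the URIs in the usage note are hypothetical. It is not a byte-for-byte copy: the output is written as part-NNNNN files under the destination directory, so the original file names and file boundaries are not preserved.

```python
def copy_text_files(sc, src_uri, dst_uri):
    """Copy line-oriented text from src_uri to dst_uri using Spark itself.

    sc is an existing SparkContext. dst_uri must not exist yet, because
    saveAsTextFile refuses to overwrite an existing directory.
    """
    lines = sc.textFile(src_uri)   # one RDD element per input line
    lines.saveAsTextFile(dst_uri)  # writes part-NNNNN files under dst_uri
```

A call from the pyspark shell might look like copy_text_files(sc, "s3n://my-bucket/logs/*", "hdfs:///logs"), where the bucket and paths are placeholders. Because the work runs as an ordinary Spark job, it does not need a MapReduce jobtracker or node manager.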