I think you have been through enough :). Basically, you download the spark-ec2 scripts and run them. They only need your Amazon access key and secret key; they will start your cluster, install everything, create the security groups, and give you the master URL. Just log in and go ahead...
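For concreteness, a minimal sketch of what that looks like with the spark-ec2 script that ships in Spark's ec2/ directory. The key-pair name, .pem path, slave count, and cluster name below are placeholders, not values from this thread:

    # spark-ec2 reads your AWS credentials from the environment
    export AWS_ACCESS_KEY_ID=<your access key>
    export AWS_SECRET_ACCESS_KEY=<your secret key>

    # Launch a cluster named "my-cluster" with 2 slaves, using an existing EC2 key pair
    ./ec2/spark-ec2 --key-pair=my-keypair --identity-file=/path/to/my-keypair.pem \
        --slaves=2 launch my-cluster

    # Once the launch finishes, log in to the master node
    ./ec2/spark-ec2 --key-pair=my-keypair --identity-file=/path/to/my-keypair.pem \
        login my-cluster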
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Mon, Mar 3, 2014 at 11:00 AM, Bin Wang <binwang...@gmail.com> wrote:

> Hi there,
>
> I have a CDH cluster set up, and I tried the Spark parcel that comes with
> Cloudera Manager, but it turned out they don't even have the run-example
> shell command in the bin folder. So I removed it from the cluster, cloned
> incubator-spark onto the name node of my cluster, and built it from source
> there successfully with everything at the defaults.
>
> I ran a few examples and everything seems to work fine in local mode.
> Now I am thinking about scaling it out to my cluster, which is what the
> "DISTRIBUTE + ACTIVATE" command does in Cloudera Manager. I want to add
> all the datanodes as slaves, and I think I should run Spark in standalone
> mode.
>
> Say I am trying to set up Spark in standalone mode following these
> instructions:
> https://spark.incubator.apache.org/docs/latest/spark-standalone.html
> They say: "Once started, the master will print out a spark://HOST:PORT
> URL for itself, which you can use to connect workers to it, or pass as
> the “master” argument to SparkContext. You can also find this URL on the
> master's web UI, which is http://localhost:8080 by default."
>
> After I started the master, no URL was printed on the screen, and the web
> UI is not running either.
> Here is the output:
> [root@box incubator-spark]# ./sbin/start-master.sh
> starting org.apache.spark.deploy.master.Master, logging to
> /root/bwang_spark_new/incubator-spark/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-box.out
>
> First question: am I even in the ballpark by running Spark in standalone
> mode if I want to fully utilize my cluster? I see there are four ways to
> launch Spark on a cluster: AWS EC2, Spark standalone, Apache Mesos, and
> Hadoop YARN. I guess standalone mode is the way to go?
>
> Second question: how do I get the Spark URL of the cluster, and why is
> the output not what the instructions say?
>
> Best regards,
>
> Bin
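Returning to Bin's second question: ./sbin/start-master.sh daemonizes the master, so its output, including the spark://HOST:PORT URL, goes to the log file named in the "logging to ..." line rather than to the terminal. A minimal sketch of digging it out, assuming the master actually came up:

    # Start the master; it backgrounds itself and logs under sbin/../logs/
    ./sbin/start-master.sh

    # The spark://HOST:PORT URL is written to the log, not the console
    grep "spark://" logs/spark-*-org.apache.spark.deploy.master.Master-*.out

    # The web UI (port 8080 by default) shows the same URL; check that it is up
    curl -s http://localhost:8080 | head

If the grep finds nothing and the web UI is down, the same log will usually show why the master failed to start, for example a port already in use or a hostname it could not bind to.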