Hi Chester,

Thank you very much, it is clear now: these are just two different ways to support Spark on a cluster.

Thank you,
Konstantin Kudryavtsev

On Mon, Jul 7, 2014 at 3:22 PM, Chester @work <[email protected]> wrote:

> In YARN cluster mode, you can either have Spark on all the cluster nodes
> or supply the Spark jar yourself. In the second case, you don't need to
> install Spark on the cluster at all, as you supply the Spark assembly as
> well as your app jar together.
>
> I hope this makes it clear.
>
> Chester
>
> Sent from my iPhone
>
> On Jul 7, 2014, at 5:05 AM, Konstantin Kudryavtsev <
> [email protected]> wrote:
>
> Thank you, Krishna!
>
> Could you please explain why I need to install Spark on each node if the
> official Spark site says: "If you have a Hadoop 2 cluster, you can run
> Spark without any installation needed"?
>
> I have HDP 2 (YARN), and that's why I hope I don't need to install Spark on
> each node.
>
> Thank you,
> Konstantin Kudryavtsev
>
>
> On Mon, Jul 7, 2014 at 1:57 PM, Krishna Sankar <[email protected]>
> wrote:
>
>> Konstantin,
>>
>> 1. You need to install the Hadoop RPMs on all nodes. If it is Hadoop
>> 2, the nodes would have HDFS & YARN.
>> 2. Then you need to install Spark on all nodes. I haven't had
>> experience with HDP, but the tech preview might have installed Spark as
>> well.
>> 3. In the end, one should have HDFS, YARN & Spark installed on all the
>> nodes.
>> 4. After installation, check the web console to make sure HDFS, YARN
>> & Spark are running.
>> 5. Then you are ready to start experimenting with and developing Spark
>> applications.
>>
>> HTH.
>> Cheers
>> <k/>
>>
>>
>> On Mon, Jul 7, 2014 at 2:34 AM, Konstantin Kudryavtsev <
>> [email protected]> wrote:
>>
>>> Guys, I'm not talking about running Spark on a VM; I don't have a problem
>>> with it.
>>>
>>> I'm confused about the following:
>>> 1) Hortonworks describes the installation process as RPMs on each node
>>> 2) the Spark home page says that everything I need is YARN
>>>
>>> And I'm stuck on understanding what I need to do to run Spark on
>>> YARN (do I need RPM installations, or only to build Spark on an edge node?)
>>>
>>>
>>> Thank you,
>>> Konstantin Kudryavtsev
>>>
>>>
>>> On Mon, Jul 7, 2014 at 4:34 AM, Robert James <[email protected]>
>>> wrote:
>>>
>>>> I can say from my experience that getting Spark to work with Hadoop 2
>>>> is not for the beginner; after solving one problem after another
>>>> (dependencies, scripts, etc.), I went back to Hadoop 1.
>>>>
>>>> Spark's Maven, ec2 scripts, and others all use Hadoop 1 - not sure
>>>> why, but, given so, Hadoop 2 has too many bumps
>>>>
>>>> On 7/6/14, Marco Shaw <[email protected]> wrote:
>>>> > That is confusing based on the context you provided.
>>>> >
>>>> > This might take more time than I can spare to try to understand.
>>>> >
>>>> > For sure, you need to add Spark to run it in/on the HDP 2.1 express VM.
>>>> >
>>>> > Cloudera's CDH 5 express VM includes Spark, but the service isn't
>>>> > running by default.
>>>> >
>>>> > I can't remember for MapR...
>>>> >
>>>> > Marco
>>>> >
>>>> >> On Jul 6, 2014, at 6:33 PM, Konstantin Kudryavtsev
>>>> >> <[email protected]> wrote:
>>>> >>
>>>> >> Marco,
>>>> >>
>>>> >> Hortonworks provides a Tech Preview of Spark 0.9.1 with HDP 2.1 that you
>>>> >> can try from
>>>> >> http://hortonworks.com/wp-content/uploads/2014/05/SparkTechnicalPreview.pdf
>>>> >> HDP 2.1 means YARN; at the same time, they propose to install an RPM.
>>>> >>
>>>> >> On the other hand, http://spark.apache.org/ says:
>>>> >> "Integrated with Hadoop
>>>> >> Spark can run on Hadoop 2's YARN cluster manager, and can read any
>>>> >> existing Hadoop data.
>>>> >>
>>>> >> If you have a Hadoop 2 cluster, you can run Spark without any
>>>> >> installation needed."
>>>> >>
>>>> >> And this is confusing for me... do I need the RPM installation or not?
>>>> >>
>>>> >>
>>>> >> Thank you,
>>>> >> Konstantin Kudryavtsev
>>>> >>
>>>> >>
>>>> >>> On Sun, Jul 6, 2014 at 10:56 PM, Marco Shaw <[email protected]>
>>>> >>> wrote:
>>>> >>> Can you provide links to the sections that are confusing?
>>>> >>>
>>>> >>> My understanding is that the HDP1 binaries do not need YARN, while the
>>>> >>> HDP2 binaries do.
>>>> >>>
>>>> >>> Now, you can also install the Hortonworks Spark RPM...
>>>> >>>
>>>> >>> For production, in my opinion, RPMs are better for manageability.
>>>> >>>
>>>> >>>> On Jul 6, 2014, at 5:39 PM, Konstantin Kudryavtsev
>>>> >>>> <[email protected]> wrote:
>>>> >>>>
>>>> >>>> Hello, thanks for your message... I'm confused: Hortonworks suggests
>>>> >>>> installing the Spark RPM on each node, but the Spark main page says
>>>> >>>> that YARN is enough and I don't need to install it... What is the
>>>> >>>> difference?
>>>> >>>>
>>>> >>>> sent from my HTC
>>>> >>>>
>>>> >>>>> On Jul 6, 2014 8:34 PM, "vs" <[email protected]> wrote:
>>>> >>>>> Konstantin,
>>>> >>>>>
>>>> >>>>> HWRK provides a Tech Preview of Spark 0.9.1 with HDP 2.1 that you can
>>>> >>>>> try from
>>>> >>>>> http://hortonworks.com/wp-content/uploads/2014/05/SparkTechnicalPreview.pdf
>>>> >>>>>
>>>> >>>>> Let me know if you see issues with the tech preview.
>>>> >>>>>
>>>> >>>>> "spark PI example on HDP 2.0
>>>> >>>>>
>>>> >>>>> I downloaded the Spark 1.0 pre-built package from
>>>> >>>>> http://spark.apache.org/downloads.html
>>>> >>>>> (for HDP2)
>>>> >>>>> Then I ran the example from the Spark web site:
>>>> >>>>> ./bin/spark-submit --class org.apache.spark.examples.SparkPi
>>>> >>>>> --master yarn-cluster --num-executors 3 --driver-memory 2g
>>>> >>>>> --executor-memory 2g --executor-cores 1
>>>> >>>>> ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 2
>>>> >>>>>
>>>> >>>>> I got this error:
>>>> >>>>> Application application_1404470405736_0044 failed 3 times due to AM
>>>> >>>>> Container for appattempt_1404470405736_0044_000003 exited with
>>>> >>>>> exitCode: 1
>>>> >>>>> due to: Exception from container-launch:
>>>> >>>>> org.apache.hadoop.util.Shell$ExitCodeException:
>>>> >>>>> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
>>>> >>>>> at org.apache.hadoop.util.Shell.run(Shell.java:379)
>>>> >>>>> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
>>>> >>>>> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>>>> >>>>> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
>>>> >>>>> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
>>>> >>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> >>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> >>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> >>>>> at java.lang.Thread.run(Thread.java:744)
>>>> >>>>> .Failing this attempt.. Failing the application.
>>>> >>>>>
>>>> >>>>> Unknown/unsupported param List(--executor-memory, 2048,
>>>> >>>>> --executor-cores, 1,
>>>> >>>>> --num-executors, 3)
>>>> >>>>> Usage: org.apache.spark.deploy.yarn.ApplicationMaster [options]
>>>> >>>>> Options:
>>>> >>>>> --jar JAR_PATH Path to your application's JAR file (required)
>>>> >>>>> --class CLASS_NAME Name of your application's main class
>>>> >>>>> (required)
>>>> >>>>> ...bla-bla-bla
>>>> >>>>> "
>>>> >>>>>
>>>> >>>>> --
>>>> >>>>> View this message in context:
>>>> >>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-run-Spark-1-0-SparkPi-on-HDP-2-0-tp8802p8873.html
>>>> >>>>> Sent from the Apache Spark User List mailing list archive at
>>>> >>>>> Nabble.com.
