Hello Michael, I have a quick question for you. Can you clarify the statement "build fat JARs and build dist-style TAR.GZ packages with launch scripts, JARs and everything needed to run a Job"? Can you give an example?
I am using sbt assembly as well to create a fat JAR, and I supply the Spark and Hadoop locations on the classpath. Inside the main() function where the SparkContext is created, I use SparkContext.jarOfClass(this).toList to add the fat JAR to my SparkContext. However, I seem to be running into issues with this approach. I was wondering if you had any input, Michael.

Thanks,
Shivani
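For readers following the thread, here is a minimal Scala sketch of the pattern Shivani describes: build the application as a fat JAR with sbt-assembly, then register that JAR with the SparkContext so the executors can fetch the application classes. The object name, app name and master URL are illustrative assumptions, not details from this thread, and jarOfClass is given this.getClass since it expects a Class:

    import org.apache.spark.{SparkConf, SparkContext}

    object MyJob {                              // hypothetical driver object
      def main(args: Array[String]): Unit = {
        // Locate the JAR containing this class -- i.e. the fat JAR built by
        // sbt-assembly -- so Spark can ship it to the executors.
        val jars = SparkContext.jarOfClass(this.getClass).toList

        val conf = new SparkConf()
          .setAppName("my-spark-job")           // illustrative app name
          .setMaster("spark://master:7077")     // illustrative master URL
          .setJars(jars)

        val sc = new SparkContext(conf)
        try {
          // Trivial action to confirm the context and JAR distribution work.
          println(sc.parallelize(1 to 100).count())
        } finally {
          sc.stop()
        }
      }
    }

Note that when the job is launched through spark-submit (mentioned further down the thread), the explicit jarOfClass/setJars step is generally unnecessary because the submitted application JAR is distributed automatically.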
On Thu, Jun 19, 2014 at 10:57 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:

> We use Maven for building our code and then invoke spark-submit through
> the exec plugin, passing in our parameters. Works well for us.
>
> Best Regards,
> Sonal
> Nube Technologies <http://www.nubetech.co>
> <http://in.linkedin.com/in/sonalgoyal>
>
> On Fri, Jun 20, 2014 at 3:26 AM, Michael Cutler <mich...@tumra.com> wrote:
>
>> P.S. Last but not least, we use sbt-assembly to build fat JARs and build
>> dist-style TAR.GZ packages with launch scripts, JARs and everything needed
>> to run a Job. These are automatically built from source by our Jenkins and
>> stored in HDFS. Our Chronos/Marathon jobs fetch the latest release TAR.GZ
>> direct from HDFS, unpack it and launch the appropriate script.
>>
>> It makes for a much cleaner development / testing / deployment cycle to
>> package everything required in one go instead of relying on cluster-specific
>> classpath additions or any add-jars functionality.
>>
>> On 19 June 2014 22:53, Michael Cutler <mich...@tumra.com> wrote:
>>
>>> When you start seriously using Spark in production there are basically
>>> two things everyone eventually needs:
>>>
>>> 1. Scheduled Jobs - recurring hourly/daily/weekly jobs.
>>> 2. Always-On Jobs - jobs that require monitoring, restarting etc.
>>>
>>> There are lots of ways to implement these requirements, everything from
>>> crontab through to workflow managers like Oozie.
>>>
>>> We opted for the following stack:
>>>
>>> - Apache Mesos <http://mesosphere.io/> (mesosphere.io distribution)
>>>
>>> - Marathon <https://github.com/mesosphere/marathon> - init/control
>>>   system for starting, stopping, and maintaining always-on applications.
>>>
>>> - Chronos <http://airbnb.github.io/chronos/> - general-purpose
>>>   scheduler for Mesos, supports job dependency graphs.
>>>
>>> - Spark Job Server <https://github.com/ooyala/spark-jobserver> -
>>>   primarily for its ability to reuse shared contexts with multiple jobs.
>>>
>>> The majority of our jobs are periodic (batch) jobs run through
>>> spark-submit, and we have several always-on Spark Streaming jobs (also run
>>> through spark-submit).
>>>
>>> We always use "client mode" with spark-submit because the Mesos cluster
>>> has direct connectivity to the Spark cluster, and it means all the Spark
>>> stdout/stderr is externalised into Mesos logs, which helps when diagnosing
>>> problems.
>>>
>>> I thoroughly recommend you explore using Mesos/Marathon/Chronos to run
>>> Spark and manage your jobs; the Mesosphere tutorials are awesome and you
>>> can be up and running in literally minutes. The Web UIs for both make it
>>> easy to get started without talking to REST APIs etc.
>>>
>>> Best,
>>>
>>> Michael
>>>
>>> On 19 June 2014 19:44, Evan R. Sparks <evan.spa...@gmail.com> wrote:
>>>
>>>> I use SBT, create an assembly, and then add the assembly jars when I
>>>> create my spark context. The main executor I run with something like
>>>> "java -cp ... MyDriver".
>>>>
>>>> That said - as of Spark 1.0 the preferred way to run Spark applications
>>>> is via spark-submit:
>>>> http://spark.apache.org/docs/latest/submitting-applications.html
>>>>
>>>> On Thu, Jun 19, 2014 at 11:36 AM, ldmtwo <ldm...@gmail.com> wrote:
>>>>
>>>>> I want to ask this, not because I can't read endless documentation and
>>>>> several tutorials, but because there seem to be many ways of doing things
>>>>> and I keep having issues. How do you run /your/ Spark app?
>>>>>
>>>>> I had it working when I was only using yarn+hadoop1 (Cloudera), then I
>>>>> had to get Spark and Shark working and ended up upgrading everything and
>>>>> dropped CDH support. Anyways, this is what I used, with master=yarn-client
>>>>> and APP_JAR being Scala code compiled with Maven:
>>>>>
>>>>> java -cp $CLASSPATH -Dspark.jars=$APP_JAR -Dspark.master=$MASTER $CLASSNAME $ARGS
>>>>>
>>>>> Do you use this, or something else? I could never figure out this method:
>>>>>
>>>>> SPARK_HOME/bin/spark jar APP_JAR ARGS
>>>>>
>>>>> For example:
>>>>>
>>>>> bin/spark-class jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 10
>>>>>
>>>>> Do you use SBT or Maven to compile, or something else?
>>>>>
>>>>> ** It seems that I can't get subscribed to the mailing list; I tried both
>>>>> my work email and personal.
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you-run-your-spark-app-tp7935.html
>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
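Since several of the replies above rely on sbt-assembly to produce the fat JAR, here is a minimal sketch of what that build definition might look like, assuming the sbt-assembly 0.11.x plugin and Spark 1.0.0; the project name, versions and JAR name are illustrative, and real projects frequently also need merge-strategy overrides for conflicting files pulled in by dependencies:

    // project/plugins.sbt -- assumes sbt-assembly 0.11.x
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

    // build.sbt
    import AssemblyKeys._   // assembly keys/settings from the 0.11.x plugin API

    assemblySettings

    name := "my-spark-job"

    version := "0.1.0"

    scalaVersion := "2.10.4"

    // "provided" keeps Spark itself out of the fat JAR; the cluster supplies it at runtime.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"

    jarName in assembly := "my-spark-job-assembly.jar"

Running sbt assembly then drops the fat JAR under target/ (typically target/scala-2.10/my-spark-job-assembly.jar), which can be passed to spark-submit or bundled into a dist-style TAR.GZ with launch scripts, as Michael describes.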