Hello Michael,

I have a quick question for you. Can you clarify the statement "build fat
JARs and build dist-style TAR.GZ packages with launch scripts, JARs and
everything needed to run a Job"? Can you give an example?

I am using sbt-assembly as well to create a fat JAR, and I supply the Spark
and Hadoop locations on the classpath. Inside the main() function where the
Spark context is created, I use SparkContext.jarOfClass(this).toList to add
the fat JAR to my Spark context. However, I seem to be running into issues
with this approach. I was wondering if you had any input, Michael.
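
For reference, here is roughly the pattern I am using, as a simplified sketch
(the class and app names are just placeholders):

import org.apache.spark.{SparkConf, SparkContext}

object MyJob {
  def main(args: Array[String]): Unit = {
    // Locate the fat JAR that contains this class.
    val jars = SparkContext.jarOfClass(this.getClass).toList
    val conf = new SparkConf()
      .setAppName("MyJob")
      .setJars(jars)  // ship the assembly JAR to the executors
    val sc = new SparkContext(conf)
    // ... job logic ...
    sc.stop()
  }
}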

Thanks,
Shivani


On Thu, Jun 19, 2014 at 10:57 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:

> We use maven for building our code and then invoke spark-submit through
> the exec plugin, passing in our parameters. Works well for us.
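>
> For anyone curious, the shape of that setup is roughly (the goal and
> configuration element names come from the standard exec-maven-plugin; the
> exact layout is of course up to you):
>
>   mvn package exec:exec
>
> with spark-submit set as the plugin's <executable> and the job parameters
> listed as <argument> entries in its <arguments> configuration.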
>
> Best Regards,
> Sonal
> Nube Technologies <http://www.nubetech.co>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
>
>
>
> On Fri, Jun 20, 2014 at 3:26 AM, Michael Cutler <mich...@tumra.com> wrote:
>
>> P.S. Last but not least, we use sbt-assembly to build fat JARs and build
>> dist-style TAR.GZ packages with launch scripts, JARs and everything needed
>> to run a Job.  These are automatically built from source by our Jenkins and
>> stored in HDFS.  Our Chronos/Marathon jobs fetch the latest release TAR.GZ
>> directly from HDFS, unpack it, and launch the appropriate script.
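>>
>> For reference, a rough sketch of the sbt-assembly side of that setup (the
>> project name, versions and Spark dependency below are placeholders, and the
>> exact plugin wiring varies by sbt-assembly version):
>>
>> // project/plugins.sbt -- pulls in the sbt-assembly plugin
>> addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")
>>
>> // build.sbt -- mark Spark as "provided" so it stays out of the fat JAR
>> // (older sbt-assembly releases also need assemblySettings added here)
>> name := "my-spark-job"
>>
>> version := "0.1.0"
>>
>> scalaVersion := "2.10.4"
>>
>> libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"
>>
>> Running "sbt assembly" then produces a single fat JAR under target/, and a
>> separate packaging step wraps it into the TAR.GZ together with the launch
>> scripts.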
>>
>> It makes for a much cleaner development / testing / deployment cycle to
>> package everything required in one go instead of relying on cluster-specific
>> classpath additions or any add-jars functionality.
>>
>>
>> On 19 June 2014 22:53, Michael Cutler <mich...@tumra.com> wrote:
>>
>>> When you start seriously using Spark in production there are basically
>>> two things everyone eventually needs:
>>>
>>>    1. Scheduled Jobs - recurring hourly/daily/weekly jobs.
>>>    2. Always-On Jobs - that require monitoring, restarting etc.
>>>
>>> There are lots of ways to implement these requirements, everything from
>>> crontab through to workflow managers like Oozie.
>>>
>>> We opted for the following stack:
>>>
>>>    - Apache Mesos <http://mesosphere.io/> (mesosphere.io distribution)
>>>
>>>
>>>    - Marathon <https://github.com/mesosphere/marathon> - init/control
>>>    system for starting, stopping, and maintaining always-on applications.
>>>
>>>
>>>    - Chronos <http://airbnb.github.io/chronos/> - general-purpose
>>>    scheduler for Mesos, supports job dependency graphs.
>>>
>>>
>>>    - Spark Job Server <https://github.com/ooyala/spark-jobserver> -
>>>    primarily for its ability to reuse shared contexts across multiple jobs
>>>
>>> The majority of our jobs are periodic (batch) jobs run through
>>> spark-submit, and we have several always-on Spark Streaming jobs (also run
>>> through spark-submit).
>>>
>>> We always use "client mode" with spark-submit because the Mesos cluster
>>> has direct connectivity to the Spark cluster, and it means all the Spark
>>> stdout/stderr is externalised into the Mesos logs, which helps with
>>> diagnosing problems.
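>>>
>>> For example, something along these lines (the class name, JAR path and
>>> Mesos master URL are placeholders):
>>>
>>> ./bin/spark-submit --class com.example.MyStreamingJob \
>>>   --master mesos://mesos-master:5050 --deploy-mode client \
>>>   myjob-assembly-0.1.0.jar arg1 arg2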
>>>
>>> I thoroughly recommend you explore using Mesos/Marathon/Chronos to run
>>> Spark and manage your jobs; the Mesosphere tutorials are awesome and you
>>> can be up and running in literally minutes.  The web UIs for both make it
>>> easy to get started without talking to REST APIs etc.
>>>
>>> Best,
>>>
>>> Michael
>>>
>>>
>>>
>>>
>>> On 19 June 2014 19:44, Evan R. Sparks <evan.spa...@gmail.com> wrote:
>>>
>>>> I use SBT, create an assembly, and then add the assembly jars when I
>>>> create my Spark context. I run the driver with something like "java
>>>> -cp ... MyDriver".
>>>>
>>>> That said, as of Spark 1.0 the preferred way to run Spark applications
>>>> is via spark-submit:
>>>> http://spark.apache.org/docs/latest/submitting-applications.html
>>>>
>>>>
>>>> On Thu, Jun 19, 2014 at 11:36 AM, ldmtwo <ldm...@gmail.com> wrote:
>>>>
>>>>> I want to ask this, not because I can't read endless documentation and
>>>>> several tutorials, but because there seem to be many ways of doing things
>>>>> and I keep having issues. How do you run /your/ Spark app?
>>>>>
>>>>> I had it working when I was only using yarn+hadoop1 (Cloudera), then I
>>>>> had to get Spark and Shark working, ended up upgrading everything, and
>>>>> dropped CDH support. Anyways, this is what I used with master=yarn-client
>>>>> and app_jar being Scala code compiled with Maven.
>>>>>
>>>>> java -cp $CLASSPATH -Dspark.jars=$APP_JAR -Dspark.master=$MASTER $CLASSNAME $ARGS
>>>>>
>>>>> Do you use this, or something else? I could never figure out this method:
>>>>> SPARK_HOME/bin/spark jar APP_JAR ARGS
>>>>>
>>>>> For example:
>>>>> bin/spark-class jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 10
>>>>>
>>>>> Do you use SBT or Maven to compile? or something else?
>>>>>
>>>>>
>>>>> ** It seems that I can't get subscribed to the mailing list; I tried
>>>>> both my work email and my personal one.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
