Thank you for the link. In that link the following is written: "For those familiar with the Spark API, an application corresponds to an instance of the SparkContext class. An application can be used for a single batch job, an interactive session with multiple jobs spaced apart, or a long-lived server continually satisfying requests."
So, if I wanted to use "a long-lived server continually satisfying requests" and then start a shell that connected to that context, how would I do that in YARN? That's the problem I am having right now: I just want there to be a long-lived service that I can utilize.

Thanks!

On Wed, Jul 9, 2014 at 11:14 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
> To add to Ron's answer, this post explains what it means to run Spark
> against a YARN cluster, the difference between yarn-client and
> yarn-cluster mode, and the reason spark-shell only works in yarn-client
> mode.
>
> http://blog.cloudera.com/blog/2014/05/apache-spark-resource-management-and-yarn-app-models/
>
> -Sandy
>
>
> On Wed, Jul 9, 2014 at 9:09 AM, Ron Gonzalez <zlgonza...@yahoo.com> wrote:
>
>> The idea behind YARN is that you can run different application types
>> like MapReduce, Storm, and Spark.
>>
>> I would recommend that you build your Spark jobs in the main method
>> without specifying how you deploy them. Then you can use spark-submit
>> to tell Spark how you want to deploy them, using yarn-cluster as the
>> master. The key point here is that once you have YARN set up, the
>> Spark client connects to it using the $HADOOP_CONF_DIR that contains
>> the resource manager address. In particular, this directory needs to
>> be on the submitter's classpath, since Spark implicitly reads it when
>> it instantiates a YarnConfiguration instance. If you want more
>> details, read org.apache.spark.deploy.yarn.Client.scala.
>>
>> You should be able to get a standalone YARN cluster from any of the
>> Hadoop providers like Cloudera or Hortonworks. Once you have that,
>> the Spark programming guide describes what I mention above in
>> sufficient detail for you to proceed.
>>
>> Thanks,
>> Ron
>>
>> Sent from my iPad
>>
>> > On Jul 9, 2014, at 8:31 AM, John Omernik <j...@omernik.com> wrote:
>> >
>> > I am trying to get my head around using Spark on YARN from the
>> > perspective of a cluster. I can start a Spark shell with no issues
>> > in YARN; it works easily. This is done in yarn-client mode, and it
>> > all works well.
>> >
>> > In multiple examples, I see instances where people have set up
>> > Spark clusters in standalone mode, and then in the examples they
>> > "connect" to this cluster in standalone mode. This is often done
>> > using the spark:// string for the connection. Cool.
>> >
>> > But what I don't understand is: how do I set up a YARN instance
>> > that I can "connect" to? I.e., I tried running spark-shell in
>> > yarn-cluster mode and it gave me an error telling me to use
>> > yarn-client. I see information on using spark-class or
>> > spark-submit. But what I'd really like is an instance I can connect
>> > a spark-shell to, and have the instance stay up. I'd like to be
>> > able to run other things on that instance, etc. Is that possible
>> > with YARN? I know there may be long-running job challenges with
>> > YARN, but I am just testing. I am just curious if I am looking at
>> > something completely bonkers here, or just missing something
>> > simple.
>> >
>> > Thanks!
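For reference, a minimal sketch of the pattern Ron describes: the job's main method builds its SparkContext without hard-coding a master, so spark-submit decides the deploy mode at launch time. The class name, jar name, and HDFS paths below are just placeholders, not anything from this thread.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD implicits, needed on Spark 1.x

object WordCountJob {
  def main(args: Array[String]): Unit = {
    // Note: no .setMaster(...) here -- the master (yarn-cluster,
    // yarn-client, spark://..., local, ...) is supplied by spark-submit.
    val conf = new SparkConf().setAppName("WordCountJob")
    val sc = new SparkContext(conf)

    val counts = sc.textFile(args(0))
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1L))
      .reduceByKey(_ + _)

    counts.saveAsTextFile(args(1))
    sc.stop()
  }
}

It would then be submitted with something like the following, assuming (per Ron's point) that HADOOP_CONF_DIR points at the directory holding the configs with the resource manager address:

export HADOOP_CONF_DIR=/etc/hadoop/conf
spark-submit --class WordCountJob --master yarn-cluster \
  wordcount.jar hdfs:///input hdfs:///output

This covers the batch/yarn-cluster case; it does not by itself give you the long-lived, connectable service John is asking about.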