Hi Moon,

Yes, I have checked the same.

I put some debug statements in interpreter.sh to see exactly what gets passed when I set SPARK_HOME in zeppelin-env.sh. The debug output shows that it is using the spark-submit utility from the bin folder of the SPARK_HOME I set in zeppelin-env.sh.

Regards,
Sourav

On Sun, Oct 11, 2015 at 2:55 AM, moon soo Lee <[email protected]> wrote:

> Could you make sure your zeppelin-env.sh has SPARK_HOME exported?
>
> Zeppelin (0.6.0-SNAPSHOT) uses the spark-submit command when SPARK_HOME is
> defined, but your error says "please use spark-submit".
>
> Thanks,
> moon
>
> On Thu, Oct 8, 2015 at 9:14 PM Sourav Mazumder <[email protected]> wrote:
>
>> Hi Deepak/Moon,
>>
>> After seeing the stack trace of the error and the code in
>> org.apache.zeppelin.spark.SparkInterpreter.java, I think this is surely a
>> bug in the Spark interpreter code.
>>
>> The SparkInterpreter code always calls the constructor of
>> org.apache.spark.SparkContext to create a new SparkContext whenever the
>> SparkInterpreter class is loaded by
>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer, hence
>> this error.
>>
>> I'm not sure whether the check for yarn-cluster was newly added in
>> SparkContext.
>>
>> Attaching here the complete stack trace for your ease of reference.
>>
>> Regards,
>> Sourav
>>
>> org.apache.spark.SparkException: Detected yarn-cluster mode, but isn't
>> running on a cluster. Deployment to YARN is not supported directly by
>> SparkContext. Please use spark-submit.
>> at org.apache.spark.SparkContext.<init>(SparkContext.scala:378)
>> at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:339)
>> at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:149)
>> at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:465)
>> at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
>> at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
>> at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
>> at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:276)
>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>> at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>> at java.util.concurrent.FutureTask.run(Unknown Source)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>> at java.lang.Thread.run(Unknown Source)
>>
>> On Mon, Oct 5, 2015 at 12:57 PM, Sourav Mazumder <[email protected]> wrote:
>>
>>> I could execute the following without any issue:
>>>
>>> spark-submit --class org.apache.spark.examples.SparkPi --master
>>> yarn-cluster --num-executors 1 --driver-memory 512m --executor-memory 512m
>>> --executor-cores 1 lib/spark-examples.jar 10
>>>
>>> Regards,
>>> Sourav
>>>
>>> On Mon, Oct 5, 2015 at 12:04 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <[email protected]> wrote:
>>>
>>>> Did you try a test job with yarn-cluster (outside Zeppelin)?
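The debug statements mentioned earlier in the thread might be sketched like this, placed in bin/interpreter.sh near the branch that launches spark-submit. The default path and variable names below are illustrative assumptions, not Zeppelin's actual script contents:

```shell
# Hypothetical debug lines for bin/interpreter.sh. The fallback path is only
# an example -- Zeppelin itself reads SPARK_HOME from conf/zeppelin-env.sh.
SPARK_HOME="${SPARK_HOME:-/usr/iop/current/spark-thriftserver}"
SPARK_SUBMIT="${SPARK_HOME}/bin/spark-submit"

# Print what will actually be invoked, to stderr so it shows in the interpreter log.
echo "DEBUG interpreter.sh: SPARK_HOME=${SPARK_HOME}" >&2
echo "DEBUG interpreter.sh: launching via ${SPARK_SUBMIT}" >&2
```

Seeing these lines in the interpreter log confirms which Spark installation the interpreter process is using, which is what the debugging in this thread established.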
>>>>
>>>> On Mon, Oct 5, 2015 at 11:48 AM, Sourav Mazumder <[email protected]> wrote:
>>>>
>>>>> Yes, I have them set up appropriately.
>>>>>
>>>>> Where I am lost is that I can see the interpreter is running spark-submit,
>>>>> but at some point it switches to creating a Spark context.
>>>>>
>>>>> Maybe, as you rightly mentioned, it is not able to run the driver on the
>>>>> YARN cluster because of some permission issue. But what that issue or
>>>>> required configuration is, I'm not able to figure out.
>>>>>
>>>>> Regards,
>>>>> Sourav
>>>>>
>>>>> On Mon, Oct 5, 2015 at 11:38 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <[email protected]> wrote:
>>>>>
>>>>>> Do you have these settings configured in zeppelin-env.sh?
>>>>>>
>>>>>> export JAVA_HOME=/usr/src/jdk1.7.0_79/
>>>>>>
>>>>>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>>>>>>
>>>>>> Most likely you do, since you are able to run with yarn-client.
>>>>>>
>>>>>> It looks like the issue is not being able to run the driver program on
>>>>>> the cluster.
>>>>>>
>>>>>> On Mon, Oct 5, 2015 at 11:13 AM, Sourav Mazumder <[email protected]> wrote:
>>>>>>
>>>>>>> Yes, Spark is installed on the machine where Zeppelin is running.
>>>>>>>
>>>>>>> The location of spark.yarn.jar is very similar to what you have. I'm
>>>>>>> using IOP as the distribution, and that directory naming convention is
>>>>>>> specific to IOP, which is different from HDP.
>>>>>>>
>>>>>>> And yes, the setup works perfectly fine when I use master yarn-client
>>>>>>> with the same settings for SPARK_HOME, HADOOP_CONF_DIR and HADOOP_CLIENT.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Sourav
>>>>>>>
>>>>>>> On Mon, Oct 5, 2015 at 10:25 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <[email protected]> wrote:
>>>>>>>
>>>>>>>> Is Spark installed on your Zeppelin machine?
>>>>>>>>
>>>>>>>> I would try these:
>>>>>>>>
>>>>>>>> master yarn-client
>>>>>>>> spark.home === SPARK INSTALLATION HOME directory on your Zeppelin
>>>>>>>> server.
>>>>>>>>
>>>>>>>> Looking at spark.yarn.jar, I see Spark is installed at
>>>>>>>> /usr/iop/current/spark-thriftserver/. But why is it thriftserver
>>>>>>>> (I do not know what that is)?
>>>>>>>>
>>>>>>>> I have Spark installed (unzipped) on the Zeppelin machine at
>>>>>>>> /usr/hdp/2.3.1.0-2574/spark/spark/ (it can be any location) and have
>>>>>>>> spark.yarn.jar set to
>>>>>>>> /usr/hdp/2.3.1.0-2574/spark/spark/lib/spark-assembly-1.4.1-hadoop2.6.0.jar.
>>>>>>>>
>>>>>>>> On Mon, Oct 5, 2015 at 10:20 AM, Sourav Mazumder <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Deepu,
>>>>>>>>>
>>>>>>>>> Here you go.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Sourav
>>>>>>>>>
>>>>>>>>> *Properties*
>>>>>>>>>
>>>>>>>>> name                           value
>>>>>>>>> args
>>>>>>>>> master                         yarn-cluster
>>>>>>>>> spark.app.name                 Zeppelin
>>>>>>>>> spark.cores.max
>>>>>>>>> spark.executor.memory          512m
>>>>>>>>> spark.home
>>>>>>>>> spark.yarn.jar                 /usr/iop/current/spark-thriftserver/lib/spark-assembly.jar
>>>>>>>>> zeppelin.dep.localrepo         local-repo
>>>>>>>>> zeppelin.pyspark.python        python
>>>>>>>>> zeppelin.spark.concurrentSQL   false
>>>>>>>>> zeppelin.spark.maxResult       1000
>>>>>>>>> zeppelin.spark.useHiveContext  true
>>>>>>>>>
>>>>>>>>> On Mon, Oct 5, 2015 at 10:05 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Can you share a screenshot of your Spark interpreter settings from
>>>>>>>>>> the Zeppelin web interface?
>>>>>>>>>>
>>>>>>>>>> I have the exact same deployment structure and it runs fine with
>>>>>>>>>> the right set of configurations.
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 5, 2015 at 7:56 AM, Sourav Mazumder <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Moon,
>>>>>>>>>>>
>>>>>>>>>>> I'm using 0.6.0-SNAPSHOT, which I built from the latest GitHub source.
>>>>>>>>>>>
>>>>>>>>>>> I tried setting SPARK_HOME in zeppelin-env.sh. By putting in some
>>>>>>>>>>> debug statements, I could also see that control goes to the
>>>>>>>>>>> appropriate if-else block in interpreter.sh.
>>>>>>>>>>>
>>>>>>>>>>> But I get the same error, as follows:
>>>>>>>>>>>
>>>>>>>>>>> org.apache.spark.SparkException: Detected yarn-cluster mode, but isn't
>>>>>>>>>>> running on a cluster. Deployment to YARN is not supported directly by
>>>>>>>>>>> SparkContext. Please use spark-submit.
>>>>>>>>>>> at org.apache.spark.SparkContext.<init>(SparkContext.scala:378)
>>>>>>>>>>> at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:339)
>>>>>>>>>>> at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:149)
>>>>>>>>>>> at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:465)
>>>>>>>>>>> at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
>>>>>>>>>>> at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
>>>>>>>>>>> at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
>>>>>>>>>>> at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:276)
>>>>>>>>>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>>>>>>>>>>> at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
>>>>>>>>>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>>>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>>>>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>>>>>>>>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>>>>>>
>>>>>>>>>>> Let me know if you need any other details to figure out what is
>>>>>>>>>>> going on.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Sourav
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Sep 30, 2015 at 1:53 AM, moon soo Lee <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Which version of Zeppelin are you using?
>>>>>>>>>>>>
>>>>>>>>>>>> The master branch uses the spark-submit command when SPARK_HOME is
>>>>>>>>>>>> defined in conf/zeppelin-env.sh.
>>>>>>>>>>>>
>>>>>>>>>>>> If you're not on the master branch, I recommend trying it with
>>>>>>>>>>>> SPARK_HOME defined.
>>>>>>>>>>>>
>>>>>>>>>>>> Hope this helps,
>>>>>>>>>>>> moon
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Sep 23, 2015 at 10:21 PM Sourav Mazumder <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> When I try to run the Spark interpreter in yarn-cluster mode from
>>>>>>>>>>>>> a remote machine, I always get an error saying to use spark-submit
>>>>>>>>>>>>> rather than the Spark context.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My Zeppelin process runs on a separate machine, remote to the
>>>>>>>>>>>>> YARN cluster.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any idea why this error occurs?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Sourav
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Deepak
>>>>>>>>
>>>>>>>> --
>>>>>>>> Deepak
>>>>>>
>>>>>> --
>>>>>> Deepak
>>>>
>>>> --
>>>> Deepak
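For anyone landing on this thread with the same error: the working setup described above boils down to a conf/zeppelin-env.sh along the following lines. All paths are examples taken from the thread and will differ per distribution. Note that at the time of this discussion (Zeppelin 0.6.0-SNAPSHOT), the Spark interpreter constructs the SparkContext in its own process, so yarn-cluster mode fails with the exception above even when a standalone spark-submit in yarn-cluster mode succeeds; yarn-client is the mode reported to work:

```shell
# conf/zeppelin-env.sh -- example values only; adjust for your distribution.
export JAVA_HOME=/usr/src/jdk1.7.0_79                    # JDK used by Zeppelin
export SPARK_HOME=/usr/iop/current/spark-thriftserver    # Spark install on the Zeppelin host
export HADOOP_CONF_DIR=/etc/hadoop/conf                  # configs pointing at the remote cluster

# Sanity check outside Zeppelin, as suggested in the thread -- run with
# --master yarn-client, the mode the interpreter supports here:
# "$SPARK_HOME"/bin/spark-submit \
#   --class org.apache.spark.examples.SparkPi --master yarn-client \
#   --num-executors 1 --driver-memory 512m --executor-memory 512m \
#   --executor-cores 1 "$SPARK_HOME"/lib/spark-examples.jar 10
```

With SPARK_HOME exported this way, interpreter.sh launches the interpreter through that installation's bin/spark-submit instead of constructing a SparkContext from Zeppelin's own classpath.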
