Hi Ethan,

These behaviors are not expected. You may be hitting this issue, which is
fixed in 0.8.2:
https://jira.apache.org/jira/browse/ZEPPELIN-3986
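
Until you upgrade, a possible workaround (just a sketch, not verified on your
build; the jar path and package coordinates below are placeholders) is to set
the jars directly as properties of the spark interpreter instead of relying on
SPARK_SUBMIT_OPTIONS, since properties starting with spark. are passed through
to the Spark configuration:

  spark.jars            /path/to/lib-a.jar,/path/to/lib-b.jar
  spark.jars.packages   com.example:example-lib:1.0.0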


Y. Ethan Guo <guoyi...@uber.com> wrote on Monday, April 8, 2019 at 4:26 PM:

> Hi Jeff, Dave,
>
> Thanks for the suggestion.  I was able to successfully run the Spark
> interpreter in yarn cluster mode on another machine running Zeppelin.  The
> previous problem was probably due to network issues.
>
> I have two observations:
> (1) I'm able to use the "--jars" option in SPARK_SUBMIT_OPTIONS in the
> "spark" interpreter with yarn cluster mode configured.  I verified that the
> jars are pushed to the driver and executors by successfully running a job
> that uses some classes from the jars (a YARN-side check is also sketched
> below).  However, if I create a new "spark_abc" interpreter under the spark
> interpreter group, this new interpreter doesn't seem to pick up
> SPARK_SUBMIT_OPTIONS and the jars option, leading to errors about not being
> able to access the packages/classes in the jars.
>
> (2) Once I restart the Spark interpreters from the interpreter settings,
> the corresponding Spark applications on the YARN cluster first transition
> from the "RUNNING" state to the "ACCEPTED" state, and then end up in the
> "FAILED" state.
>
> I'm wondering whether the above behaviors are expected and whether they are
> known limitations of the current 0.9.0-SNAPSHOT version.
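>
> For reference, a YARN-side way to double-check that the jars reached the
> application containers, and to see why the restarted applications ended up
> in the "FAILED" state (the application id and jar name here are just
> placeholders):
>
>   # confirm the jars were shipped to / added on the application containers
>   yarn logs -applicationId application_1554000000_0001 | grep -i "my-lib.jar"
>   # check the final status and diagnostics of a failed application
>   yarn application -status application_1554000000_0001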
>
> Thanks,
> - Ethan
>
> On Sun, Apr 7, 2019 at 9:59 AM Dave Boyd <db...@incadencecorp.com> wrote:
>
>> From the connection refused message, I wonder if it is an SSL error.  I
>> note that none of the SSL information (truststore, keystore, etc.) is set.
>> I would think the YARN cluster requires some form of authentication.
>> On 4/7/19 9:27 AM, Jeff Zhang wrote:
>>
>> It looks like the interpreter process cannot connect to the Zeppelin server
>> process. I guess it is due to some network issue. Can you check whether the
>> nodes in the YARN cluster can connect to the Zeppelin server host?
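>>
>> For example, from one of the YARN node managers you could try something
>> like this (the hostname is a placeholder; the port is the
>> intpEventServerAddress port shown in the interpreter log):
>>
>>   nc -zv <zeppelin-server-host> 45128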
>>
>> Y. Ethan Guo <guoyi...@uber.com> wrote on Sunday, April 7, 2019 at 3:31 PM:
>>
>>> Hi Jeff,
>>>
>>> Given that this PR is merged, I'm trying to see if I can run yarn cluster
>>> mode from a master build.  I built Zeppelin master from this commit:
>>>
>>> commit 3655c12b875884410224eca5d6155287d51916ac
>>> Author: Jongyoul Lee <jongy...@gmail.com>
>>> Date:   Mon Apr 1 15:37:57 2019 +0900
>>>     [MINOR] Refactor CronJob class (#3335)
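>>>
>>> For reference, this was a plain Maven build from source, roughly along the
>>> lines of the command below (the profile names are assumptions and may
>>> differ per branch and environment):
>>>
>>>   mvn clean package -DskipTests -Pspark-2.4 -Pscala-2.11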
>>>
>>> While I can successfully run the Spark interpreter in yarn client mode,
>>> I'm having trouble getting yarn cluster mode to work.  Specifically, while
>>> the interpreter job was accepted in YARN, it failed after 1-2 minutes
>>> because of the exception below.  Do you have any idea why this is
>>> happening?
>>>
>>> DEBUG [2019-04-07 06:57:00,314] ({main} Logging.scala[logDebug]:58) -
>>> Created SSL options for fs: SSLOptions{enabled=false, keyStore=None,
>>> keyStorePassword=None, trustStore=None, trustStorePassword=None,
>>> protocol=None, enabledAlgorithms=Set()}
>>>  INFO [2019-04-07 06:57:00,323] ({main} Logging.scala[logInfo]:54) -
>>> Starting the user application in a separate Thread
>>>  INFO [2019-04-07 06:57:00,350] ({main} Logging.scala[logInfo]:54) -
>>> Waiting for spark context initialization...
>>>  INFO [2019-04-07 06:57:00,403] ({Driver}
>>> RemoteInterpreterServer.java[<init>]:148) - Starting remote interpreter
>>> server on port 0, intpEventServerAddress: 172.17.0.1:45128
>>> ERROR [2019-04-07 06:57:00,408] ({Driver} Logging.scala[logError]:91) -
>>> User class threw exception:
>>> org.apache.thrift.transport.TTransportException: java.net.ConnectException:
>>> Connection refused (Connection refused)
>>> org.apache.thrift.transport.TTransportException:
>>> java.net.ConnectException: Connection refused (Connection refused)
>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
>>> at
>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.<init>(RemoteInterpreterServer.java:154)
>>> at
>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.<init>(RemoteInterpreterServer.java:139)
>>> at
>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.main(RemoteInterpreterServer.java:285)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
>>> Caused by: java.net.ConnectException: Connection refused (Connection
>>> refused)
>>> at java.net.PlainSocketImpl.socketConnect(Native Method)
>>> at
>>> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>>> at
>>> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>>> at
>>> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>> at java.net.Socket.connect(Socket.java:589)
>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
>>> ... 8 more
>>>
>>> Thanks,
>>> - Ethan
>>>
>>> On Wed, Feb 27, 2019 at 4:24 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>
>>>> Here's the PR
>>>> https://github.com/apache/zeppelin/pull/3308
>>>>
>>>> Y. Ethan Guo <guoyi...@uber.com> wrote on Thursday, February 28, 2019 at 2:50 AM:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I'm trying to use the new yarn cluster mode feature to run Spark 2.4.0
>>>>> jobs on Zeppelin 0.8.1. I've set the SPARK_HOME, SPARK_SUBMIT_OPTIONS,
>>>>> and HADOOP_CONF_DIR env variables in zeppelin-env.sh so that the Spark
>>>>> interpreter can be started on the cluster. I used `--jars` in
>>>>> SPARK_SUBMIT_OPTIONS to add local jars. However, when I tried to import a
>>>>> class from the jars in a Spark paragraph, the interpreter complained that
>>>>> it could not find the package and class ("<console>:23: error: object ...
>>>>> is not a member of package ..."). It looks like the jars are not properly
>>>>> added.
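>>>>>
>>>>> For reference, the relevant variables in zeppelin-env.sh look roughly
>>>>> like this (the paths are placeholders; the actual SPARK_SUBMIT_OPTIONS
>>>>> value is quoted at the end of this message):
>>>>>
>>>>>   export SPARK_HOME=/opt/spark-2.4.0
>>>>>   export HADOOP_CONF_DIR=/etc/hadoop/conf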
>>>>>
>>>>> I followed the instructions here
>>>>> <https://zeppelin.apache.org/docs/0.8.1/interpreter/spark.html#2-loading-spark-properties>
>>>>> to add the jars, but they don't seem to work in cluster mode. This issue
>>>>> seems to be related to this bug:
>>>>> https://jira.apache.org/jira/browse/ZEPPELIN-3986.  Is there any update
>>>>> on fixing it? What is the right way to add local jars in yarn cluster
>>>>> mode? Any help is much appreciated.
>>>>>
>>>>>
>>>>> Here's the SPARK_SUBMIT_OPTIONS I used (packages and jars paths
>>>>> omitted):
>>>>>
>>>>> export SPARK_SUBMIT_OPTIONS="--driver-memory 12G --packages ... --jars
>>>>> ... --repositories
>>>>> https://repository.cloudera.com/artifactory/public/,https://repository.cloudera.com/content/repositories/releases/,http://repo.spring.io/plugins-release/
>>>>> "
>>>>>
>>>>> Thanks,
>>>>> - Ethan
>>>>> --
>>>>> Best,
>>>>> - Ethan
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>>>>
>>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>> --
>> ========= mailto:db...@incadencecorp.com =========
>> David W. Boyd
>> VP,  Data Solutions
>> 10432 Balls Ford, Suite 240
>> Manassas, VA 20109
>> office:   +1-703-552-2862
>> cell:     +1-703-402-7908
>> ============== http://www.incadencecorp.com/ ============
>> ISO/IEC JTC1 WG9, editor ISO/IEC 20547 Big Data Reference Architecture
>> Chair ANSI/INCITS TC Big Data
>> Co-chair NIST Big Data Public Working Group Reference Architecture
>> First Robotic Mentor - FRC, FTC - www.iliterobotics.org
>> Board Member- USSTEM Foundation - www.usstem.org
>>

-- 
Best Regards

Jeff Zhang
