Hi Ethan,

These behaviors are not expected. You may be hitting this issue, which is fixed in 0.8.2: https://jira.apache.org/jira/browse/ZEPPELIN-3986
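In case it helps, here is a minimal zeppelin-env.sh sketch of the jar-loading setup under discussion; the jar paths and package coordinates below are placeholders, not values from this thread. As an alternative to SPARK_SUBMIT_OPTIONS, the standard Spark properties spark.jars and spark.jars.packages can usually be set directly on the spark interpreter in the interpreter settings.

# zeppelin-env.sh -- sketch only; paths and coordinates are placeholders
export SPARK_HOME=/opt/spark                 # local Spark installation used by the interpreter
export HADOOP_CONF_DIR=/etc/hadoop/conf      # YARN/HDFS client configuration
# Extra jars/packages handed to spark-submit when the interpreter launches:
export SPARK_SUBMIT_OPTIONS="--jars /path/to/libA.jar,/path/to/libB.jar --packages org.example:example-lib:1.0.0"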
Y. Ethan Guo <guoyi...@uber.com> wrote on Mon, Apr 8, 2019 at 4:26 PM:

> Hi Jeff, Dave,
>
> Thanks for the suggestion. I was able to successfully run the Spark
> interpreter in yarn cluster mode on another machine running Zeppelin. The
> previous problem was probably due to network issues.
>
> I have two observations:
>
> (1) I'm able to use the "--jars" option in SPARK_SUBMIT_OPTIONS with the
> "spark" interpreter configured for yarn cluster mode. I verified that the
> jars are pushed to the driver and executors by successfully running a job
> that uses classes from the jars. However, if I create a new "spark_abc"
> interpreter under the spark interpreter group, this new interpreter doesn't
> seem to pick up SPARK_SUBMIT_OPTIONS and the jars option, leading to errors
> about not being able to access packages/classes in the jars.
>
> (2) Once I restart the spark interpreters in the interpreter settings, the
> corresponding Spark jobs in the yarn cluster first transition from the
> "RUNNING" state to the "ACCEPTED" state, and then end up in the "FAILED"
> state.
>
> I'm wondering whether the above behaviors are expected and whether they
> are known limitations of the current 0.9.0-SNAPSHOT version.
>
> Thanks,
> - Ethan
>
> On Sun, Apr 7, 2019 at 9:59 AM Dave Boyd <db...@incadencecorp.com> wrote:
>
>> From the connection refused message, I wonder if it is an SSL error. I
>> note that none of the SSL information (truststore, keystore, etc.) is set.
>> I would think the YARN cluster requires some form of authentication.
>>
>> On 4/7/19 9:27 AM, Jeff Zhang wrote:
>>
>> It looks like the interpreter process cannot connect to the Zeppelin
>> server process. I guess it is due to some network issue. Can you check
>> whether the node in the yarn cluster can connect to the Zeppelin server
>> host?
>>
>> Y. Ethan Guo <guoyi...@uber.com> wrote on Sun, Apr 7, 2019 at 3:31 PM:
>>
>>> Hi Jeff,
>>>
>>> Given that this PR is merged, I'm trying to see if I can run yarn
>>> cluster mode from a master build. I built Zeppelin master from this
>>> commit:
>>>
>>> commit 3655c12b875884410224eca5d6155287d51916ac
>>> Author: Jongyoul Lee <jongy...@gmail.com>
>>> Date: Mon Apr 1 15:37:57 2019 +0900
>>>     [MINOR] Refactor CronJob class (#3335)
>>>
>>> While I can successfully run the Spark interpreter in yarn client mode,
>>> I'm having trouble making yarn cluster mode work. Specifically, while
>>> the interpreter job was accepted in yarn, it failed after 1-2 minutes
>>> because of the exception below. Do you have any idea why this is
>>> happening?
>>>
>>> DEBUG [2019-04-07 06:57:00,314] ({main} Logging.scala[logDebug]:58) -
>>> Created SSL options for fs: SSLOptions{enabled=false, keyStore=None,
>>> keyStorePassword=None, trustStore=None, trustStorePassword=None,
>>> protocol=None, enabledAlgorithms=Set()}
>>> INFO [2019-04-07 06:57:00,323] ({main} Logging.scala[logInfo]:54) -
>>> Starting the user application in a separate Thread
>>> INFO [2019-04-07 06:57:00,350] ({main} Logging.scala[logInfo]:54) -
>>> Waiting for spark context initialization...
>>> INFO [2019-04-07 06:57:00,403] ({Driver}
>>> RemoteInterpreterServer.java[<init>]:148) - Starting remote interpreter
>>> server on port 0, intpEventServerAddress: 172.17.0.1:45128
>>> ERROR [2019-04-07 06:57:00,408] ({Driver} Logging.scala[logError]:91) -
>>> User class threw exception:
>>> org.apache.thrift.transport.TTransportException:
>>> java.net.ConnectException: Connection refused (Connection refused)
>>> org.apache.thrift.transport.TTransportException:
>>> java.net.ConnectException: Connection refused (Connection refused)
>>>     at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
>>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.<init>(RemoteInterpreterServer.java:154)
>>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.<init>(RemoteInterpreterServer.java:139)
>>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.main(RemoteInterpreterServer.java:285)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
>>> Caused by: java.net.ConnectException: Connection refused (Connection refused)
>>>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>>>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>>>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>>>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>     at java.net.Socket.connect(Socket.java:589)
>>>     at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
>>>     ... 8 more
>>>
>>> Thanks,
>>> - Ethan
>>>
>>> On Wed, Feb 27, 2019 at 4:24 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>
>>>> Here's the PR:
>>>> https://github.com/apache/zeppelin/pull/3308
>>>>
>>>> Y. Ethan Guo <guoyi...@uber.com> wrote on Thu, Feb 28, 2019 at 2:50 AM:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I'm trying to use the new yarn cluster mode feature to run Spark 2.4.0
>>>>> jobs on Zeppelin 0.8.1. I've set the SPARK_HOME, SPARK_SUBMIT_OPTIONS,
>>>>> and HADOOP_CONF_DIR env variables in zeppelin-env.sh so that the Spark
>>>>> interpreter can be started in the cluster. I used `--jars` in
>>>>> SPARK_SUBMIT_OPTIONS to add local jars. However, when I tried to import
>>>>> a class from the jars in a Spark paragraph, the interpreter complained
>>>>> that it cannot find the package and class ("<console>:23: error:
>>>>> object ... is not a member of package ..."). It looks like the jars are
>>>>> not properly imported.
>>>>>
>>>>> I followed the instructions here
>>>>> <https://zeppelin.apache.org/docs/0.8.1/interpreter/spark.html#2-loading-spark-properties>
>>>>> to add the jars, but it seems that this doesn't work in cluster mode.
>>>>> This issue also seems to be related to this bug:
>>>>> https://jira.apache.org/jira/browse/ZEPPELIN-3986. Is there any update
>>>>> on fixing it? What is the right way to add local jars in yarn cluster
>>>>> mode? Any help or update is much appreciated.
>>>>>
>>>>> Here's the SPARK_SUBMIT_OPTIONS I used (packages and jars paths
>>>>> omitted):
>>>>>
>>>>> export SPARK_SUBMIT_OPTIONS="--driver-memory 12G --packages ... --jars ...
>>>>> --repositories
>>>>> https://repository.cloudera.com/artifactory/public/,https://repository.cloudera.com/content/repositories/releases/,http://repo.spring.io/plugins-release/
>>>>> "
>>>>>
>>>>> Thanks,
>>>>> - Ethan
>>>>>
>>>>> --
>>>>> Best,
>>>>> - Ethan
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>> --
>> ========= mailto:db...@incadencecorp.com ============
>> David W. Boyd
>> VP, Data Solutions
>> 10432 Balls Ford, Suite 240
>> Manassas, VA 20109
>> office: +1-703-552-2862
>> cell: +1-703-402-7908
>> ============== http://www.incadencecorp.com/ ============
>> ISO/IEC JTC1 WG9, editor ISO/IEC 20547 Big Data Reference Architecture
>> Chair ANSI/INCITS TC Big Data
>> Co-chair NIST Big Data Public Working Group Reference Architecture
>> First Robotic Mentor - FRC, FTC - www.iliterobotics.org
>> Board Member - USSTEM Foundation - www.usstem.org
>>
>> The information contained in this message may be privileged and/or
>> confidential and protected from disclosure. If the reader of this message
>> is not the intended recipient or an employee or agent responsible for
>> delivering this message to the intended recipient, you are hereby notified
>> that any dissemination, distribution or copying of this communication is
>> strictly prohibited. If you have received this communication in error,
>> please notify the sender immediately by replying to this message and
>> deleting the material from any computer.

--
Best Regards

Jeff Zhang
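On the connection-refused error earlier in the thread, here is a rough sketch of the connectivity check suggested above, to be run from the yarn node that launches the interpreter. It assumes netcat is available on that node; the address and port are taken from the intpEventServerAddress in the log.

# Sketch only: verify the yarn node can reach the Zeppelin server's
# interpreter event port (address/port from the log's intpEventServerAddress).
nc -vz 172.17.0.1 45128
# Note: 172.17.0.1 is commonly a docker bridge address and may not be reachable
# from other hosts; if the check fails, verify which address the Zeppelin
# server advertises to interpreters.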