Thank you for verifying. Glad to help :)

On Tue, Feb 23, 2016 at 3:51 AM Ian Maloney <rachmaninovquar...@gmail.com> wrote:

> Hi Mina,
>
> I added your changes and they got the pyspark interpreter working! Thanks
> so much for your help!
>
> Ian
>
> On Sunday, February 21, 2016, mina lee <mina...@apache.org> wrote:
>
>> Hi Ian, sorry for the late reply.
>> I was able to reproduce the same error with Spark 1.4.1 & Hadoop 2.6.0.
>> It turned out to be a bug in Zeppelin.
>> After some searching, I realized that the `spark.yarn.isPython` property
>> was only introduced in 1.5.0. I just made a PR
>> (https://github.com/apache/incubator-zeppelin/pull/736) to fix it. It
>> would be really appreciated if you could try it and see if it works.
>> Thank you for reporting the bug!
>>
>> Regards,
>> Mina
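A quick way to confirm which properties a given Zeppelin build actually
submits is to read them back from the driver's SparkConf in a pyspark
paragraph, rather than clicking through the YARN UI. A minimal sketch,
assuming `sc` is the SparkContext Zeppelin injects (`_conf` is pyspark's
handle on the underlying SparkConf):

%pyspark
# Prints "true" when Zeppelin has set the flag, the fallback otherwise.
print(sc._conf.get("spark.yarn.isPython", "<not set>"))

Since the property only exists from Spark 1.5.0 on, a 1.4.x build ignores
it even when set, which would explain seeing "true" in the UI while the
error persists.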
>> On Thu, Feb 18, 2016 at 2:39 AM, Ian Maloney <rachmaninovquar...@gmail.com> wrote:
>>
>>> Hi Mina,
>>>
>>> Thanks for the response. I re-cloned master from GitHub and built using:
>>>
>>> mvn clean package -DskipTests -Pspark-1.4 -Phadoop-2.6 -Pyarn -Ppyspark
>>>
>>> I did that locally, then scp'd it to a node in a cluster running HDP 2.3
>>> (Spark 1.4.1 & Hadoop 2.7.1).
>>>
>>> I added the two config files from below and started the Zeppelin daemon.
>>> Inspecting the spark.yarn.isPython property in the Spark UI showed it to
>>> be "true".
>>>
>>> The pyspark interpreter gives the same error as before. Are there any
>>> other configs I should check? I'm beginning to wonder if it's related to
>>> something in Hortonworks' distribution of Spark or YARN.
>>>
>>> On Tuesday, February 16, 2016, mina lee <mina...@apache.org> wrote:
>>>
>>>> Hi Ian,
>>>>
>>>> The log stack looks quite similar to
>>>> https://issues.apache.org/jira/browse/ZEPPELIN-572, which has been
>>>> fixed since v0.5.6.
>>>> This happens when pyspark.zip and py4j-*.zip are not distributed to
>>>> the YARN worker nodes.
>>>>
>>>> If you are building from source, can you please double-check that you
>>>> pulled the latest master?
>>>> Also, to be sure, can you confirm that you can see spark.yarn.isPython
>>>> set to true in the Spark UI (YARN's ApplicationMaster UI) > Environment
>>>> > Spark Properties?
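For builds that predate the fix, the usual manual workaround is to ship the
two archives to the workers yourself. A sketch for spark-defaults.conf,
assuming an HDP-style layout under SPARK_HOME; the py4j version in the
filename varies by Spark release (0.8.2.1 ships with Spark 1.4):

# Distribute the pyspark sources to the YARN executors and put them on
# the executors' PYTHONPATH (paths are examples, adjust to your install).
spark.yarn.dist.files        /usr/hdp/current/spark-client/python/lib/pyspark.zip,/usr/hdp/current/spark-client/python/lib/py4j-0.8.2.1-src.zip
spark.executorEnv.PYTHONPATH pyspark.zip:py4j-0.8.2.1-src.zip

With those set, the executors find the pyspark module in their container's
working directory instead of relying on spark.yarn.isPython.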
>>>> On Sat, Feb 13, 2016 at 1:04 AM, Ian Maloney <rachmaninovquar...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've been trying unsuccessfully to configure the pyspark interpreter
>>>>> on Zeppelin. I can use pyspark from the CLI and can use the Spark
>>>>> interpreter from Zeppelin without issue. Here are the lines which
>>>>> aren't commented out in my zeppelin-env.sh file:
>>>>>
>>>>> export MASTER=yarn-client
>>>>> export ZEPPELIN_PORT=8090
>>>>> export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.2.0-2950 -Dspark.yarn.queue=default"
>>>>> export SPARK_HOME=/usr/hdp/current/spark-client/
>>>>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>>>>> export PYSPARK_PYTHON=/usr/bin/python
>>>>> export PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/build:$PYTHONPATH
>>>>>
>>>>> Running a simple pyspark script in the interpreter gives this error:
>>>>>
>>>>> Py4JJavaError: An error occurred while calling
>>>>> z:org.apache.spark.api.python.PythonRDD.runJob.
>>>>> : org.apache.spark.SparkException: Job aborted due to stage failure:
>>>>> Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task
>>>>> 0.3 in stage 1.0 (TID 5, some_yarn_node.networkname):
>>>>> org.apache.spark.SparkException:
>>>>> Error from python worker:
>>>>>   /usr/bin/python: No module named pyspark
>>>>> PYTHONPATH was:
>>>>>   /app/hadoop/yarn/local/usercache/my_username/filecache/4121/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar
>>>>>
>>>>> More details can be found here:
>>>>> https://community.hortonworks.com/questions/16436/cants-get-pyspark-interpreter-to-work-on-zeppelin.html
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Ian
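The error above already names the root cause: the worker's PYTHONPATH
contains only the Spark assembly jar, with no pyspark.zip or py4j entry,
which is exactly the "archives not distributed" symptom described earlier
in the thread. Any pyspark paragraph that runs an action will trigger it,
since actions are what start Python workers on the executors. A minimal
sketch of such a paragraph (`sc` is the SparkContext Zeppelin provides):

%pyspark
# Forces the executors to start Python workers; fails with "No module
# named pyspark" when the archives were never shipped to the YARN nodes.
rdd = sc.parallelize(range(100))
print(rdd.map(lambda x: x * x).sum())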