Sami Jaktholm created ZEPPELIN-1265:
---------------------------------------
Summary: Value of zeppelin.pyspark.python not reflected in the python version Spark executors use on YARN
Key: ZEPPELIN-1265
URL: https://issues.apache.org/jira/browse/ZEPPELIN-1265
Project: Zeppelin
Issue Type: Bug
Affects Versions: 0.5.6
Reporter: Sami Jaktholm

STR:
0. Have both python2 and python3 installed
1. Set {{zeppelin.pyspark.python}} to {{python3}} (or where your Python 3 is installed)
2. Run some pyspark code that involves executing tasks on executor nodes

What happens:
bq. Exception: Python in worker has different version 3.4 than that in driver 2.7, PySpark cannot run with different minor versions

What should happen:
The code runs without exceptions because the value of {{zeppelin.pyspark.python}} is correctly propagated to the Spark executors.

IMO the correct behavior is to use the value of {{zeppelin.pyspark.python}} as the Python interpreter with Spark and to set {{PYSPARK_PYTHON}} to that same value so that Spark can pick it up and ship it to executors.

The problem here is that when {{zeppelin.pyspark.python}} is set to use Python 3, Zeppelin starts the Spark master process with python3. However, this configuration is not reflected on the executors, which use whatever they find in the {{PYSPARK_PYTHON}} envvar, which defaults to Python 2. So changing the Python version also requires setting {{PYSPARK_PYTHON}} to the correct value, which is not an easy thing to do on the fly (I guess you need to change a config file somewhere and restart Zeppelin to achieve that).

This might only be an issue when running Spark on a YARN cluster with multiple machines (that is, the selected Python version is not propagated to the executor machines); I haven't tested this in a single-machine scenario.

Also, I haven't been able to test this in Zeppelin 0.6.0 yet, so this could already be fixed, but I didn't find any similar tickets that were resolved after Zeppelin 0.5.6.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
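A minimal sketch of the behavior proposed in this report: the interpreter configured in {{zeppelin.pyspark.python}} is also exported as {{PYSPARK_PYTHON}} before the PySpark process is launched. This is not Zeppelin's actual code. The helper names `python_minor_version` and `env_for_pyspark` are hypothetical, and `sys.executable` only stands in for the configured value. Whether Spark then ships the setting to YARN executors is the report's expectation and is not verified here.

```python
import os
import subprocess
import sys


def python_minor_version(executable):
    """Return the 'major.minor' version string of the given interpreter."""
    # Hypothetical helper: asks the interpreter itself for its version.
    out = subprocess.check_output(
        [executable, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"]
    )
    return out.decode().strip()


def env_for_pyspark(configured_python, base_env=None):
    """Copy the environment and point PYSPARK_PYTHON at the configured interpreter."""
    # Hypothetical helper: base_env defaults to the current process environment.
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_PYTHON"] = configured_python
    return env


if __name__ == "__main__":
    # Assumption: sys.executable stands in for the value of zeppelin.pyspark.python.
    configured = sys.executable
    env = env_for_pyspark(configured)
    # Print the interpreter and its minor version for a manual sanity check.
    print(env["PYSPARK_PYTHON"], python_minor_version(env["PYSPARK_PYTHON"]))
```

The returned dictionary could be passed as the `env` argument when spawning the driver-side Python process, so that the interpreter and {{PYSPARK_PYTHON}} never disagree.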