Hi,

I am running Spark 1.1.0 on YARN. I have a custom set of modules installed under the same location on each executor node, and I am wondering how I can pass a PYTHONPATH to the executors so that they can use those modules. I've tried this:

spark-env.sh:

    export PYTHONPATH=/tmp/test/

spark-defaults.conf:

    spark.executorEnv.PYTHONPATH=/tmp/test/
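I assume this is equivalent to setting the same property programmatically on the SparkConf when the context is created; a minimal sketch of what I mean (the app name is just a placeholder):

    # same executor setting expressed via SparkConf instead of spark-defaults.conf
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("pythonpath-test")
            .set("spark.executorEnv.PYTHONPATH", "/tmp/test/"))
    sc = SparkContext(conf=conf)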
The test package looks like this:

    /tmp/test/pkg/
        __init__.py
        mod.py

mod.py:

    def test(x):
        return x

From the pyspark shell I can import the module pkg.mod without any issues:

    >>> import pkg.mod
    >>> print pkg.mod.test(1)
    1

The path is also set correctly:

    >>> print os.environ['PYTHONPATH']
    /usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip:/usr/lib/spark/python/:/tmp/test/
    >>> print sys.path
    ['', '/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip', '/usr/lib/spark/python', '/tmp/test/', ... ]

It is even visible on the executors:

    >>> sc.parallelize(range(4)).map(lambda x: os.environ['PYTHONPATH']).collect()
    ['/u01/yarn/local/usercache/user/filecache/24/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar:/tmp/test/:/tmp/test/',
     '/u01/yarn/local/usercache/user/filecache/24/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar:/tmp/test/:/tmp/test/',
     '/u01/yarn/local/usercache/user/filecache/24/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar:/tmp/test/:/tmp/test/',
     '/u02/yarn/local/usercache/user/filecache/24/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar:/tmp/test/:/tmp/test/']

Yet it fails when the module is actually used on an executor:

    >>> sc.parallelize(range(4)).map(pkg.mod.test).collect()
    ...
    ImportError: No module named mod
    ...

Any idea how to achieve this? I don't want to use sc.addPyFile, since these are big packages and they are already installed everywhere anyway.

Thank you,
Antony
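P.S. In case a standalone script is easier to work from than the shell transcript above, here is a minimal sketch of the failing step (the file name and app name are placeholders; it relies on the spark-env.sh / spark-defaults.conf settings shown above and on /tmp/test/ being present on every node):

    # repro.py -- minimal standalone version of the failing step
    import os
    from pyspark import SparkContext

    sc = SparkContext(appName="pythonpath-repro")

    import pkg.mod  # fine on the driver

    # the env var is visible on every executor...
    print sc.parallelize(range(4)).map(lambda x: os.environ['PYTHONPATH']).collect()

    # ...yet shipping the function itself dies with: ImportError: No module named mod
    print sc.parallelize(range(4)).map(pkg.mod.test).collect()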