Hi,
I am running Spark 1.1.0 on YARN. I have a custom set of modules installed under 
the same location on each executor node and am wondering how I can pass the 
PYTHONPATH to the executors so that they can use those modules.
I've tried this:
spark-env.sh:
export PYTHONPATH=/tmp/test/

spark-defaults.conf:
spark.executorEnv.PYTHONPATH=/tmp/test/
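
(as a sanity check from the driver side I can also read the setting back; this pokes at the internal sc._conf attribute, so take it as a quick check rather than anything official:)

>>> print sc._conf.get('spark.executorEnv.PYTHONPATH')   # expecting /tmp/test/ here if spark-defaults.conf was read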


/tmp/test/pkg:
__init__.py
mod.py:
  def test(x):
      return x
from the pyspark shell I can import the module pkg.mod without any issues:

>>> import pkg.mod
>>> print pkg.mod.test(1)
1
also the path is correctly set:
>>> print os.environ['PYTHONPATH']
/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip:/usr/lib/spark/python/:/tmp/test/
>>> print sys.path
['', '/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip', '/usr/lib/spark/python', '/tmp/test/', ... ]

it is even seen by the executors:

>>> sc.parallelize(range(4)).map(lambda x: os.environ['PYTHONPATH']).collect()
['/u01/yarn/local/usercache/user/filecache/24/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar:/tmp/test/:/tmp/test/',
 '/u01/yarn/local/usercache/user/filecache/24/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar:/tmp/test/:/tmp/test/',
 '/u01/yarn/local/usercache/user/filecache/24/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar:/tmp/test/:/tmp/test/',
 '/u02/yarn/local/usercache/user/filecache/24/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar:/tmp/test/:/tmp/test/']
yet it fails when actually using the module on the executor:

>>> sc.parallelize(range(4)).map(pkg.mod.test).collect()
...
ImportError: No module named mod
...
any idea how to achieve this? I don't want to use sc.addPyFile as these are big 
packages and they are installed everywhere anyway...
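
(for completeness, the only workaround I can come up with is to push the path and the import into the mapped function itself; this is just an untested sketch, with call_test being a throwaway wrapper name:)

>>> def call_test(x):
...     import sys
...     if '/tmp/test/' not in sys.path:
...         sys.path.insert(0, '/tmp/test/')   # force the package dir onto the worker's sys.path
...     import pkg.mod                         # import inside the task, on the executor
...     return pkg.mod.test(x)
...
>>> sc.parallelize(range(4)).map(call_test).collect()

but that feels like a hack when the modules are already installed on every node.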
thank you,
Antony.

