Hi Christian,

The PYSPARK_PYTHON environment variable specifies the Python executable that PySpark should use. You can point it at a virtualenv's Python executable and it will work fine. Keep in mind that the same virtualenv has to exist at the same path on each of your cluster nodes for PySpark to work. If you're creating the Spark context yourself in a Python application, you can set os.environ['PYSPARK_PYTHON'] = sys.executable before creating your Spark context.
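For example (a minimal sketch; it assumes you launch the script with the virtualenv's Python, so sys.executable already points inside the venv, and the app name is just illustrative):

    import os
    import sys

    # Must be set before the SparkContext is created; the worker nodes
    # need the same interpreter available at the same path.
    os.environ['PYSPARK_PYTHON'] = sys.executable

    from pyspark import SparkContext

    sc = SparkContext(appName='venv-example')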
Hope that helps,
Bryn

On Wed, Mar 5, 2014 at 4:54 AM, Christian <chri...@gmail.com> wrote:
> Hello,
>
> I usually create different Python virtual environments for different
> projects to avoid version conflicts and to skip the requirement to be
> root to install libs.
>
> How can I tell pyspark to activate a virtual environment before
> executing the tasks?
>
> Further info on virtual envs:
> http://virtualenv.readthedocs.org/en/latest/virtualenv.html
>
> Thanks in advance,
> Christian