Wheel is used for package management and for setting up your virtual environment; it is not used as a library package at run time. To run spark-submit in a virtual env, use the --py-files option instead. Usage:

--py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps.
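
For example, a hypothetical invocation (the file names are placeholders):

    spark-submit --py-files mylib.zip,helpers.py my_job.py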

In other words, you can't run spark-submit in a virtual environment the way you would a regular Python program, since it is NOT a regular Python script. But you can package your Python Spark project (including all dependency libs) as a zip or egg file and make it available to spark-submit. Please note that spark-submit plays the role of a driver: it is only responsible for submitting jobs to a Spark master. The master distributes the job content, including all dependency libs, to the individual worker nodes where the job is executed. Packaging in zip or egg format makes that distribution easier.
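
A minimal sketch of that packaging workflow, assuming your dependencies are listed in a requirements.txt and your entry point is main.py (all names and paths here are placeholders):

    # vendor the dependency libs into a local folder
    pip install -r requirements.txt -t deps/
    # bundle them into a single archive
    (cd deps && zip -r ../deps.zip .)
    # ship the archive with the job; Spark places it on the PYTHONPATH
    spark-submit --py-files deps.zip main.py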

-- ND

On 12/17/20 10:31 AM, Sachit Murarka wrote:
Hi Users,

I have a wheel file; while creating it, I mentioned the dependencies in the setup.py file. Now I have 2 virtual envs: one was already there, and another one I created just now.

I have switched to the new virtual env, and I want Spark to download the dependencies while doing spark-submit using the wheel.

Could you please help me with this?

It is not downloading the dependencies; instead it is pointing to the older virtual env and proceeding with the execution of the Spark job.

Please note I have tried setting the env variables as well.
I have also tried the following options in spark-submit:

--conf spark.pyspark.virtualenv.enabled=true --conf spark.pyspark.virtualenv.type=native --conf spark.pyspark.virtualenv.requirements=requirements.txt --conf spark.pyspark.python=/path/to/venv/bin/python3 --conf spark.pyspark.driver.python=/path/to/venv/bin/python3

This did not help either.

Kind Regards,
Sachit Murarka
