Wheel files are used for package management and for setting up your virtual
environment, not as a library package. To run spark-submit in a
virtual env, use the --py-files option instead. Usage:
--py-files PY_FILES Comma-separated list of .zip, .egg, or .py
files to place on the PYTHONPATH for Python apps.
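For example, a minimal invocation might look like the sketch below.
The master URL and the file names (deps.zip, main.py) are placeholders,
not taken from your setup:

    spark-submit \
      --master spark://master-host:7077 \
      --py-files deps.zip \
      main.py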
In other words, you can't run spark-submit in a virtual environment like
a regular Python program, since it is NOT a regular Python script. But
you can package your Python Spark project (including all dependency
libs) as a zip or egg file and make it available to spark-submit, as
sketched below.
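One common way to build such a bundle is to install the requirements
into a local directory and zip it up. This is a minimal sketch; the
names (requirements.txt, deps/, deps.zip) are assumptions:

    # install dependency libs into a local folder
    pip install -r requirements.txt -t deps/
    # zip them so spark-submit can ship them via --py-files
    cd deps && zip -r ../deps.zip . && cd ..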
Please note that spark-submit plays the role of a driver. It is only
responsible for submitting jobs to a Spark master. The master
distributes the job content, including all dependency libs, to the
individual worker nodes where the job is executed. Packaging in zip
or egg format makes that distribution easier.
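Since you already have a setup.py, you can also build an egg directly
from it; a sketch, assuming a standard setuptools project layout:

    # produces an .egg under dist/ that can be passed to --py-files
    python setup.py bdist_egg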
-- ND
On 12/17/20 10:31 AM, Sachit Murarka wrote:
Hi Users
I have a wheel file; while creating it I mentioned the dependencies
in the setup.py file.
Now I have 2 virtual envs: one was already there, and another one I
created just now.
I have switched to the new virtual env, and I want Spark to download
the dependencies while doing spark-submit using the wheel.
Could you please help me on this?
It is not downloading the dependencies; instead it is pointing to the
older virtual env and proceeding with the execution of the Spark job.
Please note I have also tried setting the env variables.
I have also tried the following options in spark-submit:
--conf spark.pyspark.virtualenv.enabled=true
--conf spark.pyspark.virtualenv.type=native
--conf spark.pyspark.virtualenv.requirements=requirements.txt
--conf spark.pyspark.python=/path/to/venv/bin/python3
--conf spark.pyspark.driver.python=/path/to/venv/bin/python3
This did not help either.
Kind Regards,
Sachit Murarka