Hi,
Using Minikube to create a containerised Spark, I can easily use spark-submit as below with an uber jar file:

bin/spark-submit \
   --master k8s://$KSERVER \
   --deploy-mode cluster \
   --name spark-pi \
   --class org.apache.spark.examples.SparkPi \
   --conf spark.executor.instances=3 \
   --conf spark.kubernetes.namespace=spark \
   --conf spark.kubernetes.driver.pod.name=spark-pi-driver \
   --conf spark.kubernetes.container.image=spark:latest \
   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-serviceaccount \
   local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar

This works fine and comes back with

Pi is roughly 3.1380356901784507

For Scala this submission is easy, because when you write Spark code in Scala you can bundle your dependencies into the jar file that you submit to Spark, namely

local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar

However, when writing Spark code in Python, dependency management becomes more difficult, because each of the Spark executor nodes performing the computations needs to have all of the Python dependencies installed.

This is normally resolved by creating a dependencies.zip file from site-packages under the Python virtual environment (/usr/src/Python-3.7.3/airflow_virtualenv/lib/python3.7/site-packages):

zip -r ../dependencies.zip .

Then I can use that zip file on-prem:

spark-submit --master local[4] \
   --py-files local:///usr/src/Python-3.7.3/airflow_virtualenv/lib/python3.7/dependencies.zip \
   local:///opt/spark/examples/src/main/python/pi.py

However, how does one use that dependency zip in Minikube? This is the submission code:

bin/spark-submit --verbose \
   --master k8s://$KSERVER \
   --deploy-mode client \
   --name spark-pi \
   --class org.apache.spark.examples.SparkPi \
   --conf spark.executor.instances=2 \
   --conf spark.kubernetes.namespace=spark \
   --conf spark.kubernetes.driver.pod.name=spark-pi-driver \
   --conf spark.kubernetes.container.image=spark:latest \
   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-serviceaccount \
   --py-files=local:///usr/src/Python-3.7.3/airflow_virtualenv/lib/python3.7/dependencies.zip \
   local:///opt/spark/examples/src/main/python/pi.py

It throws this error:

Exception in thread "main" org.apache.spark.SparkException: Failed to get main class in JAR with error 'File file:/d4T/hduser/spark-3.1.1-bin-hadoop3.2/ does not exist'. Please specify one with --class.
        at org.apache.spark.deploy.SparkSubmit.error(SparkSubmit.scala:959)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:486)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
./pyspark-minikube.sh[55]: --name: not found [No such file or directory]

How can one resolve this issue?
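One alternative I am looking at, based on the "Python Package Management" section of the Spark 3.1 documentation, is to pack the whole virtual environment with venv-pack and ship it via --archives rather than zipping site-packages. A rough sketch only, untested on Minikube, with illustrative paths and the archive name pyspark_venv.tar.gz chosen by me:

# pack the existing virtualenv so the executors get the same site-packages
pip install venv-pack
venv-pack -p /usr/src/Python-3.7.3/airflow_virtualenv -o pyspark_venv.tar.gz

export PYSPARK_DRIVER_PYTHON=python                 # driver runs locally in client mode
export PYSPARK_PYTHON=./environment/bin/python      # python inside the unpacked archive on the executors

bin/spark-submit \
   --master k8s://$KSERVER \
   --deploy-mode client \
   --name spark-pi \
   --conf spark.executor.instances=2 \
   --conf spark.kubernetes.namespace=spark \
   --conf spark.kubernetes.container.image=spark:latest \
   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-serviceaccount \
   --archives pyspark_venv.tar.gz#environment \
   local:///opt/spark/examples/src/main/python/pi.py

Note that I have dropped --class in this sketch, since that flag only applies to JVM applications.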
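Another option would presumably be to bake the Python packages into the container image referenced by spark.kubernetes.container.image, so every driver and executor pod already has them and no zip needs to be shipped at all. A hypothetical sketch, assuming the spark:latest base image is PySpark-enabled and has pip3 available (the image tag spark-py-deps and the package list are illustrative):

eval $(minikube docker-env)              # build against Minikube's Docker daemon

cat > Dockerfile <<'EOF'
FROM spark:latest
USER root
# install the same packages the virtualenv provides
RUN pip3 install numpy pandas
EOF

docker build -t spark-py-deps:latest .

# then submit with
#   --conf spark.kubernetes.container.image=spark-py-deps:latest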
Thanks

View my LinkedIn profile: <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.