Hi,
Using Minikube to run a containerised Spark cluster, I can easily use spark-submit with an uber jar file, as shown below:
bin/spark-submit \
--master k8s://$KSERVER \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=3 \
--conf spark.kubernetes.namespace=spark \
--conf spark.kubernetes.driver.pod.name=spark-pi-driver \
--conf spark.kubernetes.container.image=spark:latest \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-serviceaccount \
local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar
This works fine and comes back with "Pi is roughly 3.1380356901784507".
For Scala this submission is easy, because when you write Spark code in Scala you can bundle your dependencies into the jar file that you submit to Spark, namely
local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar
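For completeness, an uber jar like that can be built roughly as follows, assuming an sbt project with the sbt-assembly plugin enabled (other build tools work too):

# Minimal sketch, assuming sbt-assembly is declared in project/plugins.sbt.
# "assembly" bundles the compiled classes plus all library dependencies into one fat jar
# under target/scala-2.12/, which can then be referenced via local:// in spark-submit.
sbt clean assembly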
However, when writing Spark code in Python, dependency management becomes more difficult, because each of the Spark executor nodes performing computations needs to have all of the Python dependencies installed.
This is normally resolved by creating a dependencies.zip file from site-packages under the Python virtual environment:
/usr/src/Python-3.7.3/airflow_virtualenv/lib/python3.7/site-packages
zip -r ../dependencies.zip .
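For context, the full sequence is roughly as follows (the pip-installed packages shown here are placeholders only):

# Minimal sketch, assuming the virtualenv already exists at this path;
# the packages installed here are illustrative placeholders only.
source /usr/src/Python-3.7.3/airflow_virtualenv/bin/activate
pip install numpy pandas          # replace with your actual dependencies
cd /usr/src/Python-3.7.3/airflow_virtualenv/lib/python3.7/site-packages
zip -r ../dependencies.zip .      # yields .../lib/python3.7/dependencies.zip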
Then I can use that zip file on-prem:
spark-submit --master local[4] \
--py-files local:///usr/src/Python-3.7.3/airflow_virtualenv/lib/python3.7/dependencies.zip \
local:///opt/spark/examples/src/main/python/pi.py
However, how does one use that dependency zip in Minikube?
This is the submission code:
bin/spark-submit --verbose \
--master k8s://$KSERVER \
--deploy-mode client \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.namespace=spark \
--conf spark.kubernetes.driver.pod.name=spark-pi-driver \
--conf spark.kubernetes.container.image=spark:latest \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-serviceaccount \
--py-files=local:///usr/src/Python-3.7.3/airflow_virtualenv/lib/python3.7/dependencies.zip \
local:///opt/spark/examples/src/main/python/pi.py
It throws this error:
Exception in thread "main" org.apache.spark.SparkException: Failed to get main class in JAR with error 'File file:/d4T/hduser/spark-3.1.1-bin-hadoop3.2/ does not exist'. Please specify one with --class.
        at org.apache.spark.deploy.SparkSubmit.error(SparkSubmit.scala:959)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:486)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
./pyspark-minikube.sh[55]: --name: not found [No such file or directory]
How can one resolve this issue?
Thanks