Hi Ye,

This is the error I get when I don't set spark.kubernetes.file.upload.path. Any ideas on how to fix this?
```
Exception in thread "main" org.apache.spark.SparkException: Please specify spark.kubernetes.file.upload.path property.
    at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:299)
    at org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:248)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at scala.collection.TraversableLike.map(TraversableLike.scala:238)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:247)
    at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:173)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:164)
    at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:60)
    at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
    at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
    at scala.collection.immutable.List.foldLeft(List.scala:89)
    at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
    at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:106)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2622)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
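If I'm reading your earlier reply right: since the script is already baked into the image (the Dockerfile below COPYs it to /myexample/StructuredStream-on-gke.py), I should be able to skip the upload step entirely by pointing spark-submit at the in-image copy with the local:// scheme, so spark.kubernetes.file.upload.path wouldn't be needed at all. Is something like this (untested sketch, same image and paths as in the thread below) what you meant? I also have a follow-up question about the original Mkdirs failure at the bottom of this mail.

```
# Untested: reference the copy of the script already inside the container
# image via the local:// scheme, so spark-submit has nothing to upload
# from the client machine and no upload path is required.
spark-submit \
  --master k8s://https://34.74.22.140:7077 \
  --deploy-mode cluster \
  --name pyspark-example \
  --conf spark.kubernetes.container.image=pyspark-example:0.1 \
  local:///myexample/StructuredStream-on-gke.py
```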
On Tue, Feb 14, 2023 at 1:33 AM Ye Xianjin <advance...@gmail.com> wrote:

> The configuration of '…file.upload.path' is wrong. It is meant to be a
> distributed FS path where Spark temporarily stages your
> archives/resources/jars before distributing them to the drivers/executors.
> For your case, you don't need to set this configuration.
>
> Sent from my iPhone
>
> On Feb 14, 2023, at 5:43 AM, karan alang <karan.al...@gmail.com> wrote:
>
> Hello All,
>
> I'm trying to run a simple application on GKE (Kubernetes), and it is
> failing.
> Note: I have Spark (the Bitnami Spark chart) installed on GKE using helm
> install.
>
> Here is what was done:
>
> 1. Created a Docker image using this Dockerfile:
>
> ```
> FROM python:3.7-slim
>
> RUN apt-get update && \
>     apt-get install -y default-jre && \
>     apt-get install -y openjdk-11-jre-headless && \
>     apt-get clean
>
> ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64
>
> RUN pip install pyspark
> RUN mkdir -p /myexample && chmod 755 /myexample
> WORKDIR /myexample
>
> COPY src/StructuredStream-on-gke.py /myexample/StructuredStream-on-gke.py
>
> CMD ["pyspark"]
> ```
>
> 2. Simple PySpark application:
>
> ```
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.appName("StructuredStreaming-on-gke").getOrCreate()
>
> data = [('k1', 123000), ('k2', 234000), ('k3', 456000)]
> df = spark.createDataFrame(data, ('id', 'salary'))
>
> df.show(5, False)
> ```
>
> 3. spark-submit command:
>
> ```
> spark-submit --master k8s://https://34.74.22.140:7077 \
>   --deploy-mode cluster \
>   --name pyspark-example \
>   --conf spark.kubernetes.container.image=pyspark-example:0.1 \
>   --conf spark.kubernetes.file.upload.path=/myexample \
>   src/StructuredStream-on-gke.py
> ```
>
> The error I get:
>
> ```
> 23/02/13 13:18:27 INFO KubernetesUtils: Uploading file: /Users/karanalang/PycharmProjects/Kafka/pyspark-docker/src/StructuredStream-on-gke.py to dest: /myexample/spark-upload-12228079-d652-4bf3-b907-3810d275124a/StructuredStream-on-gke.py...
>
> Exception in thread "main" org.apache.spark.SparkException: Uploading file /Users/karanalang/PycharmProjects/Kafka/pyspark-docker/src/StructuredStream-on-gke.py failed...
>     at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:296)
>     at org.apache.spark.deploy.k8s.KubernetesUtils$.renameMainAppResource(KubernetesUtils.scala:270)
>     at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configureForPython(DriverCommandFeatureStep.scala:109)
>     at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configurePod(DriverCommandFeatureStep.scala:44)
>     at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:59)
>     at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>     at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>     at scala.collection.immutable.List.foldLeft(List.scala:89)
>     at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
>     at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:106)
>     at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
>     at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
>     at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2622)
>     at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
>     at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
>     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>     at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> Caused by: org.apache.spark.SparkException: Error uploading file StructuredStream-on-gke.py
>     at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:319)
>     at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:292)
>     ... 21 more
>
> Caused by: java.io.IOException: Mkdirs failed to create /myexample/spark-upload-12228079-d652-4bf3-b907-3810d275124a
>     at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:317)
>     at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:305)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:414)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:387)
>     at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2369)
>     at org.apache.hadoop.fs.FilterFileSystem.copyFromLocalFile(FilterFileSystem.java:368)
>     at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:316)
>     ... 22 more
> ```
>
> Any ideas on how to fix this & get it to work?
> tia!
>
> Please see the stackoverflow link:
> https://stackoverflow.com/questions/75441360/running-spark-application-on-gke-failing-on-spark-submit
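One follow-up on the original Mkdirs failure in the quoted thread, to check my understanding: since spark.kubernetes.file.upload.path=/myexample has no scheme, Hadoop resolves it with RawLocalFileSystem (as the trace shows) and tries to create /myexample/spark-upload-... on the machine running spark-submit, which fails. If I ever do want to upload the script from the client instead of baking it into the image, would pointing the upload path at a GCS bucket be the right shape? Untested sketch below; the bucket name is made up, and I assume the GCS connector jar needs to be on the spark-submit classpath for the gs:// scheme to resolve.

```
# Hypothetical: stage uploads in a GCS bucket (a Hadoop-compatible
# distributed FS) instead of a schemeless local path. "some-bucket" is a
# placeholder; the gcs-connector must be available to spark-submit.
spark-submit \
  --master k8s://https://34.74.22.140:7077 \
  --deploy-mode cluster \
  --name pyspark-example \
  --conf spark.kubernetes.container.image=pyspark-example:0.1 \
  --conf spark.kubernetes.file.upload.path=gs://some-bucket/spark-uploads \
  src/StructuredStream-on-gke.py
```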