We are currently using Dataproc on GCP for running our Spark workloads, and
I'm planning to move these workloads to Kubernetes (GKE).

Here is what has been done so far:

Installed Spark using the Bitnami Helm chart:

```
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install spark -f sparkConfig.yaml bitnami/spark -n spark
```
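
For reference, the release and pods can be checked roughly like this (release name `spark` and namespace `spark` as in the install command above):

```
# check the Helm release status
helm status spark -n spark

# list the master/worker pods created by the chart
kubectl get pods -n spark -l app.kubernetes.io/instance=spark
```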

Also deployed a LoadBalancer service; the YAML used:

```
apiVersion: v1
kind: Service
metadata:
  name: spark-master-lb
  labels:
    app: spark
    component: LoadBalancer
spec:
  selector:
    app.kubernetes.io/component: master
    app.kubernetes.io/instance: spark
    app.kubernetes.io/name: spark
  ports:
  - name: webui
    port: 8080
    targetPort: 8080
  - name: master
    port: 7077
    targetPort: 7077
  type: LoadBalancer
```
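
For completeness, applying the manifest and reading back the external IP looks roughly like this (the manifest file name is just illustrative):

```
kubectl apply -n spark -f spark-master-lb.yaml

# once GCP assigns an address, the EXTERNAL-IP column shows the IP used in spark-submit below
kubectl get svc spark-master-lb -n spark
```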

Spark is installed, and the pods have come up.

When I try to do a spark-submit in cluster mode, it gives the following error:

```
(base) Karans-MacBook-Pro:fromEdward-jan26 karanalang$ $SPARK_HOME/bin/spark-submit   --master spark://<EXTERNAL_IP>:7077   --deploy-mode cluster   --name spark-on-gke   local:///Users/karanalang/Documents/Technology/0.spark-on-gke/StructuredStream-on-gke.py

24/08/26 12:03:26 WARN Utils: Your hostname, Karans-MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 10.42.28.138 instead (on interface en0)
24/08/26 12:03:26 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/karanalang/Documents/Technology/spark-3.1.3-bin-hadoop3.2/jars/spark-unsafe_2.12-3.1.3.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Exception in thread "main" org.apache.spark.SparkException: Cluster deploy mode is currently not supported for python applications on standalone clusters.
    at org.apache.spark.deploy.SparkSubmit.error(SparkSubmit.scala:968)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:273)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
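
As the exception says, a standalone Spark master only supports client deploy mode for Python applications, so presumably the submit has to look more like this (same external IP and script as above):

```
$SPARK_HOME/bin/spark-submit \
  --master spark://<EXTERNAL_IP>:7077 \
  --deploy-mode client \
  --name spark-on-gke \
  /Users/karanalang/Documents/Technology/0.spark-on-gke/StructuredStream-on-gke.py
```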

In client mode, it gives the following error:

```
24/08/26 12:06:58 ERROR SparkContext: Error initializing SparkContext.
java.lang.NullPointerException
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:640)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:829)
24/08/26 12:06:58 INFO SparkContext: SparkContext already stopped.
```
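
One assumption I'm making about client mode: the driver runs on my laptop, so the Spark workers in GKE would also need to reach back to it. If that turns out to be part of the problem, the driver address would presumably have to be pinned explicitly, something like the following (`<ROUTABLE_IP>` is a placeholder for an address the workers can actually reach):

```
# placeholder: an address reachable from the GKE workers, not the loopback 127.0.0.1
export SPARK_LOCAL_IP=<ROUTABLE_IP>

$SPARK_HOME/bin/spark-submit \
  --master spark://<EXTERNAL_IP>:7077 \
  --deploy-mode client \
  --conf spark.driver.host=<ROUTABLE_IP> \
  --conf spark.driver.bindAddress=0.0.0.0 \
  --name spark-on-gke \
  /Users/karanalang/Documents/Technology/0.spark-on-gke/StructuredStream-on-gke.py
```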

Couple of questions:

1. Is using the Helm chart the correct way to install Apache Spark on GKE/k8s? (Note - this needs to be installed on both GKE and on-prem Kubernetes.)

2. How do I submit PySpark jobs to the Spark cluster deployed on GKE? (e.g. do I need to create a K8s Deployment for each Spark job?) See the sketch after this list for the alternative I've been considering.
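
On question 2, the alternative that keeps coming up is Spark's native Kubernetes scheduler, where spark-submit talks to the GKE API server directly and starts driver/executor pods per job, instead of submitting to a long-running standalone master. A rough sketch (the API server address, container image, service account, and in-image script path are placeholders, not something I have set up yet):

```
$SPARK_HOME/bin/spark-submit \
  --master k8s://https://<GKE_API_SERVER>:443 \
  --deploy-mode cluster \
  --name spark-on-gke \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.container.image=<REGISTRY>/spark-py:3.1.3 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/app/StructuredStream-on-gke.py
```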

tia !

Here is the Stack Overflow link:

https://stackoverflow.com/questions/78915988/unable-to-deploy-pyspark-application-on-gke-spark-installed-using-bitnami-helm
