Hi,
I have an issue with my PySpark job running on Kubernetes (testing on
minikube).
The project is zipped as DSBQ.zip and passed to spark-submit, with the
zip file on HDFS (the pod can read it).
DSBQ.zip is zipped at the root and has the following structure:
One of the py-files passed is DSBQ.zip itself.
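For reference, a spark-submit invocation along these lines is the usual
shape for this setup; the API-server address, image name, and HDFS paths
below are placeholders, not values from the original message:

    spark-submit \
      --master k8s://https://<apiserver-host>:6443 \
      --deploy-mode cluster \
      --name dsbq-job \
      --conf spark.kubernetes.container.image=<your-spark-py-image> \
      --py-files hdfs:///path/to/DSBQ.zip \
      hdfs:///path/to/main.py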
Thanks, Julien, for the further info.
I have spent a few days of free time working on PySpark on Kubernetes,
both on minikube and on Google Cloud Platform (GCP), which provides Spark
on Google Kubernetes Engine (GKE). Frankly, my work on k8s has been a bit
disappointing.
In GCP, the only available and supported doc
Hello, I’m writing a custom Spark Catalyst Expression with custom codegen,
but it seems that Spark (3.0.0) doesn’t want to generate code and falls
back to interpreted mode.
I created my SparkSession with spark.sql.codegen.factoryMode=CODEGEN_ONLY
and spark.sql.codegen.fallback=false, hoping that
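For context, here is roughly how the session and an expression like this
can be put together in Scala. This is a minimal sketch against the Spark
3.0 API; the DoubleUp expression is a hypothetical stand-in, not the
original code:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.catalyst.expressions.{Expression, UnaryExpression}
    import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
    import org.apache.spark.sql.types.{DataType, LongType}

    // CODEGEN_ONLY should disable the interpreted path, and
    // fallback=false should make codegen failures throw instead of
    // silently reverting to interpreted mode.
    val spark = SparkSession.builder()
      .master("local[*]")
      .config("spark.sql.codegen.factoryMode", "CODEGEN_ONLY")
      .config("spark.sql.codegen.fallback", "false")
      .getOrCreate()

    // Hypothetical expression: doubles a LongType child value.
    case class DoubleUp(child: Expression) extends UnaryExpression {
      override def dataType: DataType = LongType

      // Interpreted path, used only when codegen is skipped.
      override protected def nullSafeEval(input: Any): Any =
        input.asInstanceOf[Long] * 2L

      // Codegen path: emits the Java source for the doubling.
      override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
        defineCodeGen(ctx, ev, c => s"($c) * 2L")
    }

As far as I know, both of those options are internal SQLConf entries meant
mainly for testing, so it may be worth checking whether something else
(e.g. the generated code failing to compile) is triggering the fallback.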
Hi,
Good question!
It depends a lot on your jobs and your developer team.
The things that differ most, in my view, are:
1/ Data locality & fast reads
If your data is stored in an HDFS cluster (not HCFS) and your Spark
compute nodes are allowed to run on the Hadoop nodes, then definitely use
Yarn to benefit from the data locality