Dear Spark Community,

I've been exploring the capabilities of the Spark Connect Server and encountered an issue when trying to launch it in cluster deploy mode with Kubernetes as the master.
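In short, the submit path rejects this combination outright. As a minimal sketch of the behavior I observed (illustrative Python only; Spark's real check is Scala code in SparkSubmit and raises org.apache.spark.SparkException, not a Python exception):

```python
# Illustrative sketch of the submit-time validation I ran into.
# Hypothetical simplification: the real logic lives in
# core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala.

def validate_connect_submit(master, deploy_mode):
    """Reject the combinations SparkSubmit does not allow for the
    Spark Connect server, regardless of the cluster manager."""
    if deploy_mode == "cluster":
        # Mirrors the error text quoted in this thread.
        raise RuntimeError(
            "Cluster deploy mode is not applicable to Spark Connect server.")
    return (master, deploy_mode)

# A k8s:// master with client deploy mode passes this particular check;
# it is the cluster deploy mode that trips it.
validate_connect_submit("k8s://https://example-apiserver:6443", "client")
```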
While initiating the `start-connect-server.sh` script with the `--conf` parameter for `spark.master` and `spark.submit.deployMode`, I was met with an error message:

```
Exception in thread "main" org.apache.spark.SparkException: Cluster deploy mode is not applicable to Spark Connect server.
```

This error message can be traced back to Spark's source code here:
https://github.com/apache/spark/blob/6c885a7cf57df328b03308cff2eed814bda156e4/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L307

Given my observations, I'm curious about the Spark Connect Server roadmap: is there a plan or ongoing conversation to enable Kubernetes as a master in Spark Connect Server's cluster deploy mode? I have tried to gather information from existing JIRA tickets, but have not been able to find a definitive answer:

https://issues.apache.org/jira/browse/SPARK-42730
https://issues.apache.org/jira/browse/SPARK-39375
https://issues.apache.org/jira/browse/SPARK-44117

Any thoughts, updates, or references to similar conversations or initiatives would be greatly appreciated. Thank you for your time and expertise!

Best regards,
Yasukazu

On Tue, 5 Sept 2023 at 12:09, Nagatomi Yasukazu <yassan0...@gmail.com> wrote:

> Hello Mich,
> Thank you for your questions. Here are my responses:
>
> > 1. What investigation have you done to show that it is running in local mode?
>
> I have verified through the History Server's Environment tab that:
> - "spark.master" is set to local[*]
> - "spark.app.id" begins with local-xxx
> - "spark.submit.deployMode" is set to local
>
> > 2. who has configured this kubernetes cluster? Is it supplied by a cloud vendor?
>
> Our Kubernetes cluster was set up in an on-prem environment using RKE2 (https://docs.rke2.io/).
>
> > 3. Confirm that you have configured Spark Connect Server correctly for cluster mode. Make sure you specify the cluster manager (e.g., Kubernetes) and other relevant Spark configurations in your Spark job submission.
>
> Based on the Spark Connect documentation I've read, there don't seem to be any specific settings for cluster mode related to the Spark Connect Server.
>
> Configuration - Spark 3.4.1 Documentation
> https://spark.apache.org/docs/3.4.1/configuration.html#spark-connect
>
> Quickstart: Spark Connect — PySpark 3.4.1 documentation
> https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_connect.html
>
> Spark Connect Overview - Spark 3.4.1 Documentation
> https://spark.apache.org/docs/latest/spark-connect-overview.html
>
> The documentation only suggests running ./sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0, leaving me at a loss.
>
> > 4. Can you provide a full spark submit command
>
> Given the nature of Spark Connect, I don't use the spark-submit command. Instead, as per the documentation, I can execute workloads using only a Python script. For the Spark Connect Server, I have a Kubernetes manifest executing "/opt/spark/sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0".
>
> > 5. Make sure that the Python client script connecting to Spark Connect Server specifies the cluster mode explicitly, like using --master or --deploy-mode flags when creating a SparkSession.
>
> The Spark Connect Server operates as the driver, so it isn't possible to specify the --master or --deploy-mode flags in the Python client script. If I try, I encounter a RuntimeError like this:
>
> RuntimeError: Spark master cannot be configured with Spark Connect server; however, found URL for Spark Connect [sc://.../]
>
> > 6. Ensure that you have allocated the necessary resources (CPU, memory etc) to Spark Connect Server when running it on Kubernetes.
>
> Resources are ample, so that shouldn't be the problem.
>
> > 7. Review the environment variables and configurations you have set, including the SPARK_NO_DAEMONIZE=1 variable. Ensure that these variables are not conflicting with cluster mode settings.
>
> I'm unsure if SPARK_NO_DAEMONIZE=1 conflicts with cluster mode settings. But without it, the process goes to the background when executing start-connect-server.sh, causing the Pod to terminate prematurely.
>
> > 8. Are you using the correct spark client version that is fully compatible with your spark on the server?
>
> Yes, I have verified that without using Spark Connect (e.g., using Spark Operator), Spark applications run as expected.
>
> > 9. check the kubernetes error logs
>
> The Kubernetes logs don't show any errors, and jobs are running in local mode.
>
> > 10. Insufficient resources can lead to the application running in local mode
>
> I wasn't aware that insufficient resources could lead to local-mode execution. Thank you for pointing it out.
>
> Best regards,
> Yasukazu
>
> On Tue, 5 Sept 2023 at 01:28, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Personally I have not used this feature myself. However, some points:
>>
>> 1. What investigation have you done to show that it is running in local mode?
>> 2. who has configured this kubernetes cluster? Is it supplied by a cloud vendor?
>> 3. Confirm that you have configured Spark Connect Server correctly for cluster mode. Make sure you specify the cluster manager (e.g., Kubernetes) and other relevant Spark configurations in your Spark job submission.
>> 4. Can you provide a full spark submit command
>> 5. Make sure that the Python client script connecting to Spark Connect Server specifies the cluster mode explicitly, like using --master or --deploy-mode flags when creating a SparkSession.
>> 6. Ensure that you have allocated the necessary resources (CPU, memory etc) to Spark Connect Server when running it on Kubernetes.
>> 7. Review the environment variables and configurations you have set, including the SPARK_NO_DAEMONIZE=1 variable. Ensure that these variables are not conflicting with cluster mode settings.
>> 8. Are you using the correct spark client version that is fully compatible with your spark on the server?
>> 9. check the kubernetes error logs
>> 10. Insufficient resources can lead to the application running in local mode
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Distinguished Technologist, Solutions Architect & Engineer
>> London
>> United Kingdom
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> https://en.everybodywiki.com/Mich_Talebzadeh
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>>
>> On Mon, 4 Sept 2023 at 04:57, Nagatomi Yasukazu <yassan0...@gmail.com> wrote:
>>
>>> Hi Cley,
>>>
>>> Thank you for taking the time to respond to my query. Your insights on Spark cluster deployment are much appreciated.
>>>
>>> However, I'd like to clarify that my specific challenge is related to running the Spark Connect Server on Kubernetes in Cluster Mode. While I understand the general deployment strategies for Spark on Kubernetes, I am seeking guidance particularly on the Spark Connect Server aspect.
>>>
>>> cf. Spark Connect Overview - Spark 3.4.1 Documentation
>>> https://spark.apache.org/docs/latest/spark-connect-overview.html
>>>
>>> To reiterate, when I connect from an external Python client and execute scripts, the server operates in Local Mode instead of the expected Kubernetes Cluster Mode (with master as k8s://... and deploy-mode set to cluster).
>>>
>>> If I've misunderstood your initial response and it was indeed related to Spark Connect, I sincerely apologize for the oversight.
>>> In that case, could you please expand a bit on the Spark Connect-specific aspects?
>>>
>>> Do you, or anyone else in the community, have experience with this specific setup, or have you encountered a similar issue with Spark Connect Server on Kubernetes? Any targeted advice or guidance would be invaluable.
>>>
>>> Thank you again for your time and help.
>>>
>>> Best regards,
>>> Yasukazu
>>>
>>> On Mon, 4 Sept 2023 at 00:23, Cleyson Barros <euroc...@gmail.com> wrote:
>>>
>>>> Hi Nagatomi,
>>>> Use the Apache images, then run your master node, then start your worker nodes. You can add a command line in the Dockerfiles to call the master using the Docker container names in your service composition. If you wish to run 2 masters (active and standby), follow the instructions in the Apache docs for this configuration; the recipe is the same except for how you start the masters and how you expect your cluster to behave.
>>>> I hope it helps.
>>>> Have a nice day :)
>>>> Cley
>>>>
>>>> On Saturday, 2 Sept 2023 at 15:37, Nagatomi Yasukazu <yassan0...@gmail.com> wrote:
>>>>
>>>>> Hello Apache Spark community,
>>>>>
>>>>> I'm currently trying to run Spark Connect Server on Kubernetes in Cluster Mode and facing some challenges. Any guidance or hints would be greatly appreciated.
>>>>>
>>>>> ## Environment:
>>>>> Apache Spark version: 3.4.1
>>>>> Kubernetes version: 1.23
>>>>> Command executed:
>>>>> /opt/spark/sbin/start-connect-server.sh \
>>>>>   --packages org.apache.spark:spark-connect_2.13:3.4.1,org.apache.iceberg:iceberg-spark-runtime-3.4_2.13:1.3.1...
>>>>> Note that I'm running it with the environment variable SPARK_NO_DAEMONIZE=1.
>>>>>
>>>>> ## Issue:
>>>>> When I connect from an external Python client and run scripts, it operates in Local Mode instead of the expected Cluster Mode.
>>>>>
>>>>> ## Expected Behavior:
>>>>> When connecting from a Python client to the Spark Connect Server, I expect it to run in Cluster Mode.
>>>>>
>>>>> If anyone has any insights, advice, or has faced a similar issue, I'd be grateful for your feedback.
>>>>> Thank you in advance.
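For anyone skimming this thread later: the client-side RuntimeError mentioned above (a Spark master cannot be combined with a Spark Connect URL) can be illustrated with a small sketch. This is a hypothetical helper, not PySpark's actual implementation; the host name is a placeholder:

```python
# Illustrative sketch of the client-side rule discussed in this thread:
# a Spark Connect remote URL and spark.master are mutually exclusive on
# the client, because master/deploy-mode belong to the server process.
# Hypothetical helper, not PySpark's real code.

def resolve_session_target(remote=None, master=None):
    """Return (mode, target) for a would-be session, raising the same
    kind of error PySpark reports when both are configured."""
    if remote and master:
        raise RuntimeError(
            "Spark master cannot be configured with Spark Connect server; "
            "however, found URL for Spark Connect [%s]" % remote)
    if remote:
        return ("connect", remote)  # deploy mode is decided server-side
    return ("classic", master or "local[*]")

# The client only names the Connect endpoint (placeholder host):
print(resolve_session_target(remote="sc://connect-server.example.com:15002"))
```

This matches what I observed: the only knob the Python client has is the sc:// endpoint, which is why the deploy mode must be fixed where the server itself is launched.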