Well, you can try using an environment variable and a custom script that rewrites the --master URL before invoking spark-submit. The script could replace "k8s://" with another identifier of your choice (e.g. "k8s-armada://"), and you could then modify the SparkSubmit code to handle this custom URL scheme. This may bypass the internal logic within SparkSubmit that restricts --deploy-mode cluster with "k8s://" URLs.
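For illustration only, one possible shape for such a wrapper (named spark-submit-Armada to match the example that follows); the SPARK_MASTER_URL variable and the patched launcher name spark-submit-armada-patched are assumptions made for this sketch, not existing Spark tooling:

#!/usr/bin/env bash
# spark-submit-Armada: illustrative sketch only.
set -euo pipefail

: "${SPARK_MASTER_URL:?set SPARK_MASTER_URL, e.g. k8s://https://<KUBERNETES_MASTER_IP>:443}"

# Rewrite the scheme so the stock k8s:// handling in SparkSubmit is not triggered.
ARMADA_MASTER_URL="k8s-armada://${SPARK_MASTER_URL#k8s://}"

# Hand off to a patched copy of spark-submit that has been taught to accept
# k8s-armada://, passing all other arguments (--properties-file, --deploy-mode,
# --name, ...) straight through.
exec "${SPARK_HOME:?}/bin/spark-submit-armada-patched" \
  --master "${ARMADA_MASTER_URL}" \
  "$@"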
For example:

export SPARK_MASTER_URL="k8s://https://$KUBERNETES_MASTER_IP:443"

spark-submit-Armada --verbose \
    --properties-file ${property_file} \
    --deploy-mode cluster \
    --name sparkArmada

then modify or copy the spark-submit code to spark-submit-Armada to handle this
custom URL, for now for test/debugging purposes.

HTH

Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


On Fri, 7 Feb 2025 at 14:55, Dejan Pejchev <de...@gr-oss.io> wrote:

> Thanks for the reply, Mich!
>
> Good point. The issue is that cluster deploy mode is not possible
> when master is local (
> https://github.com/apache/spark/blob/9cf98ed41b2de1b44c44f0b4d1273d46761459fe/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L308
> ).
> The only way to work around this would be to edit SparkSubmit, which we
> are trying to avoid because we don't want to touch the Spark codebase.
>
> Do you have an idea how to run in cluster deploy mode and load an external
> cluster manager?
>
> Could it be possible to submit a PR for a change in SparkSubmit?
>
> Looking forward to your answer!
>
> On Fri, Feb 7, 2025 at 3:45 PM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Well, that should work, but there are some considerations.
>>
>> When you use
>>
>> spark-submit --verbose \
>>     --properties-file ${property_file} \
>>     --master k8s://https://$KUBERNETES_MASTER_IP:443 \
>>     --deploy-mode client \
>>     --name sparkBQ \
>>
>> --deploy-mode client implies the driver runs on the client machine
>> (the machine from which the spark-submit command is executed). It is
>> normally used for debugging and small clusters.
>> --deploy-mode cluster means the driver, which is responsible for
>> coordinating the execution of the Spark application, runs within the
>> Kubernetes cluster as a separate container, which provides better
>> resource isolation and is more suitable for the type of cluster you
>> are using with Armada.
>>
>> Anyway, you can see how it progresses in debugging mode.
>>
>> HTH
>>
>> Dr Mich Talebzadeh,
>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>> On Fri, 7 Feb 2025 at 14:01, Dejan Pejchev <de...@gr-oss.io> wrote:
>>
>>> I got it to work by running it in client mode and using the `local://`
>>> prefix. My external cluster manager gets injected just fine.
>>>
>>> On Fri, Feb 7, 2025 at 12:38 AM Dejan Pejchev <de...@gr-oss.io> wrote:
>>>
>>>> Hello Spark community!
>>>>
>>>> My name is Dejan Pejchev, and I am a Software Engineer working at
>>>> G-Research and a maintainer of our Kubernetes multi-cluster batch
>>>> scheduler called Armada.
>>>>
>>>> We are trying to build an integration with Spark, where we would like
>>>> to use spark-submit with a master armada://xxxx, which will then submit
>>>> the driver and executor jobs to Armada.
>>>>
>>>> I understand the concept of the ExternalClusterManager and how I can
>>>> write and provide a new implementation, but I am not clear on how I can
>>>> extend Spark to accept it.
>>>>
>>>> I see that in SparkSubmit.scala there is a check for master URLs, and it
>>>> fails if the master isn't one of local, mesos, k8s or yarn.
>>>>
>>>> What is the correct approach for my use case?
>>>>
>>>> Thanks in advance,
>>>> Dejan Pejchev
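For completeness on the question of how Spark picks up a new ExternalClusterManager without touching the codebase: on the driver side, SparkContext discovers implementations of org.apache.spark.scheduler.ExternalClusterManager through Java's ServiceLoader, so shipping a provider-configuration file alongside the implementation class is what makes the manager injectable. A rough sketch of that registration, assuming a hypothetical implementation class org.apache.spark.scheduler.armada.ArmadaClusterManager (the trait is private[spark] in current Spark, so the implementation generally has to live under the org.apache.spark package):

# Add the ServiceLoader provider-configuration resource to the jar that
# contains the (hypothetical) ArmadaClusterManager implementation.
mkdir -p src/main/resources/META-INF/services

cat > src/main/resources/META-INF/services/org.apache.spark.scheduler.ExternalClusterManager <<'EOF'
org.apache.spark.scheduler.armada.ArmadaClusterManager
EOF

# Ship the resulting jar with the application, e.g. via --jars, so it ends up
# on the driver and executor classpaths.

Note that this only affects how SparkContext resolves the master URL once the driver is running; it does not relax the master/deploy-mode validation in SparkSubmit, which is exactly the cluster-deploy-mode restriction discussed above.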