Re: Extending Spark with a custom ExternalClusterManager

2025-02-19 Thread Enrico Minack
Hi devs, Let me pull some spark-submit developers into this discussion. @dongjoon-hyun @HyukjinKwon @cloud-fan What are your thoughts on making spark-submit fully and generically support ExternalClusterManager implementations? The current situation is that the only way to submit a Spark job vi
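For readers following along: the driver-side plumbing that spark-submit could generically reuse already exists. SparkContext discovers ExternalClusterManager implementations through Java's ServiceLoader and picks the one whose canCreate accepts the master URL. Below is a minimal sketch of that lookup; the object name and the mycm:// scheme are placeholders, and since the trait is private[spark] the code has to live under an org.apache.spark package.

```scala
package org.apache.spark.scheduler

import java.util.ServiceLoader
import scala.jdk.CollectionConverters._

// Roughly what SparkContext does for an unrecognized master URL: ask every
// ExternalClusterManager registered via META-INF/services whether it can handle it.
// (Spark itself also errors out if more than one implementation matches.)
object DiscoverClusterManager {
  def find(master: String): Option[ExternalClusterManager] = {
    val loader = Thread.currentThread().getContextClassLoader
    ServiceLoader.load(classOf[ExternalClusterManager], loader)
      .asScala
      .find(_.canCreate(master))
  }

  def main(args: Array[String]): Unit =
    println(find("mycm://pool-a").map(_.getClass.getName).getOrElse("no plugin found"))
}
```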

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Jules Damji
Yes, if this becomes a need that surfaces time and again, then it’s worthwhile to start a broader discussion in the form of a high-level proposal, which could lead to a favorable discussion and next steps. Cheers, Jules. Sent from my iPhone. Pardon the dumb thumb typos :) On Feb 7, 2025, at 8:00 AM,

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Mich Talebzadeh
Well, everything is possible. Please initiate a discussion around a proposal to "Create a pluggable cluster manager" and put it to the community. See some examples here: https://lists.apache.org/list.html?dev@spark.apache.org HTH, Dr Mich Talebzadeh, Architect | Data Science | Financial

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Mich Talebzadeh
Agreed. If the goal is to make Spark truly pluggable, the spark-submit tool itself should be more flexible in handling different cluster managers and their specific requirements. 1. Back in the day, Spark's initial development focused on a limited set of cluster managers (Standalone, YARN).

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Dejan Pejchev
This ExternalClusterManager is an amazing concept, and I really like the separation. Would it be possible to include a broader group and discuss an approach to making Spark more pluggable? It is a bit far-fetched, but we would be very much interested in working on this if it resonates well

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread George J
To me, this seems like a gap in the "pluggable cluster manager" implementation. What is the value of making cluster managers pluggable, if spark-submit doesn't accept jobs on those cluster managers? It seems to me, for pluggable cluster managers to work, you would want some parts of spark-submit
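To make the gap concrete, here is a simplified paraphrase (not the actual SparkSubmit code) of the hard-coded master-scheme dispatch that rejects a plugin's URL before any ExternalClusterManager is ever consulted:

```scala
// Simplified paraphrase of spark-submit's master handling; the real code knows about
// a few more variants, but the point is that the list of schemes is closed.
object MasterDispatch {
  def clusterManager(master: String): String = master match {
    case "yarn"                        => "YARN"
    case m if m.startsWith("spark://") => "STANDALONE"
    case m if m.startsWith("k8s://")   => "KUBERNETES"
    case m if m.startsWith("local")    => "LOCAL"
    case other =>
      sys.error(s"Master must either be yarn or start with spark, k8s, or local: $other")
  }

  def main(args: Array[String]): Unit = {
    // A hypothetical plugin scheme is rejected before any plugin is consulted.
    println(clusterManager("mycm://pool-a"))
  }
}
```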

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Mich Talebzadeh
Well, you can try using an environment variable and creating a custom script that modifies the --master URL before invoking spark-submit. This script could replace "k8s://" with another identifier of your choice (e.g. "k8s-armada://"), and you would then modify the SparkSubmit code to handle this custom URL scheme. This
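A sketch of that wrapper idea, rendered in Scala rather than a shell script just to keep the examples in one language. The rewrite direction and the k8s-armada:// identifier follow the message; running it assumes spark-submit is on the PATH and that SparkSubmit has been patched to accept the custom scheme.

```scala
import scala.sys.process._

// Pass-through wrapper: rewrite any k8s:// master argument to the custom
// k8s-armada:// scheme, then hand all arguments to the real spark-submit.
object MasterRewriteWrapper {
  def main(args: Array[String]): Unit = {
    val rewritten = args.map(_.replaceFirst("^k8s://", "k8s-armada://"))
    val exitCode  = (Seq("spark-submit") ++ rewritten).! // runs spark-submit, returns its exit code
    sys.exit(exitCode)
  }
}
```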

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Dejan Pejchev
Thanks for the reply, Mich! Good point; the issue is that cluster deploy mode is not possible when the master is local ( https://github.com/apache/spark/blob/9cf98ed41b2de1b44c44f0b4d1273d46761459fe/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L308 ). The only way to work around this scenar
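A toy reconstruction of the constraint behind that link (not the exact source): SparkSubmit resolves the cluster manager from the master URL first and then rejects the local-plus-cluster combination, which is what forces the client-mode workaround mentioned elsewhere in this thread.

```scala
// Toy reconstruction of the validation, just to show why "--deploy-mode cluster"
// cannot be combined with a local master in stock spark-submit.
object DeployModeCheck {
  sealed trait Manager
  case object LOCAL extends Manager
  case object OTHER extends Manager

  sealed trait Mode
  case object CLIENT  extends Mode
  case object CLUSTER extends Mode

  def validate(manager: Manager, mode: Mode): Unit = (manager, mode) match {
    case (LOCAL, CLUSTER) =>
      sys.error("Cluster deploy mode is not compatible with master \"local\"")
    case _ => ()
  }

  def main(args: Array[String]): Unit = {
    validate(OTHER, CLUSTER) // accepted
    validate(LOCAL, CLUSTER) // throws
  }
}
```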

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Mich Talebzadeh
Well, that should work, but with some considerations. When you use spark-submit --verbose \ --properties-file ${property_file} \ --master k8s://https://$KUBERNETES_MASTER_IP:443 \ --deploy-mode client \ --name sparkBQ \ --deploy-mode client, that im

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Dejan Pejchev
I got it to work by running it in client mode and using the `local://*` prefix. My external cluster manager gets injected just fine. On Fri, Feb 7, 2025 at 12:38 AM Dejan Pejchev wrote: > Hello Spark community! > > My name is Dejan Pejchev, and I am a Software Engineer working at > G-Research, a
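For reference, a minimal client-mode driver matching what Dejan describes: with the plugin jar on the driver classpath, a custom master scheme goes through the ExternalClusterManager lookup, and application jars can be referenced with the local:// prefix so executors resolve them on their own filesystem. The mycm:// scheme and the app name are placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Client-mode driver: SparkContext sees the custom master URL directly, so the
// registered ExternalClusterManager (found via ServiceLoader) handles scheduling.
object ClientModeExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("external-cluster-manager-demo")
      .master("mycm://pool-a") // placeholder scheme handled by the plugin's canCreate
      .getOrCreate()

    // Trivial job to prove that executors come up and run tasks.
    println(spark.sparkContext.range(0, 10).sum())

    spark.stop()
  }
}
```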

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Dejan Pejchev
Hi Mich, Yes, the project is fully open-source and adopted by enterprises that do very large-scale batch scheduling and data processing. The GitHub repository is https://github.com/armadaproject/armada and the Armada Operator is the simplest way to install it: https://github.com/armadaproject/armad

Re: Extending Spark with a custom ExternalClusterManager

2025-02-06 Thread Mich Talebzadeh
Hi, Is this the correct link to this open-source product? Armada - how to run millions of batch jobs over thousands of compute nodes using Kubernetes | G-Research. I am fami

Extending Spark with a custom ExternalClusterManager

2025-02-06 Thread Dejan Pejchev
Hello Spark community! My name is Dejan Pejchev, and I am a Software Engineer working at G-Research and a maintainer of our Kubernetes multi-cluster batch scheduler called Armada. We are trying to build an integration with Spark, where we would like to use spark-submit with a master arm
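For anyone who wants to see what such an integration plugs into, here is a hedged sketch of an ExternalClusterManager implementation, with method signatures as found in recent Spark versions. The class name, the mycm:// scheme, and the backend stub are placeholders, not Armada's actual integration; the trait is private[spark], so implementations live under an org.apache.spark package and are registered in META-INF/services/org.apache.spark.scheduler.ExternalClusterManager.

```scala
package org.apache.spark.scheduler

import org.apache.spark.SparkContext

// Placeholder cluster manager plugin. Spark loads it via ServiceLoader and calls
// canCreate on each candidate until one claims the master URL.
class MyClusterManager extends ExternalClusterManager {

  override def canCreate(masterURL: String): Boolean =
    masterURL.startsWith("mycm://") // placeholder scheme

  override def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler =
    new TaskSchedulerImpl(sc)

  override def createSchedulerBackend(
      sc: SparkContext,
      masterURL: String,
      scheduler: TaskScheduler): SchedulerBackend = {
    // A real integration returns a backend that asks its scheduler (here: Armada)
    // for executors; left unimplemented in this sketch.
    throw new UnsupportedOperationException("backend omitted in this sketch")
  }

  override def initialize(scheduler: TaskScheduler, backend: SchedulerBackend): Unit =
    scheduler.asInstanceOf[TaskSchedulerImpl].initialize(backend)
}
```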