Re: [VOTE][RESULT] Publish additional Spark distribution with Spark Connect enabled

2025-02-07 Thread Jules Damji
This is good news. :-) Thanks for the support. Excuse the thumb typos On Fri, 07 Feb 2025 at 6:25 PM, Wenchen Fan wrote: > Hi all, > > The vote for "Publish additional Spark distribution with Spark Connect > enabled" passes with 22 +1s (13 binding +1s) > > (* = binding) > +1: > - Mridul Murali

[VOTE][RESULT] Publish additional Spark distribution with Spark Connect enabled

2025-02-07 Thread Wenchen Fan
Hi all, The vote for "Publish additional Spark distribution with Spark Connect enabled" passes with 22 +1s (13 binding +1s) (* = binding) +1: - Mridul Muralidharan * - Hyukjin Kwon * - Jungtaek Lim - Xiao Li * - DB Tsai * - Sakthi - Gengliang Wang * - L. C. Hsieh * - Yang Jie * - Max Gekk * - Yum

unsubscribe

2025-02-07 Thread 김병찬
unsubscribe

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Jules Damji
Yes, if this becomes a need that surfaces time and again, then it’s worthwhile to start a broader discussion in the form of a high-level proposal, which could trigger a favourable discussion leading to next steps. Cheers, Jules — Sent from my iPhone. Pardon the dumb thumb typos :) On Feb 7, 2025, at 8:00 AM,

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Mich Talebzadeh
Well, everything is possible. Please initiate a discussion proposing to "Create a pluggable cluster manager" and put it to the community. See some examples here: https://lists.apache.org/list.html?dev@spark.apache.org HTH Dr Mich Talebzadeh, Architect | Data Science | Financial

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Mich Talebzadeh
Agreed. If the goal is to make Spark truly pluggable, the spark-submit tool itself should be more flexible in handling different cluster managers and their specific requirements. 1. Back in the day, Spark's initial development focused on a limited set of cluster managers (Standalone, YARN).

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Dejan Pejchev
This External Cluster Manager is an amazing concept and I really like the separation. Would it be possible to include a broader group and discuss an approach to making Spark more pluggable? It is a bit far-fetched, but we would be very much interested in working on this if this resonates well

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread George J
To me, this seems like a gap in the "pluggable cluster manager" implementation. What is the value of making cluster managers pluggable if spark-submit doesn't accept jobs on those cluster managers? It seems to me that, for pluggable cluster managers to work, you would want some parts of spark-submit

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Mich Talebzadeh
Well, you can try using an environment variable and creating a custom script that modifies the --master URL before invoking spark-submit. This script could replace "k8s://" with another identifier of your choice (e.g. "k8s-armada://"); you would then modify the SparkSubmit code to handle this custom URL scheme. This
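For context, the extension point this thread keeps circling is Spark's ExternalClusterManager trait, which Spark discovers through Java's ServiceLoader; its canCreate method is what would have to recognise a custom scheme such as the "k8s-armada://" identifier mentioned above. The sketch below is only an illustration of that mechanism, not code from the thread: the class and package names are made up, the scheduler/backend construction is left unimplemented, and the ServiceLoader registration is noted in a comment.

    // Spark's scheduler SPI types are internal, so implementations are
    // typically compiled under an org.apache.spark.* package (name assumed here).
    package org.apache.spark.armada

    import org.apache.spark.SparkContext
    import org.apache.spark.scheduler.{ExternalClusterManager, SchedulerBackend, TaskScheduler}

    // Hypothetical manager for illustration. Spark finds implementations via
    // ServiceLoader, so the jar would also ship a resource file
    //   META-INF/services/org.apache.spark.scheduler.ExternalClusterManager
    // containing the fully qualified name of this class.
    class ArmadaClusterManager extends ExternalClusterManager {

      // Spark asks every registered manager whether it owns the master URL;
      // this is where a custom scheme like "k8s-armada://" would be matched.
      override def canCreate(masterURL: String): Boolean =
        masterURL.startsWith("k8s-armada://")

      // A real implementation would return a scheduler and backend that talk
      // to the external system; both are left unimplemented in this sketch.
      override def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler = ???

      override def createSchedulerBackend(
          sc: SparkContext,
          masterURL: String,
          scheduler: TaskScheduler): SchedulerBackend = ???

      override def initialize(scheduler: TaskScheduler, backend: SchedulerBackend): Unit = ()
    }

Registering a class this way only affects driver-side discovery; as others in the thread note, spark-submit still validates the --master scheme before any of this runs, which is exactly the gap being discussed.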

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Dejan Pejchev
Thanks for the reply, Mich! Good point; the issue is that cluster deploy mode is not possible when the master is local ( https://github.com/apache/spark/blob/9cf98ed41b2de1b44c44f0b4d1273d46761459fe/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L308 ). The only way to work around this scenar
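For readers without the source open, the restriction linked above is a deploy-mode validation inside SparkSubmit. The snippet below is a simplified, self-contained paraphrase of that kind of check, not the actual Spark code: cluster deploy mode is refused when the resolved master is local, and this validation runs long before any pluggable ExternalClusterManager is consulted.

    object SubmitValidationSketch {

      // Paraphrased check: mirrors the spirit of SparkSubmit's validation,
      // where a local master combined with cluster deploy mode is rejected.
      def validate(master: String, deployMode: String): Either[String, Unit] =
        (master, deployMode) match {
          case (m, "cluster") if m.startsWith("local") =>
            Left("Cluster deploy mode is not compatible with master \"local\"")
          case _ =>
            Right(())
        }

      def main(args: Array[String]): Unit = {
        println(validate("local[*]", "cluster"))         // Left(...): refused
        println(validate("k8s-armada://host", "client")) // Right(()): passes this check only; the real
                                                          // spark-submit would still reject the unknown
                                                          // scheme, which is the thread's point
      }
    }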

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Mich Talebzadeh
Well, that should work, but some considerations. When you use spark-submit --verbose \ --properties-file ${property_file} \ --master k8s://https://$KUBERNETES_MASTER_IP:443 \ --deploy-mode client \ --name sparkBQ \ --deploy-mode client, that im

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Dejan Pejchev
I got it to work by running it in client mode and using the `local://*` prefix. My external cluster manager gets injected just fine. On Fri, Feb 7, 2025 at 12:38 AM Dejan Pejchev wrote: > Hello Spark community! > > My name is Dejan Pejchev, and I am a Software Engineer working at > G-Research, a

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Dejan Pejchev
Hi Mich, Yes, the project is fully open-source and adopted by enterprises that do very large-scale batch scheduling and data processing. The GitHub repository is https://github.com/armadaproject/armada and the Armada Operator is the simplest way to install it: https://github.com/armadaproject/armad