Hi all,
The vote for "Publish additional Spark distribution with Spark Connect
enabled" passes with 22 +1s (13 binding +1s)
(* = binding)
+1:
- Mridul Muralidharan *
- Hyukjin Kwon *
- Jungtaek Lim
- Xiao Li *
- DB Tsai *
- Sakthi
- Gengliang Wang *
- L. C. Hsieh *
- Yang Jie *
- Max Gekk *
- Yum
This is good news. :-) Thanks for the support.
Excuse the thumb typos
Hi Mich,
Yes, the project is fully open source and adopted by enterprises that do
very large-scale batch scheduling and data processing.
The GitHub repository is https://github.com/armadaproject/armada and the
Armada Operator is the simplest way to install it:
https://github.com/armadaproject/armad
Well, that should work, but there are some considerations.
When you use

spark-submit --verbose \
  --properties-file ${property_file} \
  --master k8s://https://$KUBERNETES_MASTER_IP:443 \
  --deploy-mode client \
  --name sparkBQ \

with --deploy-mode client, that im
I got it to work by running it in client mode and using the `local://*`
prefix. My external cluster manager gets injected just fine.
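For illustration, a minimal sketch of that kind of invocation; the armada:// master URL, class name, and jar path are made-up placeholders, not the actual setup:

spark-submit \
  --master armada://localhost:50051 \
  --deploy-mode client \
  --class com.example.Main \
  local:///opt/app/my-app.jar
# client mode plus a local:// jar path sidesteps the cluster-mode
# restriction, so the pluggable cluster manager registered for the
# custom scheme gets picked up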
On Fri, Feb 7, 2025 at 12:38 AM Dejan Pejchev wrote:
> Hello Spark community!
>
> My name is Dejan Pejchev, and I am a Software Engineer working at
> G-Research, a
Thanks for the reply, Mich!
Good point, the issue is that cluster deploy mode is not possible
when master is local
(https://github.com/apache/spark/blob/9cf98ed41b2de1b44c44f0b4d1273d46761459fe/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L308).
The only way to work around this scenario
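For reference, the check linked above rejects the combination of a local master with cluster deploy mode; a sketch of the failing invocation (the jar path is an illustrative placeholder):

spark-submit \
  --master local[*] \
  --deploy-mode cluster \
  local:///opt/app/my-app.jar
# fails fast with an error like:
#   Error: Cluster deploy mode is not compatible with master "local"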
To me, this seems like a gap in the "pluggable cluster manager"
implementation.
What is the value of making cluster managers pluggable if spark-submit
doesn't accept jobs on those cluster managers?
It seems to me that, for pluggable cluster managers to work, you would want
some parts of spark-submit
Well, you can try using an environment variable and a custom script
that modifies the --master URL before invoking spark-submit. This script
could replace "k8s://" with another identifier of your choice (e.g.
"k8s-armada://"), and you could then modify the SparkSubmit code to handle
this custom URL scheme.
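A minimal sketch of such a wrapper, assuming a patched SparkSubmit that has been taught to recognise the made-up "k8s-armada://" scheme (the scheme name and the SPARK_HOME default are illustrative assumptions):

#!/usr/bin/env bash
# Hypothetical wrapper: rewrite the stock k8s:// scheme on the master URL
# to the custom k8s-armada:// scheme before handing off to spark-submit.
args=()
for arg in "$@"; do
  args+=("${arg/#k8s:\/\//k8s-armada://}")  # only rewrites a leading k8s://
done
exec "${SPARK_HOME:-/opt/spark}/bin/spark-submit" "${args[@]}"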
Agreed. If the goal is to make Spark truly pluggable, the spark-submit tool
itself should be more flexible in handling different cluster managers and
their specific requirements.
1. Back in the day, Spark's initial development focused on a limited
set of cluster managers (Standalone, YARN).
This External Cluster Manager is an amazing concept and I really like the
separation.
Would it be possible to include a broader group and discuss an approach to
making Spark more pluggable? It is a bit far-fetched, but we would be very
much interested in working on this if it resonates well.
Well, everything is possible. Please initiate a discussion on the matter of
a proposal to "Create a pluggable cluster manager" and put it to the
community.
See some examples here
https://lists.apache.org/list.html?dev@spark.apache.org
HTH
Dr Mich Talebzadeh,
Architect | Data Science | Financial
Yes, if this becomes a need that surfaces time and again, then it's worthwhile to start a broader discussion in the form of a high-level proposal, which could trigger favorable discussion leading to next steps.
Cheers
Jules
—
Sent from my iPhone. Pardon the dumb thumb typos :)
On Feb 7, 2025, at 8:00 AM,