Thanks for the reply, Mich!

Good point. The issue is that cluster deploy mode is not possible
when the master is local (
https://github.com/apache/spark/blob/9cf98ed41b2de1b44c44f0b4d1273d46761459fe/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L308
).
The only way to work around this scenario would be to edit SparkSubmit,
which we are trying to avoid because we don't want to touch the Spark codebase.
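
For reference, the check behind that link is the deploy-mode validation in
SparkSubmit. This is a paraphrased sketch rather than the exact source, but it
is the behaviour we run into:

    // Paraphrased from SparkSubmit.scala (not the exact source): once the
    // master URL has been classified as LOCAL, cluster deploy mode is rejected.
    (clusterManager, deployMode) match {
      case (LOCAL, CLUSTER) =>
        error("Cluster deploy mode is not compatible with master \"local\"")
      // ... other (clusterManager, deployMode) combinations ...
      case _ => // compatible combination, carry on
    }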

Do you have an idea of how to run in cluster deploy mode and still have an
external cluster manager loaded?
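
For context, loading the manager itself is the part we already have working.
Spark discovers ExternalClusterManager implementations on the driver through
java.util.ServiceLoader, so we register our class in
META-INF/services/org.apache.spark.scheduler.ExternalClusterManager and match
on the armada:// prefix. A minimal sketch of what we implement (class and
package names are ours and purely illustrative; ArmadaSchedulerBackend is our
own backend class and is not shown here):

    package org.apache.spark.scheduler.cluster.armada

    import org.apache.spark.SparkContext
    import org.apache.spark.scheduler.{ExternalClusterManager, SchedulerBackend, TaskScheduler, TaskSchedulerImpl}

    // Registered in META-INF/services/org.apache.spark.scheduler.ExternalClusterManager
    // so that SparkContext can find it via ServiceLoader.
    class ArmadaClusterManager extends ExternalClusterManager {

      // Claim any master URL of the form armada://...
      override def canCreate(masterURL: String): Boolean =
        masterURL.startsWith("armada://")

      override def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler =
        new TaskSchedulerImpl(sc)

      override def createSchedulerBackend(
          sc: SparkContext,
          masterURL: String,
          scheduler: TaskScheduler): SchedulerBackend =
        new ArmadaSchedulerBackend(scheduler.asInstanceOf[TaskSchedulerImpl], sc, masterURL)

      // Wire the backend into the scheduler once both have been created.
      override def initialize(scheduler: TaskScheduler, backend: SchedulerBackend): Unit =
        scheduler.asInstanceOf[TaskSchedulerImpl].initialize(backend)
    }

This works fine when SparkContext starts up with an armada:// master; the
problem is only getting spark-submit to let that master URL through in cluster
deploy mode.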

Would it be possible for us to submit a PR with a change to SparkSubmit?
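
Purely as an illustration of the direction (not an actual patch), the change we
have in mind is to let unrecognised master URLs fall through to the
ServiceLoader-based ExternalClusterManager lookup instead of being rejected up
front, roughly:

    // Illustrative only, not the real SparkSubmit code. Today the master-URL
    // match errors out on anything that is not yarn/spark/mesos/k8s/local;
    // the idea would be to defer unknown schemes to the ExternalClusterManager
    // lookup instead.
    val clusterManager: Int = args.master match {
      case "yarn" => YARN
      case m if m.startsWith("spark") => STANDALONE
      case m if m.startsWith("mesos") => MESOS
      case m if m.startsWith("k8s") => KUBERNETES
      case m if m.startsWith("local") => LOCAL
      case _ => EXTERNAL // hypothetical new constant instead of error(...)
    }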

Looking forward to your answer!

On Fri, Feb 7, 2025 at 3:45 PM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Well, that should work, but there are some considerations.
>
> When you use
>
>          spark-submit --verbose \
>            --properties-file ${property_file} \
>            --master k8s://https://$KUBERNETES_MASTER_IP:443 \
>            --deploy-mode client \
>            --name sparkBQ \
>
> *--deploy-mode client* implies the driver runs on the client machine (the
> machine from which the spark-submit command is executed). It is normally
> used for debugging and small clusters.
> *--deploy-mode cluster* means the driver, which is responsible for
> coordinating the execution of the Spark application, runs *within the
> Kubernetes cluster* as a separate container, which provides better resource
> isolation and is more suitable for the type of cluster you are using
> (Armada).
>
> Anyway, you can see how it progresses in debugging mode.
>
> HTH
>
> Dr Mich Talebzadeh,
> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> On Fri, 7 Feb 2025 at 14:01, Dejan Pejchev <de...@gr-oss.io> wrote:
>
>> I got it to work by running it in client mode and using the `local://*`
>> prefix. My external cluster manager gets injected just fine.
>>
>> On Fri, Feb 7, 2025 at 12:38 AM Dejan Pejchev <de...@gr-oss.io> wrote:
>>
>>> Hello Spark community!
>>>
>>> My name is Dejan Pejchev, and I am a Software Engineer at G-Research and a
>>> maintainer of our Kubernetes multi-cluster batch scheduler called Armada.
>>>
>>> We are trying to build an integration with Spark where we would like to use
>>> spark-submit with a master armada://xxxx, which would then submit the
>>> driver and executor jobs to Armada.
>>>
>>> I understood the concept of the ExternalClusterManager and how I can write
>>> and provide a new implementation, but I am not clear on how I can extend
>>> Spark to accept it.
>>>
>>> I see that in SparkSubmit.scala there is a check for master URLs, and it
>>> fails if the master isn't one of local, mesos, k8s, or yarn.
>>>
>>> What is the correct approach for my use case?
>>>
>>> Thanks in advance,
>>> Dejan Pejchev
>>>
>>
