I apologize if my previous explanation was unclear, and I realize I didn’t provide enough context for my question.
The reason I want to submit a Spark application to a Kubernetes cluster using the Spark Operator is that I want to use Kubernetes as the Cluster Manager, rather than Standalone mode. This would allow the Spark Connect Driver and Executors to run on different nodes within the Kubernetes cluster.

I understand that it is currently possible to launch Spark Connect by setting the Cluster Manager to Standalone. However, in that case the Driver and Executors run on the same node, which I believe would not scale efficiently. Therefore, I am considering specifying Kubernetes (specifically the Kubernetes API) as the Cluster Manager, so that the Driver and multiple Executor Pods can be dynamically distributed and scheduled across all nodes in the Kubernetes cluster.

With the Spark Operator it is easy to specify Kubernetes as the Cluster Manager, but the Spark Operator does not allow the "client" deploy mode (the deploy mode is always "cluster"). On the other hand, Spark Connect does not support the "cluster" deploy mode, so the two requirements are mutually incompatible. That is why I wanted to understand why Spark Connect does not allow the "cluster" deploy mode, and this was the main point of my original question. (For reference, I have put a rough sketch of the kind of client-mode launch I have in mind, and of the workaround Prabodh mentioned, at the end of this mail, below the quoted thread.)

On Tue, Sep 10, 2024 at 0:29 Prabodh Agarwal <prabodh1...@gmail.com> wrote:

> Oh. This issue is pretty straightforward to solve actually. Particularly
> in spark-3.5.2.
>
> Just download the `spark-connect` maven jar and place it in
> `$SPARK_HOME/jars`. Then rebuild the docker image. I saw that I had posted
> a comment on this Jira as well. I could fix this up for the standalone
> cluster at least this way.
>
> On Mon, Sep 9, 2024 at 7:04 PM Nagatomi Yasukazu <yassan0...@gmail.com>
> wrote:
>
>> Hi Prabodh,
>>
>> Thank you for your response.
>>
>> As you can see from the following JIRA issue, it is possible to run the
>> Spark Connect Driver on Kubernetes:
>>
>> https://issues.apache.org/jira/browse/SPARK-45769
>>
>> However, this issue describes a problem that occurs when the Driver and
>> Executors are running on different nodes. This could potentially be the
>> reason why only Standalone mode is currently supported, but I am not
>> certain about it.
>>
>> Thank you for your attention.
>>
>> On Mon, Sep 9, 2024 at 12:40 Prabodh Agarwal <prabodh1...@gmail.com>
>> wrote:
>>
>>> My 2 cents regarding my experience with using Spark Connect in cluster
>>> mode:
>>>
>>> 1. Create a Spark cluster of 2 or more nodes. Make 1 node the master
>>> and the other nodes workers. Deploy Spark Connect pointing to the master
>>> node. This works well. The approach is not well documented, but I could
>>> figure it out by trial and error.
>>> 2. On k8s, by default, we can actually get the executors to run on
>>> Kubernetes itself. That is pretty straightforward, but the driver
>>> continues to run on a local machine. But yeah, I agree as well: making
>>> the driver run on k8s itself would be slick.
>>>
>>> Thank you.
>>>
>>> On Mon, Sep 9, 2024 at 6:17 AM Nagatomi Yasukazu <yassan0...@gmail.com>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Why is it not possible to specify cluster as the deploy mode for Spark
>>>> Connect?
>>>>
>>>> As discussed in the following thread, it appears that there is an
>>>> "arbitrary decision" within spark-submit that cluster mode is "not
>>>> applicable" to Spark Connect.
>>>>
>>>> GitHub Issue Comment:
>>>> https://github.com/kubeflow/spark-operator/issues/1801#issuecomment-2000494607
>>>>
>>>> > This will circumvent the submission error you may have gotten if you
>>>> > tried to just run the SparkConnectServer directly. From my
>>>> > investigation, that looks to be an arbitrary decision within
>>>> > spark-submit that Cluster mode is "not applicable" to SparkConnect.
>>>> > Which is sort of true except when using this operator :)
>>>>
>>>> I have reviewed the following commit and pull request, but I could not
>>>> find any discussion or reason explaining why cluster mode is not
>>>> available:
>>>>
>>>> Related Commit:
>>>> https://github.com/apache/spark/commit/11260310f65e1a30f6b00b380350e414609c5fd4
>>>>
>>>> Related Pull Request:
>>>> https://github.com/apache/spark/pull/39928
>>>>
>>>> This restriction poses a significant obstacle when trying to use Spark
>>>> Connect with the Spark Operator. If there is a technical reason for
>>>> this, I would like to know more about it. Additionally, if this issue
>>>> is being tracked on JIRA or elsewhere, I would appreciate it if you
>>>> could provide a link.
>>>>
>>>> Thank you in advance.
>>>>
>>>> Best regards,
>>>> Yasukazu Nagatomi
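P.S. As mentioned above, here is a rough sketch of the kind of client-mode launch I have in mind: starting the Spark Connect server with the Kubernetes API as the Cluster Manager, so that Executors are scheduled as Pods while the Driver stays with the server process. I have not verified this end to end; the API server address, namespace, image name, and executor count are placeholders, and the options may need adjusting for your environment.

    # Sketch only: run the Spark Connect server with Kubernetes as the cluster
    # manager. The deploy mode is effectively "client": the driver lives in the
    # process/pod that runs this script, and executors are requested from the
    # Kubernetes API as pods. <k8s-apiserver>, the namespace, the image, and
    # the instance count below are placeholders.
    $SPARK_HOME/sbin/start-connect-server.sh \
      --master k8s://https://<k8s-apiserver>:6443 \
      --packages org.apache.spark:spark-connect_2.12:3.5.2 \
      --conf spark.kubernetes.namespace=spark \
      --conf spark.kubernetes.container.image=<your-spark-image> \
      --conf spark.executor.instances=3

The open question on my side remains how to have the Spark Operator launch something like this, since the operator submits applications only in cluster mode.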
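And this is roughly how I understand the workaround Prabodh described: baking the spark-connect jar into the image so the server does not have to resolve it at startup. Again only a sketch; the Spark and Scala versions, the Maven path, the registry, and the tag are assumptions on my part and should be checked against your setup.

    # Sketch: place the spark-connect jar into $SPARK_HOME/jars before building
    # the container image (version, Maven coordinates, and registry are
    # assumptions).
    SPARK_VERSION=3.5.2
    curl -fLo "$SPARK_HOME/jars/spark-connect_2.12-${SPARK_VERSION}.jar" \
      "https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/${SPARK_VERSION}/spark-connect_2.12-${SPARK_VERSION}.jar"
    # Rebuild the image with the jar included, e.g. using Spark's bundled tool:
    $SPARK_HOME/bin/docker-image-tool.sh -r <your-registry> -t 3.5.2-connect build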