Hello everyone,

I noticed that a recent PR appears to disable launching the Spark Connect
server when the deploy mode is set to "cluster".

PR: [SPARK-42371][CONNECT] Add scripts to start and stop Spark Connect
server by HyukjinKwon · Pull Request #39928 · apache/spark · GitHub
https://github.com/apache/spark/pull/39928

This behavior still exists on the current master branch.
https://github.com/apache/spark/blob/f27464f269e4459a92eded8c4baa6dafcec59e34/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L314-L315

```
      case (_, CLUSTER) if isConnectServer(args.mainClass) =>
        error("Cluster deploy mode is not applicable to Spark Connect server.")
```
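
For context, my reading is that isConnectServer simply checks whether the
submitted main class is the Spark Connect server entry point, roughly as
sketched below (this is my own sketch, not the verbatim upstream code, and
the exact class name is my assumption based on the start scripts added in
the PR):

```
  // Sketch of my understanding: the guard matches submissions whose main
  // class is the Connect server entry point launched by
  // sbin/start-connect-server.sh, and SparkSubmit then rejects them in
  // cluster deploy mode via the error() call quoted above.
  private def isConnectServer(mainClass: String): Boolean = {
    mainClass == "org.apache.spark.sql.connect.service.SparkConnectServer"
  }
```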

I understand that Spark Connect is scheduled to reach GA in Spark v4.0.
However, with the current design, Spark Connect cannot distribute
processing across multiple nodes in cluster deploy mode. This may limit its
use for large-scale processing, especially with cluster managers such as
Kubernetes.
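
Concretely, my understanding (I have not verified the script's exact option
pass-through) is that an invocation along these lines, with a placeholder
Kubernetes master URL, would fail with the error quoted above instead of
launching the server on the cluster:

```
./sbin/start-connect-server.sh \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster
```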

Could you please explain whether there are any constraints that require
this restriction to remain?

Thank you for your time.

Best regards,
Yasukazu
