Hello everyone,

I noticed that a recent PR appears to disable starting the Spark Connect server when the deploy mode is set to "cluster".
PR: [SPARK-42371][CONNECT] Add scripts to start and stop Spark Connect server by HyukjinKwon · Pull Request #39928 · apache/spark · GitHub
https://github.com/apache/spark/pull/39928

This behavior still exists on the current master branch:
https://github.com/apache/spark/blob/f27464f269e4459a92eded8c4baa6dafcec59e34/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L314-L315

```
case (_, CLUSTER) if isConnectServer(args.mainClass) =>
  error("Cluster deploy mode is not applicable to Spark Connect server.")
```

I understand that Spark Connect is scheduled to reach GA in Spark 4.0. However, with the current design, the Spark Connect server cannot be launched in cluster deploy mode, so it has to run on the machine where spark-submit is invoked rather than being scheduled onto the cluster by the cluster manager. This may limit its use in large-scale processing, especially with cluster managers such as Kubernetes.

Could you please explain whether there are constraints that require this behavior to remain?

Thank you for your time.

Best regards,
Yasukazu
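
P.S. For concreteness, here is a minimal sketch (via the SparkLauncher API) of the kind of submission that is rejected today. I am assuming the server's main class is org.apache.spark.sql.connect.service.SparkConnectServer, i.e. the class that sbin/start-connect-server.sh submits; the Kubernetes master URL is just a placeholder.

```scala
// Minimal sketch, not taken from the PR: programmatically submitting the
// Spark Connect server in cluster deploy mode. Assumes SPARK_HOME is set and
// that the server's main class is the one used by sbin/start-connect-server.sh.
import org.apache.spark.launcher.SparkLauncher

object ConnectClusterModeSketch {
  def main(args: Array[String]): Unit = {
    val handle = new SparkLauncher()
      .setMaster("k8s://https://example-cluster:6443") // placeholder master URL
      .setDeployMode("cluster")                        // the combination that is rejected
      .setMainClass("org.apache.spark.sql.connect.service.SparkConnectServer")
      .setAppResource(SparkLauncher.NO_RESOURCE)       // no primary resource, class only
      .startApplication()

    // The launched spark-submit process fails with:
    // "Cluster deploy mode is not applicable to Spark Connect server."
    println(s"Launcher state: ${handle.getState}")
  }
}
```

The equivalent spark-submit invocation with --deploy-mode cluster hits the same check in SparkSubmit quoted above.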