[ https://issues.apache.org/jira/browse/SPARK-51705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17940554#comment-17940554 ]
Jayadeep Jayaraman edited comment on SPARK-51705 at 4/7/25 5:54 AM:
--------------------------------------------------------------------

Hi [~grundprinzip-db] - I have created this request based on a discussion on the dev mailing list about supporting the `Broadcast` API in Spark Connect. Below is a high-level sketch of the changes I have in mind. Please advise.

* Introduce a new entry in the [base.proto|https://github.com/apache/spark/blob/master/connect-examples/server-library-example/common/src/main/protobuf/base.proto] file to broadcast variables, models, etc. from the client to the server. The proposed rpc method, with its request and response types, is:
{code:java}
rpc Broadcast(stream BroadcastRequest) returns (BroadcastResponse) {}
{code}
* Introduce a `broadcast` method in the [SparkSession|https://github.com/apache/spark/blob/master/sql/api/src/main/scala/org/apache/spark/sql/SparkSession.scala] API
* Implement the `broadcast` method on the client side [here|https://github.com/apache/spark/blob/master/sql/connect/common/src/main/scala/org/apache/spark/sql/connect/SparkSession.scala] and [here|https://github.com/apache/spark/blob/master/sql/connect/common/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClient.scala]
* Introduce a new `BroadcastManager` class, similar to the `ArtifactManager` class, to send the request over to the server
* Introduce a broadcast method on the server side in [SparkConnectService|https://github.com/apache/spark/blob/295d37fad3b67ac0c73629d5eaebb3baefaeea7e/sql/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala], backed by a new `SparkConnectBroadcastHandler` class
* `SparkConnectBroadcastHandler` will implement the broadcast method, which will internally call `SparkContext.broadcast` in class [SparkConnect|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/classic/SparkSession.scala]

was (Author: jjayadeep):
Hi [~grundprinzip-db] - I have created this request
based on a discussion on the dev mailing list about supporting the `Broadcast` API in Spark Connect. Below is a high-level sketch of the changes I have in mind. Please advise.

* Introduce a new entry in the Base.proto file for Broadcast
{code:java}
rpc Broadcast(stream BroadcastRequest) returns (BroadcastResponse) {}
{code}
* Introduce a `broadcast` method in the [SparkSession|https://github.com/apache/spark/blob/master/sql/api/src/main/scala/org/apache/spark/sql/SparkSession.scala] API
* Implement the `broadcast` method on the client side [here|https://github.com/apache/spark/blob/master/sql/connect/common/src/main/scala/org/apache/spark/sql/connect/SparkSession.scala] and [here|https://github.com/apache/spark/blob/master/sql/connect/common/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClient.scala]
* Introduce a new `BroadcastManager` class, similar to the `ArtifactManager` class, to send the request over to the server
* Introduce a broadcast method on the server side in [SparkConnectService|https://github.com/apache/spark/blob/295d37fad3b67ac0c73629d5eaebb3baefaeea7e/sql/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala], backed by a new `SparkConnectBroadcastHandler` class
* `SparkConnectBroadcastHandler` will implement the broadcast method, which will internally call `SparkContext.broadcast` in class [SparkConnect|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/classic/SparkSession.scala]

> [CONNECT] Support sc.broadcast over Spark Connect
> -------------------------------------------------
>
>                 Key: SPARK-51705
>                 URL: https://issues.apache.org/jira/browse/SPARK-51705
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 4.0.0
>            Reporter: Jayadeep Jayaraman
>            Priority: Major
>
> Support broadcasting of variables over Spark Connect.
> This is quite useful in shipping small/medium sized arbitrary ML models
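
The client-streaming `Broadcast` rpc sketched in the comment above implies that the client splits a serialized value into a sequence of request messages and the server reassembles them before handing the bytes to `SparkContext.broadcast`. The block below is a minimal, Spark-free sketch of that idea, not the proposed implementation. The `Chunk` class and its fields (`sessionId`, `index`, `last`, `data`), the `split` and `reassemble` helpers, and the chunk size are all hypothetical; the ticket does not define the contents of `BroadcastRequest`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BroadcastChunking {

    // Hypothetical stand-in for one streamed BroadcastRequest message.
    static final class Chunk {
        final String sessionId;
        final int index;      // position of this chunk in the stream
        final boolean last;   // true only on the final chunk
        final byte[] data;

        Chunk(String sessionId, int index, boolean last, byte[] data) {
            this.sessionId = sessionId;
            this.index = index;
            this.last = last;
            this.data = data;
        }
    }

    // Client side (assumed): split a serialized value into fixed-size chunks.
    // An empty payload still yields one empty chunk marked as last.
    static List<Chunk> split(String sessionId, byte[] payload, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Chunk> chunks = new ArrayList<>();
        int index = 0;
        int offset = 0;
        do {
            int end = Math.min(offset + chunkSize, payload.length);
            boolean last = end == payload.length;
            chunks.add(new Chunk(sessionId, index++, last,
                    Arrays.copyOfRange(payload, offset, end)));
            offset = end;
        } while (offset < payload.length);
        return chunks;
    }

    // Server side (assumed): concatenate chunks in order, rejecting gaps,
    // reordering, data after the last chunk, or a stream with no last chunk.
    static byte[] reassemble(List<Chunk> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int expected = 0;
        boolean sawLast = false;
        for (Chunk c : chunks) {
            if (sawLast || c.index != expected++) {
                throw new IllegalStateException("out-of-order or trailing chunk");
            }
            out.write(c.data, 0, c.data.length);
            sawLast = c.last;
        }
        if (!sawLast) {
            throw new IllegalStateException("stream ended before the last chunk");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = "small ML model bytes".getBytes(StandardCharsets.UTF_8);
        List<Chunk> chunks = split("session-1", payload, 8);
        byte[] restored = reassemble(chunks);
        System.out.println(chunks.size() + " chunks, round trip ok: "
                + Arrays.equals(payload, restored));
    }
}
```

In a real handler the reassembled bytes would presumably be deserialized and passed to `SparkContext.broadcast`, and the response would carry some identifier for the broadcast value; both steps are left out here because the ticket does not specify them.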