Re: Support for `sparkContext.broadcast` function in Spark Connect?
From my rough high-level view, there is nothing stopping us from adding broadcast variables to Spark Connect; we essentially have to lift them to the Spark Session. This would be no different from what we're doing for artifact management or what we've done for job cancellation. If you're interested in working on this, I'm happy to guide a bit.

Martin

On Mon, Oct 14, 2024 at 7:08 PM Deependra Patel wrote:
> Hi,
> I see that SparkContext methods are not supported in Spark Connect. There
> are many common use cases, e.g. broadcasting machine learning model weights
> to all executors so they don't need to be fetched individually.
>
> This makes migration of workloads to Spark Connect tougher. I know there
> are plans to add more and more functionality to Spark Connect, e.g.
> Spark MLlib.
>
> My question is: is there an ETA to support broadcasts in the Spark Connect
> API too? Or won't it be supported due to how Spark Connect is designed,
> maybe separate JVMs/security, etc.? I can also consider implementing this
> based on the above answers and effort.
>
> Regards,
> Deependra
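For context, a minimal sketch of what "lifting broadcast to the Spark Session" could look like on the Spark Connect client side. The session-level `spark.broadcast(...)` call below is hypothetical and does not exist in any Spark release; it only illustrates the shape of the idea, alongside `addArtifact`, which is the existing session-scoped facility Martin compares it to. Paths and the connect URL are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object SessionBroadcastSketch {
  def main(args: Array[String]): Unit = {
    // Spark Connect client session; the connect URL is a placeholder.
    val spark = SparkSession.builder().remote("sc://localhost:15002").getOrCreate()

    // Existing precedent: artifact management already hangs off the session
    // in Spark Connect rather than off a SparkContext.
    spark.addArtifact("/path/to/model-deps.jar")

    // Hypothetical: a session-level broadcast in the same spirit.
    // This method does NOT exist today; it only sketches the proposed shape.
    // val weights = spark.broadcast(Map("w0" -> 0.1, "w1" -> 0.4))
    // Server-side UDFs would then resolve weights.value on each executor.

    spark.stop()
  }
}
```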
Re: [DISCUSS] Support spark.ml on Spark Connect
Thank you for your kind response. I will prepare a formal PR for Spark.

On Fri, Oct 11, 2024 at 22:45, Niranjan Jayakar wrote:
> +1
>
> On Thu, Oct 10, 2024 at 5:28 PM Xiao Li wrote:
>> Thank you for working on this!
>>
>> Xiao
>>
>> On Thu, Oct 10, 2024 at 03:01, Martin Grund wrote:
>>>
>>> Hi Bobby,
>>>
>>> Awesome to see the proposal! I'm very much looking forward to the
>>> contributions!
>>>
>>> Martin
>>>
>>> On Thu, Oct 10, 2024 at 4:16 AM Ángel wrote:
>>>> You have my vote (btw, great idea, ML is so sexy nowadays 😉)
>>>>
>>>> On Thu, Oct 10, 2024 at 3:19, Bobby wrote:
>>>>> Hi,
>>>>>
>>>>> I'd like to start a discussion about supporting spark.ml on Connect.
>>>>> With this feature, users don't need to change their code to run Spark ML
>>>>> cases on Connect.
>>>>>
>>>>> Please refer to the JIRA:
>>>>> https://issues.apache.org/jira/browse/SPARK-49907
>>>>>
>>>>> Thx,
>>>>> Bobby Wang
[DISCUSS] Migrate or deprecate the Spark Kinesis connector
Hi Spark community,

A couple of months ago, I raised a PR to upgrade the AWS SDK to v2 for the Spark Kinesis connector: https://github.com/apache/spark/pull/44211. Given that the 4.0 feature freeze is coming, I am following up to check whether we still want this change in the upcoming 4.0 release. If so, I can revise and rebase the PR accordingly.

Here is the tracking Jira: https://issues.apache.org/jira/browse/SPARK-45720

--
Thanks,
Junyu
Re: Inconsistent behavior between SQL and DataFrame API on store assignment policy
Yea, looks like a bug; the SQL and DataFrame APIs should be consistent. Please create a JIRA ticket, thanks!

On Mon, Oct 14, 2024 at 3:29 PM Manu Zhang wrote:
> Hi community,
>
> With `spark.sql.storeAssignmentPolicy=LEGACY` in Spark 3.5, it's not
> allowed to write to a DSv2 table with an INSERT SQL statement. However, this
> can be worked around with the DataFrame API, i.e. `df.writeTo($dsv2Table).append()`.
>
> Is this expected?
>
> Thanks,
> Manu
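A minimal reproducer sketch of the reported inconsistency, assuming a DSv2 catalog table `cat.db.target` and a source table `cat.db.source` (both placeholder names): per Manu's report, the SQL INSERT path is rejected under the LEGACY policy while the DataFrameWriterV2 path appends to the same table.

```scala
import org.apache.spark.sql.SparkSession

object StoreAssignmentRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("store-assignment-repro").getOrCreate()

    // LEGACY store assignment is deprecated and rejected for DSv2 tables.
    spark.conf.set("spark.sql.storeAssignmentPolicy", "LEGACY")

    // SQL path: fails analysis under the LEGACY policy on a DSv2 table.
    spark.sql("INSERT INTO cat.db.target SELECT * FROM cat.db.source")

    // DataFrame path: reportedly appends to the same DSv2 table without error.
    spark.table("cat.db.source").writeTo("cat.db.target").append()

    spark.stop()
  }
}
```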
Support for `sparkContext.broadcast` function in Spark Connect?
Hi,

I see that SparkContext methods are not supported in Spark Connect. There are many common use cases, e.g. broadcasting machine learning model weights to all executors so they don't need to be fetched individually.

This makes migration of workloads to Spark Connect tougher. I know there are plans to add more and more functionality to Spark Connect, e.g. Spark MLlib.

My question is: is there an ETA to support broadcasts in the Spark Connect API too? Or won't it be supported due to how Spark Connect is designed, maybe separate JVMs/security, etc.? I can also consider implementing this based on the above answers and effort.

Regards,
Deependra
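For reference, the classic (non-Connect) pattern being asked about, as a minimal sketch with placeholder weights: the driver broadcasts the model weights once via `SparkContext.broadcast`, and each task reads the executor-local copy through `.value`. Under Spark Connect this path is unavailable because the client exposes no `sparkContext`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object BroadcastWeightsClassic {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("broadcast-weights").getOrCreate()
    import spark.implicits._

    // Placeholder model weights; in practice these would be real model parameters.
    val weights: Map[String, Double] = Map("f1" -> 0.25, "f2" -> 0.75)

    // Ship the weights to every executor once instead of with every task.
    val bcWeights = spark.sparkContext.broadcast(weights)

    // Score each row by looking up its feature in the executor-local copy.
    val score = udf { (feature: String) => bcWeights.value.getOrElse(feature, 0.0) }

    val df = Seq("f1", "f2", "f3").toDF("feature")
    df.withColumn("score", score($"feature")).show()

    spark.stop()
  }
}
```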