Re: Support for `sparkContext.broadcast` function in Spark Connect?

2024-10-14 Thread Martin Grund
From my rough high-level view, there is nothing stopping us from adding broadcast variables to Spark Connect; we essentially have to lift them to the Spark Session. This would not be different from what we're doing for artifact management or what we've done for job cancellation. If you're interes

Re: [DISCUSS] Support spark.ml on Spark Connect

2024-10-14 Thread Bobby
Thank you for your kind response. I will prepare a formal PR for Spark. On Fri, Oct 11, 2024 at 22:45, Niranjan Jayakar wrote: > +1 > > On Thu, Oct 10, 2024 at 5:28 PM Xiao Li wrote: > >> Thank you for working on this! >> >> Xiao >> >> On Thu, Oct 10, 2024 at 03:01, Martin Grund wrote: >> >>> >>> Hi Bobby, >>> >>> Awes

[DISCUSS] Migrate or deprecate the Spark Kinesis connector

2024-10-14 Thread Johnson Chen
Hi Spark community, A couple months ago, I raised a PR to upgrade the AWS SDK to v2 for the Spark Kinesis connector: https://github.com/apache/spark/pull/44211. Given that the 4.0 feature freeze is coming, I am following up to check whether we still want to have this change in the upcoming 4.0 re

Re: Inconsistent behavior between SQL and DataFrame API on store assignment policy

2024-10-14 Thread Wenchen Fan
Yea looks like a bug, the SQL and DataFrame APIs should be consistent. Please create a JIRA ticket, thanks! On Mon, Oct 14, 2024 at 3:29 PM Manu Zhang wrote: > Hi community, > > With `spark.sql.storeAssignmentPolicy=LEGACY` in Spark 3.5, it's not > allowed to write to DSv2 with insert SQL. Howev

Support for `sparkContext.broadcast` function in Spark Connect?

2024-10-14 Thread Deependra Patel
Hi, I see that Spark Context methods are not supported in Spark Connect. There are many common use cases, e.g., broadcasting machine learning model weights to all executors so that they need not be fetched individually. This makes migrating workloads to Spark Connect harder. I know there are plans to add more &