Re: Support for `sparkContext.broadcast` function in Spark Connect?

2024-10-14 Thread Martin Grund
From my rough high-level view, there is nothing stopping us from adding
broadcast variables to Spark Connect; we essentially have to lift them to
the Spark Session. This would not be different from what we're doing for
artifact management or what we've done for job cancellation.

If you're interested in working on this, I'm happy to guide a bit.

Martin



On Mon, Oct 14, 2024 at 7:08 PM Deependra Patel wrote:

> Hi,
> I see that Spark Context methods are not supported in Spark Connect. There
> are many common use cases, e.g., broadcasting machine learning model
> weights to all executors so that they do not each have to fetch them.
>
> This makes migrating workloads to Spark Connect harder. I know there are
> plans to add more functionality to Spark Connect, e.g., Spark MLlib.
>
> My question is: is there an ETA for supporting broadcasts in the Spark
> Connect API too? Or will it not be supported because of how Spark Connect
> is designed (separate JVMs, security, etc.)? I can also consider
> implementing this, depending on the answers above and the effort involved.
>
> Regards,
> Deependra
>


Re: [DISCUSS] Support spark.ml on Spark Connect

2024-10-14 Thread Bobby
Thank you for your kind response. I will prepare a formal PR for Spark.



On Fri, Oct 11, 2024 at 22:45, Niranjan Jayakar wrote:

> +1
>
> On Thu, Oct 10, 2024 at 5:28 PM Xiao Li wrote:
>
>> Thank you for working on this!
>>
>> Xiao
>>
>> On Thu, Oct 10, 2024 at 03:01, Martin Grund wrote:
>>
>>>
>>> Hi Bobby,
>>>
>>> Awesome to see the proposal! I'm very much looking forward to the
>>> contributions!
>>>
>>> Martin
>>>
>>> On Thu, Oct 10, 2024 at 4:16 AM Ángel wrote:
>>>
>>>> You have my vote (btw, great idea, ML is so sexy nowadays 😉)
>>>>
>>>> On Thu, Oct 10, 2024 at 3:19, Bobby wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'd like to start a discussion about supporting spark.ml on Connect.
>>>>> With this feature, users don't need to change their code to run
>>>>> Spark ML cases on Connect.
>>>>>
>>>>> Please refer to the JIRA:
>>>>> https://issues.apache.org/jira/browse/SPARK-49907
>>>>>
>>>>> Thx,
>>>>> Bobby Wang
>>>>>



[DISCUSS] Migrate or deprecate the Spark Kinesis connector

2024-10-14 Thread Johnson Chen
Hi Spark community,

A couple of months ago, I raised a PR to upgrade the AWS SDK to v2 for the
Spark Kinesis connector: https://github.com/apache/spark/pull/44211. Given
that the 4.0 feature freeze is coming, I am following up to check whether
we still want this change in the upcoming 4.0 release. If so, I can revise
and rebase the PR accordingly.

Here is the tracking Jira: https://issues.apache.org/jira/browse/SPARK-45720


-- 
Thanks,
Junyu


Re: Inconsistent behavior between SQL and DataFrame API on store assignment policy

2024-10-14 Thread Wenchen Fan
Yeah, this looks like a bug; the SQL and DataFrame APIs should be consistent.
Please create a JIRA ticket, thanks!

On Mon, Oct 14, 2024 at 3:29 PM Manu Zhang wrote:

> Hi community,
>
> With `spark.sql.storeAssignmentPolicy=LEGACY` in Spark 3.5, writing to a
> DSv2 table with INSERT SQL is not allowed. However, this can be worked
> around with the DataFrame API, i.e., `df.writeTo($dsv2Table).append()`.
>
> Is this expected?
>
> Thanks,
> Manu
>
>


Support for `sparkContext.broadcast` function in Spark Connect?

2024-10-14 Thread Deependra Patel
Hi,
I see that Spark Context methods are not supported in Spark Connect. There
are many common use cases, e.g., broadcasting machine learning model
weights to all executors so that they do not each have to fetch them.

This makes migrating workloads to Spark Connect harder. I know there are
plans to add more functionality to Spark Connect, e.g., Spark MLlib.

My question is: is there an ETA for supporting broadcasts in the Spark
Connect API too? Or will it not be supported because of how Spark Connect
is designed (separate JVMs, security, etc.)? I can also consider
implementing this, depending on the answers above and the effort involved.

Regards,
Deependra
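
For readers who have not used the API under discussion, here is a minimal
sketch of the classic (non-Connect) broadcast pattern for model weights. The
`score` and `score_with_broadcast` helpers and the `w0`/`w1` weight names are
hypothetical, made up for illustration; only `sparkContext.broadcast`,
`parallelize`, `map`, and `collect` are existing PySpark calls. It assumes a
classic session where `spark.sparkContext` is available, which, per this
thread, is what Spark Connect does not support.

```python
from typing import Dict, List


def score(weights: Dict[str, float], x: float) -> float:
    """Hypothetical toy linear model: w0 * x + w1."""
    return weights["w0"] * x + weights["w1"]


def score_with_broadcast(spark, weights: Dict[str, float], xs: List[float]) -> List[float]:
    """Score xs on the executors using a broadcast copy of the weights.

    Assumption: `spark` is a classic SparkSession, so `spark.sparkContext`
    exists. This function is a sketch and is not expected to work on a
    Spark Connect session.
    """
    sc = spark.sparkContext
    # The weights are shipped to the executors as a read-only broadcast
    # variable instead of being serialized into every task closure.
    bc = sc.broadcast(weights)
    # Each task reads the broadcast value through `bc.value`.
    return sc.parallelize(xs).map(lambda x: score(bc.value, x)).collect()
```

With a classic local session (for example one built with
`SparkSession.builder.master("local[1]").getOrCreate()`), calling
`score_with_broadcast(spark, {"w0": 0.5, "w1": -1.25}, [1.0, 2.0])` should
return `[-0.75, -0.25]`.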