Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Wenchen Fan
> ... but are there examples of how you would actually do any of that with the new API mode?
The API mode allows you to switch easily between the Spark Connect architecture and the Classic architecture for your application during the migration phase. Your application still compiles with the full Spark de
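A minimal sketch of that switch, assuming the `spark.api.mode` setting referenced later in this thread (config name taken from that discussion; everything else illustrative): the same application code compiles against the full Spark distribution and is pointed at either architecture at submit time, e.g. with --conf spark.api.mode=connect versus --conf spark.api.mode=classic.

    import org.apache.spark.sql.SparkSession

    object ApiModeDemo {
      def main(args: Array[String]): Unit = {
        // The builder resolves to the Classic or the Connect implementation based on
        // the configured API mode; the application code itself does not change.
        val spark = SparkSession.builder()
          .appName("api-mode-demo")
          .getOrCreate()

        spark.range(1000).selectExpr("sum(id) AS total").show()
        spark.stop()
      }
    }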

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Adam Binford
> Your application is running as an individual process, which is fully decoupled from the server (Spark driver). You can pick different Java/Scala versions, Python versions, dependency versions, or even a different language such as Go and Rust.
I understand this part as a general benefit
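For concreteness, a sketch of that decoupled-client model (the endpoint below is illustrative; 15002 is the usual default Spark Connect gRPC port): the application is a thin process that only depends on the Spark Connect client and talks to a remote driver, so it can pick its own JVM, Scala, or Python versions, or be written in another language entirely.

    import org.apache.spark.sql.SparkSession

    // Thin client process: only the Spark Connect client is on the classpath, and
    // execution happens on the remote driver behind the sc:// endpoint.
    val spark = SparkSession.builder()
      .remote("sc://connect-server.example.com:15002")
      .getOrCreate()

    spark.sql("SELECT current_timestamp()").show()
    spark.stop()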

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Wenchen Fan
Hi Adam, My bad, I should have provided more context. The official way to use Spark Connect is to deploy a long-lived server that serves many clients (like the thriftserver), which users need to set up manually, and there is no default. The so-called Spark Connect on-by-default here refers to the ne

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Adam Binford
Long-time Spark on YARN user here with some maybe-dumb questions, but I'm guessing other users might be wondering the same things. First, what does "Spark Connect enabled by default" actually even mean? I assume this is referring to the "spark.api.mode" discussion from before, but even in that discussio

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-04 Thread L. C. Hsieh
+1 for the additional option. Agreed that we should keep on track with the schedule. If, as mentioned earlier, there are no critical blockers, it should be fine. On Tue, Feb 4, 2025 at 8:05 PM Denny Lee wrote: > +1 (non-binding) on this proposal. Just as long as there are no schedule co

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-04 Thread Jules Damji
+1 (non-binding) A way forward for Apache Spark: allowing developers to choose either option, giving the community a way to share critical feedback on Spark Connect, and paving a path for Spark to be accessible from everywhere, including from non-JVM languages. Cheers, Jules

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-04 Thread Denny Lee
+1 (non-binding) on this proposal. As long as there are no schedule concerns, similar to Mridul and Dongjoon’s call-outs, then yes, I think this would be helpful for adoption. Thanks! On Tue, Feb 4, 2025 at 18:43 huaxin gao wrote: > I support publishing an additional Spark distributio

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-04 Thread huaxin gao
I support publishing an additional Spark distribution with Spark Connect enabled in Spark 4.0 to boost Spark adoption. I also share Dongjoon's concern regarding potential schedule delays. As long as we monitor the timeline closely and thoroughly document any PRs that do not make it into the RC, we

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-04 Thread Mridul Muralidharan
+1 to new distribution mechanisms that will increase Spark adoption! I do agree with Dongjoon’s concern that this should not result in the schedule slipping; something to watch out for. Regards, Mridul On Tue, Feb 4, 2025 at 8:07 PM Hyukjin Kwon wrote: > I am fine with providing another o

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-04 Thread Hyukjin Kwon
I am fine with providing another option, +1, while leaving the others as they are. Once the vote passes, we should probably make it ready ASAP; I don't think it will need a lot of changes in any event. On Wed, 5 Feb 2025 at 02:40, DB Tsai wrote: > Many of the remaining PRs relate to Spark ML Connect suppor

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-04 Thread DB Tsai
Many of the remaining PRs relate to Spark ML Connect support, but they are not critical blockers for offering an additional Spark distribution with Spark Connect enabled by default in Spark 4.0, allowing users to try it out and provide more feedback. I agree that we should not postpone the Spar

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-04 Thread Dongjoon Hyun
Many new `Connect` feature patches are still landing in `branch-4.0` during the QA period after February 1st:
SPARK-49308 Support UserDefinedAggregateFunction in Spark Connect Scala Client
SPARK-50104 Support SparkSession.executeCommand in Connect
SPARK-50943 Support `Correlation` on Connect
SPARK-50

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-04 Thread Wenchen Fan
Hi Dongjoon, This is a big decision but not a big project. We just need to update the release scripts to produce the additional Spark distribution. If people are positive about this, I can start implementing the script changes now and merge them after this proposal has been voted on and approved. T

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-04 Thread Dongjoon Hyun
Hi, Wenchen. I'm wondering whether this implies any delay to the existing QA and RC1 schedule. If so, why don't we schedule this new alternative proposal properly for Spark 4.1? Best regards, Dongjoon On Mon, Feb 3, 2025 at 23:31 Wenchen Fan wrote: > Hi all, > There is partial agreement

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-04 Thread DB Tsai
+1 This enables users to easily experiment with and provide feedback on Spark Connect, while also facilitating broader adoption and development in other languages like Rust, Go, or Scala 3. DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > On Feb 3, 2025, at 11:29 PM, Wenchen Fan

[DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-03 Thread Wenchen Fan
Hi all, There is partial agreement that Spark Connect is crucial for the future stability of Spark APIs, for both end users and developers. At the same time, a couple of PMC members raised concerns about making Spark Connect the default in the upcoming Spark 4.0 release. I’m proposing