Hi Dongjoon,

This is a big decision but not a big project: we just need to update the release scripts to produce the additional Spark distribution. If people are positive about this, I can start implementing the script changes now and merge them after this proposal has been voted on and approved.
Thanks,
Wenchen

On Tue, Feb 4, 2025 at 4:10 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> Hi, Wenchen.
>
> I'm wondering if this implies any delay of the existing QA and RC1
> schedule or not.
>
> If so, why don't we schedule this new alternative proposal for Spark 4.1
> instead?
>
> Best regards,
> Dongjoon
>
> On Mon, Feb 3, 2025 at 23:31 Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> Hi all,
>>
>> There is partial agreement that Spark Connect is crucial for the future
>> stability of Spark APIs, for both end users and developers. At the same
>> time, a couple of PMC members raised concerns about making Spark Connect
>> the default in the upcoming Spark 4.0 release. I'm proposing an
>> alternative approach here: publish an additional Spark distribution with
>> Spark Connect enabled by default. This approach will help promote the
>> adoption of Spark Connect among new users while allowing us to gather
>> valuable feedback. A separate distribution with Spark Connect enabled by
>> default can also promote future adoption of Spark Connect for languages
>> like Rust, Go, or Scala 3.
>>
>> Here are the details of the proposal:
>>
>> - Spark 4.0 will include three PyPI packages:
>>   - pyspark: the classic package.
>>   - pyspark-client: the thin Spark Connect Python client. Note that in
>>     the Spark 4.0 preview releases we published the pyspark-connect
>>     package for the thin client; we will need to rename it in the
>>     official 4.0 release.
>>   - pyspark-connect: Spark Connect enabled by default.
>> - An additional tarball will be added to the Spark 4.0 download page
>>   with updated scripts (spark-submit, spark-shell, etc.) that enable
>>   Spark Connect by default.
>> - A new Docker image will be provided with Spark Connect enabled by
>>   default.
>>
>> By taking this approach, we can make Spark Connect more visible and
>> accessible to users, which is more effective than simply asking them to
>> configure it manually.
>>
>> Looking forward to hearing your thoughts!
>>
>> Thanks,
>> Wenchen
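
[Editor's note: for context, a minimal sketch of today's opt-in Spark Connect flow that the proposed distribution would make the default. The server address and port are assumptions (15002 is the conventional default); the pyspark-connect package name reflects the proposal above, not a published package.]

```shell
# Today, with the classic "pyspark" package, Spark Connect is opt-in.
pip install pyspark

# Start a Connect server from a Spark distribution:
./sbin/start-connect-server.sh

# Point the shell at the server explicitly (assumes localhost:15002):
pyspark --remote "sc://localhost:15002"

# Under the proposal, installing the pyspark-connect package (or using
# the additional tarball's spark-shell/spark-submit scripts) would make
# this Connect-based mode the default, with no --remote flag required.
```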