+1 

This enables users to easily experiment with and provide feedback on Spark 
Connect, while also facilitating broader adoption and development in other 
languages like Rust, Go, or Scala 3.

DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1

> On Feb 3, 2025, at 11:29 PM, Wenchen Fan <cloud0...@gmail.com> wrote:
> 
> Hi all,
> 
> There is partial agreement and consensus that Spark Connect is crucial for 
> the future stability of Spark APIs for both end users and developers. At the 
> same time, a couple of PMC members raised concerns about making Spark Connect 
> the default in the upcoming Spark 4.0 release. I’m proposing an alternative 
> approach here: publish an additional Spark distribution with Spark Connect 
> enabled by default. This approach will help promote the adoption of Spark 
> Connect among new users while allowing us to gather valuable feedback. A 
> separate distribution with Spark Connect enabled by default can promote 
> future adoption of Spark Connect for languages like Rust, Go, or Scala 3.
> 
> Here are the details of the proposal:
> 
> - Spark 4.0 will include three PyPI packages:
>   - pyspark: The classic package.
>   - pyspark-client: The thin Spark Connect Python client. Note: in the Spark 
>     4.0 preview releases, we published the pyspark-connect package for the 
>     thin client; we will need to rename it in the official 4.0 release.
>   - pyspark-connect: Spark Connect enabled by default.
> - An additional tarball will be added to the Spark 4.0 download page with 
>   updated scripts (spark-submit, spark-shell, etc.) to enable Spark Connect 
>   by default.
> - A new Docker image will be provided with Spark Connect enabled by default.
> 
> By taking this approach, we can make Spark Connect more visible and 
> accessible to users, which is more effective than simply asking them to 
> configure it manually.
> 
> Looking forward to hearing your thoughts!
> 
> Thanks,
> Wenchen
> 
