ASF Board report draft for May 2025

2025-05-09 Thread Matei Zaharia
Our board report is due on May 14th. Here’s a draft of what’s been happening, feel free to provide comments: == Description: Apache Spark is a fast and general purpose engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python, R and SQL as wel

[VOTE] Release Spark 4.0.0 (RC5)

2025-05-09 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0. The vote is open until May 15 (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0 [ ] -1 Do not release this package because ... To le

Re: [DISCUSS][SPARK SQL] SPARK-51710: Using Dataframe.dropDuplicates with an empty array as argument behaves "unexpectedly"

2025-05-09 Thread James Willis
This seems like the correct behavior to me. Every value of the null set of columns will match between any pair of Rows. On Thu, May 8, 2025 at 11:37 AM David Kunzmann wrote: > Hello everyone, > > Following the creation of this PR > and the discussio
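
The reasoning in the reply above (every pair of rows agrees on an empty set of columns) can be illustrated with a small plain-Python sketch. This is not Spark's implementation and makes no claim about what any Spark API actually returns; `drop_duplicates` here is a hypothetical helper written only to show the vacuous-match argument.

```python
def drop_duplicates(rows, subset):
    """Keep the first row seen for each distinct key built from `subset`.

    Hypothetical helper for illustration only; not Spark code.
    """
    seen = set()
    kept = []
    for row in rows:
        # With an empty subset the key is always the empty tuple (),
        # so every row compares equal to every other row.
        key = tuple(row[col] for col in subset)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 2, "name": "c"},
]

by_id = drop_duplicates(rows, ["id"])   # one row per distinct id
by_nothing = drop_duplicates(rows, [])  # all rows share the key ()
```

Under this model, de-duplicating on no columns keeps a single row, because the empty key matches for any pair of rows.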

[DISCUSS] New Spark Connect Client repository for Rust language

2025-05-09 Thread Renjie Liu
Hi, All: I'd like to propose to add a new Apache Spark repository for `Spark Connect Client for Rust`. https://github.com/apache/spark-connect-rust There are already some efforts for building spark-connect client in rust: https://github.com/sjrusso

Help requested: Spark security triage and followup

2025-05-09 Thread Apache Security Team
Dear Spark users and developers, As you know, the Apache Software Foundation takes our users' security seriously, and defines sensible release and security processes to make sure potential security issues are dealt with responsibly. These indirectly also protect our committers, shielding individua