Re: [VOTE] Release Spark 4.0.0 (RC6)

2025-05-14 Thread Chao Sun
+1 On Wed, May 14, 2025 at 10:30 AM Holden Karau wrote: > +1 > > On Wed, May 14, 2025 at 10:10 AM L. C. Hsieh wrote: > >> +1 >> >> On Tue, May 13, 2025 at 3:28 PM Wenchen Fan wrote: >> > >> > Please vote on releasing the following candidate as Apache Spark >> version 4.0.0. >> > >> > The vote

Re: [VOTE] Release Spark 4.0.0 (RC6)

2025-05-14 Thread Yuanjian Li
+1 On Wed, May 14, 2025 at 07:27 Rozov, Vlad wrote: > +1 to wait for the fix. > > Thank you, > > Vlad > > > On May 14, 2025, at 5:30 AM, Weichen Xu > wrote: > > Hi folks, > > We have an RCE fix https://github.com/apache/spark/pull/50889 pending > merging, and it needs to be backported to 4.0.0.

Re: [VOTE] Release Spark 4.0.0 (RC6)

2025-05-14 Thread Denny Lee
+1 (non-binding) On Wed, May 14, 2025 at 10:34 AM Chao Sun wrote: > +1 > > On Wed, May 14, 2025 at 10:30 AM Holden Karau > wrote: > >> +1 >> >> On Wed, May 14, 2025 at 10:10 AM L. C. Hsieh wrote: >> >>> +1 >>> >>> On Tue, May 13, 2025 at 3:28 PM Wenchen Fan wrote: >>> > >>> > Please vote on r

Re: [DISCUSS][SPARK SQL] SPARK-51710: Using Dataframe.dropDuplicates with an empty array as argument behaves "unexpectedly"

2025-05-14 Thread Wenchen Fan
So you are basically saying df.dropDuplicates(Seq.empty) should be the same as df.dropDuplicates(all_columns). I think this is a reasonable change, as the previous behavior, which always returns the first row, doesn't make sense. For safety, we can add a legacy config for fallback and mention it in t
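
To make the two behaviors under discussion concrete, here is a minimal plain-Python sketch. It is only a model of the semantics described in this thread, not Spark code: the function `drop_duplicates` and the flag `empty_means_all` are hypothetical names introduced for illustration, and keeping the first row seen per key is an assumption of the model.

```python
from typing import Dict, Hashable, List, Optional, Sequence

Row = Dict[str, Hashable]


def drop_duplicates(
    rows: List[Row],
    subset: Optional[Sequence[str]] = None,
    empty_means_all: bool = False,  # hypothetical flag, not a Spark option
) -> List[Row]:
    """Keep the first row seen for each distinct key built from `subset`."""
    all_columns = list(rows[0].keys()) if rows else []
    # No subset means "compare on every column". The proposal in this thread
    # is to treat an empty subset the same way.
    if subset is None or (empty_means_all and len(subset) == 0):
        subset = all_columns
    seen = set()
    kept = []
    for row in rows:
        # With an empty subset the key is () for every row, so every row
        # after the first one counts as a duplicate.
        key = tuple(row[column] for column in subset)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    {"id": 1, "name": "a"},
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
]

previous = drop_duplicates(rows, subset=[])
proposed = drop_duplicates(rows, subset=[], empty_means_all=True)
print(previous)  # a single row
print(proposed)  # same result as deduplicating on all columns
```

In this model the empty subset collapses the input to one row because every row maps to the same empty key, which matches the "always returns the first row" behavior described above. Under the proposed change the empty subset behaves like a full-row deduplication.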

Re: [VOTE] Release Spark 4.0.0 (RC6)

2025-05-14 Thread Yuming Wang
+1. On Wed, May 14, 2025 at 2:42 PM huaxin gao wrote: > +1 Checked Iceberg/RC6; all tests passed. > > On Tue, May 13, 2025 at 11:02 PM Jungtaek Lim < > kabhwan.opensou...@gmail.com> wrote: > >> +1 (non-binding) >> >> On Wed, May 14, 2025 at 7:29 AM Wenchen Fan wrote: >> >>> Please vote on relea

Re: [VOTE] Release Spark 4.0.0 (RC6)

2025-05-14 Thread L. C. Hsieh
+1 On Tue, May 13, 2025 at 3:28 PM Wenchen Fan wrote: > > Please vote on releasing the following candidate as Apache Spark version > 4.0.0. > > The vote is open until May 16 (PST) and passes if a majority +1 PMC votes are > cast, with a minimum of 3 +1 votes. > > [ ] +1 Release this package as

Re: [VOTE] Release Spark 4.0.0 (RC6)

2025-05-14 Thread Holden Karau
+1 On Wed, May 14, 2025 at 10:10 AM L. C. Hsieh wrote: > +1 > > On Tue, May 13, 2025 at 3:28 PM Wenchen Fan wrote: > > > > Please vote on releasing the following candidate as Apache Spark version > 4.0.0. > > > > The vote is open until May 16 (PST) and passes if a majority +1 PMC > votes are ca

Re: [VOTE] Release Spark 4.0.0 (RC6)

2025-05-14 Thread Jules Damji
+1 (non-binding) -- Sent from my iPhone Pardon the dumb thumb typos :) On May 14, 2025, at 11:29 AM, Denny Lee wrote: +1 (non-binding) On Wed, May 14, 2025 at 10:34 AM Chao Sun wrote: +1 On Wed, May 14, 2025 at 10:30 AM Holden Karau wrote: +1 On Wed, May 14, 202

Re: [VOTE] Release Spark 4.0.0 (RC6)

2025-05-14 Thread Peter Toth
+1 On Wed, May 14, 2025 at 12:07 PM Cheng Pan wrote: > +1 (non-binding) > > Deployed on a YARN cluster, ran some TPC-H queries. > Passed Apache Kyuubi integration test. > > Thanks, > Cheng Pan > > > > On May 14, 2025, at 06:28, Wenchen Fan wrote: > > Please vote on releasing the following candi

Re: [VOTE] Release Spark 4.0.0 (RC6)

2025-05-14 Thread Weichen Xu
Hi folks, We have an RCE fix https://github.com/apache/spark/pull/50889 pending merging, and it needs to be backported to 4.0.0. Shall we wait for it? Thanks! On Wed, May 14, 2025 at 7:19 PM Peter Toth wrote: > +1 > > On Wed, May 14, 2025 at 12:07 PM Cheng Pan wrote: > >> +1 (non-binding) >>

Re: [DISCUSS][SPARK SQL] SPARK-51710: Using Dataframe.dropDuplicates with an empty array as argument behaves "unexpectedly"

2025-05-14 Thread David Kunzmann
Hi James, I see how the behavior makes sense now, but I was wondering why a user would do this intentionally instead of using head() or first(). I thought it could mainly be done by mistake, as there is no benefit from using df.dropDuplicates(Seq.empty). On Fri, May 9, 2025 at 8:50 PM James Will

Re: [VOTE] Release Spark 4.0.0 (RC6)

2025-05-14 Thread Cheng Pan
+1 (non-binding) Deployed on a YARN cluster, ran some TPC-H queries. Passed Apache Kyuubi integration test. Thanks, Cheng Pan > On May 14, 2025, at 06:28, Wenchen Fan wrote: > > Please vote on releasing the following candidate as Apache Spark version > 4.0.0. > > The vote is open until May

Re: [VOTE] Release Spark 4.0.0 (RC6)

2025-05-14 Thread Rozov, Vlad
+1 to wait for the fix. Thank you, Vlad On May 14, 2025, at 5:30 AM, Weichen Xu wrote: Hi folks, We have an RCE fix https://github.com/apache/spark/pull/50889 pending merging, and it needs to be backported to 4.0.0. Shall we wait for it? Thanks! On Wed, May 14, 2025 at 7:19 PM Peter Toth

Re: [VOTE] Release Spark 4.0.0 (RC6)

2025-05-14 Thread Kent Yao
+1 On Thu, May 15, 2025 at 04:05, Jules Damji wrote: > +1 (non-binding) > -- > Sent from my iPhone > Pardon the dumb thumb typos :) > > On May 14, 2025, at 11:29 AM, Denny Lee wrote: > > +1 (non-binding) > > On Wed, May 14, 2025 at 10:34 AM Chao Sun wrote: > >> +1 >> >> On Wed, May 14, 2025 at 10:30 AM Ho