Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Wenchen Fan
Mon 23:00 wrote: >>> >>>> +1 (non-binding) >>>> >>>> On Mon, Jun 2, 2025 at 7:34 AM Chao Sun wrote: >>>> >>>>> +1 >>>>> >>>>> On Mon, Jun 2, 2025 at 7:31 AM Jungtaek Lim < >>>>> kabhwan

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Wenchen Fan
+1 On Mon, Jun 2, 2025 at 8:55 PM Peter Toth wrote: > +1 > > On Mon, Jun 2, 2025 at 2:33 PM xianjin wrote: > >> +1. >> Sent from my iPhone >> >> On Jun 2, 2025, at 12:50 PM, DB Tsai wrote: >> >> +1 looking forward to seeing real-time mode. >> Sent from my iPhone >> >> On Jun 1, 2025, at 9:47 

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-01 Thread Wenchen Fan
+1 On Mon, Jun 2, 2025 at 9:55 AM Yuanjian Li wrote: > +1 > > On Sun, Jun 1, 2025 at 18:30 DB Tsai wrote: > >> +1 >> >> Sent from my iPhone >> >> > On Jun 1, 2025, at 2:32 AM, Yang Jie wrote: >> > >> > +1 >> > >> >> On 2025/06/01 08:09:38 Peter Toth wrote: >> >> +1 >> >> >> >>> On Sun, Jun 1,

Re: [VOTE] Release Apache Spark K8s Operator 0.3.0 (RC1)

2025-06-01 Thread Wenchen Fan
+1 On Mon, Jun 2, 2025 at 9:30 AM DB Tsai wrote: > +1 > > Sent from my iPhone > > > On Jun 1, 2025, at 2:32 AM, Yang Jie wrote: > > > > +1 > > > >> On 2025/06/01 08:10:10 Peter Toth wrote: > >> +1 > >> > >>> On Sun, Jun 1, 2025 at 9:01 AM L. C. Hsieh wrote: > >>> > >>> +1 > >>> > >>> On Sat,

Re: [DISCUSS][MINOR] Fix broken link in spark-website for SS Programming Guide

2025-05-30 Thread Wenchen Fan
+1 to fix this issue immediately. On Fri, May 30, 2025 at 3:16 PM Jerry Peng wrote: > +1 for fixing this immediately. > > Anish, thanks for pointing this issue out! > > On Fri, May 30, 2025 at 12:12 AM Jungtaek Lim < > kabhwan.opensou...@gmail.com> wrote: > >> I’m +1 to fix this in website for 4

[ANNOUNCE] Apache Spark 4.0.0 released

2025-05-28 Thread Wenchen Fan
Hi All, We are happy to announce the availability of *Apache Spark 4.0.0*! Apache Spark 4.0.0 is the first release of the 4.x line. This release resolves more than 5100 tickets with contributions from more than 390 individuals. To download Spark 4.0.0, head over to the download page: https://spa

Re: [VOTE] Release Spark 3.5.6 (RC1)

2025-05-27 Thread Wenchen Fan
The current policy is that we only backport bug fixes and security fixes. If the CVEs and bug fixes from Parquet 1.13.1 to 1.15.2 affect Spark, we can backport the Parquet version upgrade. On Wed, May 28, 2025 at 10:47 AM Rozov, Vlad wrote: > I’ll go with the community vote. > > My take: > > - t

[REVIEW NEEDED] Spark 4.0.0 release notes

2025-05-26 Thread Wenchen Fan
Hi all, I'm trying my best to mention all the major changes in the Spark 4.0 release notes: https://github.com/apache/spark-website/pull/608 . This is a huge release, and we encourage you to review the changes and help us identify any missed items. Thanks, Wenchen

Re: [VOTE] Release Spark 3.5.6 (RC1)

2025-05-26 Thread Wenchen Fan
+1. When this release is out, let's also update the release process document to introduce the new way of making releases with GitHub Action jobs. On Tue, May 27, 2025 at 6:22 AM Dongjoon Hyun wrote: > +1 from my side. > > Thank you, Hyukjin. > > Dongjoon > > On 2025/05/26 22:19:22 Hyukjin Kwon w

Re: [VOTE][RESULT] Release Spark 4.0.0 (RC7)

2025-05-23 Thread Wenchen Fan
Hi all, I've created a draft for the Spark 4.0.0 release notes. Please take a look, and you can leave PR comments directly: https://github.com/apache/spark-website/pull/608 Thanks, Wenchen On Thu, May 22, 2025 at 9:38 PM Wenchen Fan wrote: > I'll finalize the release (publis

Re: [VOTE][RESULT] Release Spark 4.0.0 (RC7)

2025-05-22 Thread Wenchen Fan
I'll finalize the release (publishing, release notes, etc.) in the next few days. Big thanks to everyone who contributed to this huge release! On Thu, May 22, 2025 at 9:35 PM Wenchen Fan wrote: > The vote passes with 19 +1s (11 binding +1s) and no -1s. > Thanks to all who helped wi

[VOTE][RESULT] Release Spark 4.0.0 (RC7)

2025-05-22 Thread Wenchen Fan
The vote passes with 19 +1s (11 binding +1s) and no -1s. Thanks to all who helped with the vote! (* = binding) +1: Wenchen Fan (*) Hyukjin Kwon (*) Ruifeng Zheng (*) Jungtaek Lim Gengliang Wang (*) Dongjoon Hyun (*) Jules Damji Rozov, Vlad Denny Lee Szehon Ho Yuming Wang (*) Cheng Pan Yang Jie

Re: [VOTE] New Spark Connect Client Repository for Rust

2025-05-20 Thread Wenchen Fan
+1 On Tue, May 20, 2025 at 2:34 PM Jungtaek Lim wrote: > Sounds great, thanks for the clarification. > > +1 (non-binding) > > On Tue, May 20, 2025 at 3:20 PM Renjie Liu > wrote: > >> Hi, Jungtaek, Yuanjian: >> >> Yes, I'm more than happy to contribute to this project! >> >> On Tue, May 20, 2025

Re: [DISCUSS] Apache Spark 3.5.6 (automated release)

2025-05-19 Thread Wenchen Fan
+1, look forward to Spark's first automated release! On Tue, May 20, 2025 at 12:55 PM Yuming Wang wrote: > +1. > > On Tue, May 20, 2025 at 12:51 PM Hyukjin Kwon > wrote: > >> Hi, >> >> I am testing automatic release via Github Actions ( >> https://github.com/apache/spark/pull/50911), and it's a

Re: [VOTE] Release Spark 4.0.0 (RC7)

2025-05-19 Thread Wenchen Fan
Same as before, I'll start with my own +1. On Mon, May 19, 2025 at 8:25 PM Wenchen Fan wrote: > Please vote on releasing the following candidate as Apache Spark version > 4.0.0. > > The vote is open until May 22 (PST) and passes if a majority +1 PMC votes > are cast, wi

[VOTE] Release Spark 4.0.0 (RC7)

2025-05-19 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0. The vote is open until May 22 (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0 [ ] -1 Do not release this package because ... To le
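The pass rule quoted in this vote email can be sketched as a small check. This is only one reading of "a majority +1 PMC votes ... with a minimum of 3 +1 votes": the function name and the interpretation (binding +1s must outnumber binding -1s, and there must be at least three binding +1s) are assumptions, not text from the thread.

```python
def release_vote_passes(binding_plus: int, binding_minus: int) -> bool:
    """Return True if a release vote passes under the assumed reading.

    Only binding (PMC) votes are counted. The vote needs at least three
    binding +1s and more binding +1s than binding -1s.
    """
    if binding_plus < 0 or binding_minus < 0:
        raise ValueError("vote counts must be non-negative")
    return binding_plus >= 3 and binding_plus > binding_minus


# The RC7 result reported elsewhere in this archive: 11 binding +1s, no -1s.
rc7_passes = release_vote_passes(binding_plus=11, binding_minus=0)
```

Non-binding votes are ignored by this sketch, which matches how the result email separates the 19 total +1s from the 11 binding ones.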

Re: [VOTE] Release Apache Spark Connect Swift Client 0.2.0 (RC1)

2025-05-18 Thread Wenchen Fan
+1 On Mon, May 19, 2025 at 4:48 AM Jules Damji wrote: > + 1 (non-binding) > — > Sent from my iPhone > Pardon the dumb thumb typos :) > > > — > Sent from my iPhone > Pardon the dumb thumb typos :) > > On May 17, 2025, at 4:32 PM, Zhou Jiang wrote: > > +1 (non-binding) > > On May 17, 2025, at 1

Re: [VOTE] Release Apache Spark K8s Operator 0.2.0 (RC1)

2025-05-18 Thread Wenchen Fan
+1 On Mon, May 19, 2025 at 7:07 AM Liu Cao wrote: > +1 (non-binding) > > On Mon, May 19, 2025 at 03:07 Rozov, Vlad > wrote: > >> +1 (non-binding) >> >> Vlad >> >> On May 18, 2025, at 7:02 AM, Peter Toth wrote: >> >> +1 >> >> On Sun, May 18, 2025 at 1:29 AM Hyukjin Kwon >> wrote: >> >>> +1 >>>

Re: [VOTE] Release Spark 4.0.0 (RC6)

2025-05-17 Thread Wenchen Fan
Thanks for > reporting the issue! > > On Sat, May 17, 2025 at 3:05 AM Anish Shrigondekar > wrote: > >> Hi Adam, >> >> Thanks for reporting the issue. @Eric Marnadi >> has a PR for the fix here - >> https://github.com/apache/spark/pull/50924 >> >> @We

Re: [DISCUSS][SPARK SQL] SPARK-51710: Using Dataframe.dropDuplicates with an empty array as argument behaves "unexpectedly"

2025-05-14 Thread Wenchen Fan
So you are basically saying df.dropDuplicates(Seq.empty) should be the same as df.dropDuplicates(all_columns). I think this is a reasonable change, as the previous behavior, which always returns only the first row, doesn't make sense. For safety, we can add a legacy config for fallback and mention it in t
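The behavior change discussed in this message can be illustrated with a plain-Python model. This is not Spark code: `drop_duplicates`, `Row`, and the `legacy_empty_subset` flag are hypothetical names standing in for `dropDuplicates` and the proposed legacy config, whose real name is not given in the snippet.

```python
from typing import Any, Dict, List, Optional, Sequence

Row = Dict[str, Any]


def drop_duplicates(
    rows: List[Row],
    subset: Optional[Sequence[str]] = None,
    legacy_empty_subset: bool = False,  # hypothetical stand-in for the legacy config
) -> List[Row]:
    """Keep the first row seen for each distinct key.

    With legacy_empty_subset=False, an empty subset means "all columns".
    With legacy_empty_subset=True, an empty subset yields an empty key, so
    every row looks identical and only the first row survives.
    """
    if subset is None or (len(subset) == 0 and not legacy_empty_subset):
        columns = list(rows[0].keys()) if rows else []
    else:
        columns = list(subset)

    seen = set()
    result = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


rows = [{"a": 1, "b": 1}, {"a": 1, "b": 2}, {"a": 1, "b": 1}]
proposed = drop_duplicates(rows, [])                          # dedup on all columns
legacy = drop_duplicates(rows, [], legacy_empty_subset=True)  # first row only
```

The model assumes hashable column values and shows why the old result is surprising: an empty key makes all rows compare equal.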

Re: [VOTE] Release Spark 4.0.0 (RC6)

2025-05-13 Thread Wenchen Fan
Same as before, I'll start with my own +1. On Wed, May 14, 2025 at 12:28 AM Wenchen Fan wrote: > Please vote on releasing the following candidate as Apache Spark version > 4.0.0. > > The vote is open until May 16 (PST) and passes if a majority +1 PMC votes > are cast, wi

[VOTE] Release Spark 4.0.0 (RC6)

2025-05-13 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0. The vote is open until May 16 (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0 [ ] -1 Do not release this package because ... To le

Re: Re: [VOTE] Release Spark 4.0.0 (RC5)

2025-05-13 Thread Wenchen Fan
>>>>>>> > +1 (non-binding) >>>>>>> > >>>>>>> > Thank you, >>>>>>> > >>>>>>> > Vlad >>>>>>> > >>>>>>> > On May 12, 2025, at 5:44 PM, huaxin g

Re: [VOTE] Release Spark 4.0.0 (RC5)

2025-05-12 Thread Wenchen Fan
I'll start with my own +1. All the known blockers are fixed, and I verified that the new Spark Connect distribution works as expected. On Fri, May 9, 2025 at 8:16 PM Wenchen Fan wrote: > Please vote on releasing the following candidate as Apache Spark version > 4.0.0. > >

[VOTE] Release Spark 4.0.0 (RC5)

2025-05-09 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0. The vote is open until May 15 (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0 [ ] -1 Do not release this package because ... To le

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-06 Thread Wenchen Fan
Li wrote: >>> >>>> +1 >>>> >>>> On Mon, May 5, 2025 at 18:35 Yuming Wang wrote: >>>> >>>>> +1 >>>>> >>>>> On Tue, May 6, 2025 at 9:12 AM Denny Lee >>>>> wrote: >>

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-05 Thread Wenchen Fan
+1 On Tue, May 6, 2025 at 3:40 AM Reynold Xin wrote: > +1 > > > On Mon, May 5, 2025 at 12:37 PM Bjørn Jørgensen > wrote: > >> +1 >> >> On Mon, May 5, 2025 at 21:28, Milan Stefanovic < >> stefanovic.mila...@gmail.com> wrote: >> >>> +1 (non-binding) >>> >>> Thanks, >>> Milan >>> >>> On Mon, 5 May 20

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-05 Thread Wenchen Fan
+1, thanks! On Tue, May 6, 2025 at 2:21 AM DB Tsai wrote: > +1 > > On May 5, 2025, at 1:10 AM, Gabor Somogyi > wrote: > > > +1 (non-binding) > > G > > > On Mon, May 5, 2025 at 8:59 AM huaxin gao wrote: > >> +1 Thanks Dongjoon. >> >> On Sun, May 4, 2025 at 7:36 PM Rozov, Vlad >> wrote: >> >>

Re: Issue with Spark 4.0.0rc4 and ~/.ivy2.5.2

2025-04-28 Thread Wenchen Fan
y.SparkSubmit.main(SparkSubmit.scala) > > We should change the wording in the migration guide and fix it (at least > report it as an issue). I can do both if allowed :wink: > > Regards, > Jacek Laskowski > ---- > "The Internals Of" Online Books <https://book

Re: Issue with Spark 4.0.0rc4 and ~/.ivy2.5.2

2025-04-27 Thread Wenchen Fan
Hi Jacek, Thanks for reporting the issue! Did you hit the same problem when you set the `spark.jars.ivy` config with Spark 3.5? If this config never worked with a relative path, we should change the wording in the migration guide. Thanks, Wenchen On Sun, Apr 27, 2025 at 10:27 PM Jacek Laskowski

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-22 Thread Wenchen Fan
> >> Vlad >> >> On Apr 21, 2025, at 1:59 AM, Cheng Pan wrote: >> >> The deadlock is introduced by PARQUET-2432(1.14.0), if we decide >> to downgrade, the latest workable version is Parquet 1.13.1. >> >> Thanks, >> Cheng Pan >> >> >> >>

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-21 Thread Wenchen Fan
>> >>> Once merged, this should be included in the next RC. >>> >>> On Thu, Apr 10, 2025 at 5:21 PM Wenchen Fan wrote: >>> >>>> Please vote on releasing the following candidate as Apache Spark >>>> version 4.0.0. >>>> >>

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-14 Thread Wenchen Fan
a java.lang.Object) > > at java.lang.Thread.interrupt(java.base@17.0.6/Thread.java:997) > > - locked <0x7f9575003ab0> (a java.lang.Object) > > at > org.apache.spark.util.UninterruptibleThread.interrupt(UninterruptibleThread.scala:99) > > - locked <0x7f9

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-13 Thread Wenchen Fan
@Hyukjin Kwon is looking into it now and will share the findings later. Please let me know if you find any other issues with RC4, either functionality issues with Spark itself, or integration issues with downstream libraries. Thanks! Wenchen On Thu, Apr 10, 2025 at 11:21 PM Wenchen Fan wrote:

[VOTE] Release Spark 4.0.0 (RC4)

2025-04-10 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0. The vote is open until April 15 (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0 [ ] -1 Do not release this package because ... To

Re: [VOTE] SPIP: Declarative Pipelines

2025-04-09 Thread Wenchen Fan
+1 (binding) On Thu, Apr 10, 2025 at 7:30 AM Szehon Ho wrote: > +1 (non-binding) > > Thanks > Szehon > > On Wed, Apr 9, 2025 at 3:42 PM Hyukjin Kwon wrote: > >> I will shepherd. >> >> On Thu, 10 Apr 2025 at 07:28, Anton Okolnychyi >> wrote: >> >>> +1 (non-binding) >>> >>> - Anton >>> >>> Wed, 9

Re: [VOTE] SPIP: Constraints in DSv2

2025-04-05 Thread Wenchen Fan
+1 As Gengliang explained, the API allows the connectors to request Spark to perform data validations, but connectors can also choose to do validation by themselves. I think it's a reasonable design as not all connectors have the ability to do data validation by themselves, such as file formats th

Re: [VOTE] SPIP: Constraints in DSv2

2025-04-05 Thread Wenchen Fan
> external system is something entirely different. Should data validation be >>> coupled with the external system? I don't think so. But since I'm the only >>> one arguing against this proposal, does that mean I'm wrong? >>> >>> On Wed, 26 Mar 2

Re: [VOTE] Release Spark 4.0.0 (RC3)

2025-03-31 Thread Wenchen Fan
Hi all, Thanks for your feedback! Regarding https://github.com/apache/spark/pull/50187 , I don't think it's a 4.0 blocker as it's a CI issue for the examples module. Other than that, all other issues have been resolved and I'll cut the next RC after https://github.com/apache/spark/pull/50197 is me

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-31 Thread Wenchen Fan
Hi Jia, Thanks for your detailed explanation! The existing implementation of geospatial serialization, predicate pushdown, and other features in Apache Sedona is indeed valuable for this project. What we’re proposing isn’t something entirely new to the industry but rather a re-architecture: we bel

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Wenchen Fan
Hi Jia, This is a good question. As the shepherd of this SPIP, I'd like to clarify the motivation here: the focus of this project is more about the storage part, not the processing. Apache Sedona is a great library for geo processing, but without native geo type support in Spark, users can't do th

Re: [VOTE] SPIP: Support NanoSecond Timestamps

2025-03-27 Thread Wenchen Fan
Maybe we should discuss the key issues on the dev list as it's easy to lose track of Google Doc comments. I think all the proposals for adding new data types need to prove that the new data type is common/standard in the ecosystem. This means 3 things: - it has common/standard semantics. TIMESTAMP

Re: Requesting advice, thought

2025-03-27 Thread Wenchen Fan
The file source in Spark has not been migrated to DS v2 yet and uses dedicated catalyst rules to do runtime filtering, e.g. PartitionPruning and PlanDynamicPruningFilters On Thu, Mar 27, 2025 at 6:53 PM Asif Shahid wrote: > Hi Experts, > Could you please allow me to pick your brain on the follo

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Wenchen Fan
A slightly off-topic but related question: It feels fragile to test with SBT while publishing the release with Maven. How did we end up in this situation? Moreover, since most Spark developers use SBT for their daily work, it becomes even harder to catch issues with the Maven build. On Thu, Mar 27

Re: [DISCUSS] Upgrade Hive compile time dependency to 4.0

2025-03-25 Thread Wenchen Fan
I agree, 4.0 is already in the RC stage and I think it's too late to do such a big version bump for the Hive dependency. We definitely need to do this upgrade and thanks for working on it! On Mon, Mar 24, 2025 at 1:31 PM Ángel Álvarez Pascua < angel.alvarez.pas...@gmail.com> wrote: > That's grea

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Wenchen Fan
>>>>>>>> I will also take a look and see if we can reject jars >>>>>>>>>> automatically in PRs or CI. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-24 Thread Wenchen Fan
It seems there’s no quick fix for this issue. Should we remove these jars and disable the tests for now to comply with ASF policy? While this would temporarily reduce test coverage until we refactor the tests to avoid pre-compiled jars, we can encourage Spark vendors not to cherry-pick this test-di

[VOTE] Release Spark 4.0.0 (RC3)

2025-03-20 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0. The vote is open until March 25 (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0 [ ] -1 Do not release this package because ... To

Re: [RESULT][VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-18 Thread Wenchen Fan
>> +1: >> - Mark Hamstra * >> - Jungtaek Lim >> - Wenchen Fan * >> - Reynold Xin * >> - Yuanjian Li * >> >> -1: >> - Holden Karau * >> - Hyukjin Kwon * >> - Dongjoon Hyun * >> >> Thanks. >> >>

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-17 Thread Wenchen Fan
Before I cast my vote here, I'd like to highlight one thing: As the release manager of Apache Spark 4.0.0, I was not notified about the breaking change of renaming an already-released configuration: https://github.com/apache/spark/pull/49897 . Note that the previous VOTE from Dongjoon was about Apa

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Wenchen Fan
36 AM Wenchen Fan wrote: > I've created the revert PR for branch-4.0: > https://github.com/apache/spark/pull/50291 . We can merge PRs with lazy > consensus but it's clear that this breaking change PR has failed to achieve > consensus. > > I hope we now have a clear founda

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Wenchen Fan
I've created the revert PR for branch-4.0: https://github.com/apache/spark/pull/50291 . We can merge PRs with lazy consensus but it's clear that this breaking change PR has failed to achieve consensus. I hope we now have a clear foundation for discussing solutions. As it stands, the misnamed confi

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Wenchen Fan
y. On Mon, Mar 17, 2025 at 8:49 AM Wenchen Fan wrote: > Before I cast my vote here, I'd like to highlight one thing: As the > release manager of Apache Spark 4.0.0, I was not notified about the > breaking change of renaming an already-released configuration: > https://github.com/apac

Re: Re: [Discuss] SPIP: Support NanoSecond Timestamps

2025-03-15 Thread Wenchen Fan
In general, I think it's good for Spark to support the common data types in the ecosystem, as it's the only way to fully integrate with the ecosystem. So +1. On Fri, Mar 14, 2025 at 8:56 AM 谭琦 wrote: > Updated. Thanks. > > On 2025/03/13 23:56:20 Jungtaek Lim wrote: > > Hi, would you mind allowin

Re: [VOTE][RESULT] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

2025-03-13 Thread Wenchen Fan
f this. > > > > (* = binding) > > +1: > > - Sean R. Owen * > > - Jungtaek Lim > > - Nicholas Chammas > > - Wenchen Fan * > > - Adam Binford > > - Russell Jurney > > - Yang Jie * > > > > -1: > > - Dongjoon Hyun * > > > > Thanks, > > Jungtaek Lim (HeartSaVioR) > > - > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > >

Re: [Discuss] SPIP: Support NanoSecond Timestamps

2025-03-13 Thread Wenchen Fan
Will this nanosecond timestamp be a fixed-size (10 bytes) binary in UnsafeRow and ColumnVector? On Thu, Mar 13, 2025 at 4:57 PM Jungtaek Lim wrote: > Hi, would you mind allowing comments on the doc? Thanks! > > On Fri, Mar 14, 2025 at 8:50 AM Qi Tan wrote: > >> Hello everybody, >> >> I would li

Re: [DISCUSS] New Spark Connect Client repository for Swift language

2025-03-11 Thread Wenchen Fan
+1 On Mon, Mar 10, 2025 at 8:02 AM L. C. Hsieh wrote: > +1 > > Thanks Dongjoon for contributing to Swift implementation. > > On Mon, Mar 10, 2025 at 7:18 AM Hyukjin Kwon wrote: > > > > +1 > > On Mon, Mar 10, 2025 at 6:48 AM Yang Jie wrote: > >> > >> Great! Really happy to see that spark-connec

Re: [VOTE] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

2025-03-10 Thread Wenchen Fan
Guys, let’s be honest about what we’re discussing here. If this is a migration issue, why would we even need a vote? We’ve been consistently adding configurations to restore legacy behavior instead of removing them because we understand the challenges of upgrading Spark versions. Our goal has alwa

Re: [VOTE] Release Spark 4.0.0 (RC2)

2025-03-06 Thread Wenchen Fan
ryNum=5, >> policy=DefaultPolicy). >> >> ^C25/03/04 14:49:40 INFO ShutdownHookManager: Shutdown hook called >> >> >> >> On Mar 4, 2025, at 2:24 PM, Chris Nauroth wrote: >> >> -1 (non-binding) >> >> I think I found some missing li

Re: [DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+

2025-03-04 Thread Wenchen Fan
Shall we open an official vote for it? We can put more details on it so that people can vote: 1. how does it break user workloads without this migration code? 2. what is the Apache policy for leaked vendor names in the codebase? I think this is not the only one, we also mentioned `com.databricks.sp

Re: [VOTE] Release Spark 4.0.0 (RC2)

2025-03-03 Thread Wenchen Fan
-1 > > https://github.com/apache/spark/pull/50112 should be merged before > release. > > > > At 2025-03-01 15:25:06, "Wenchen Fan" wrote: > > Please vote on releasing the following candidate as Apache Spark version > 4.0.0. > > The vote is open until Ma

[VOTE] Release Spark 4.0.0 (RC2)

2025-02-28 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0. The vote is open until March 5 (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0 [ ] -1 Do not release this package because ... To l

Re: [VOTE] Release Spark 4.0.0 (RC1)

2025-02-23 Thread Wenchen Fan
ckers. >>>>> >>>>> I know RC1 is mostly for everyone to audit the release artifacts >>>>> earlier before everything is ready, but just wanted to make clear that we >>>>> expect to have another RC. >>>>> >>>>> On

Re: [VOTE] SPIP: Add the TIME data type

2025-02-23 Thread Wenchen Fan
+1 On Mon, Feb 24, 2025 at 7:51 AM John Zhuge wrote: > +1 (non-binding) > > John Zhuge > > > On Sun, Feb 23, 2025 at 2:37 PM huaxin gao wrote: > >> +1 >> >> On Sun, Feb 23, 2025 at 1:51 PM serge rielau.com >> wrote: >> >>> +1 it’s abt time. >>> Sent from my iPhone >>> >>> > On Feb 23, 2025, a

Re: [VOTE] Release Spark 3.5.5 (RC1)

2025-02-23 Thread Wenchen Fan
-0 as I just found a long-standing correctness bug: https://github.com/apache/spark/pull/50040 It's not a regression in 3.5 so technically it's not a release blocker, but it's better to include it as we are just about to release 3.5.5. On Mon, Feb 24, 2025 at 9:11 AM Mich Talebzadeh wrote: > +1

[VOTE] Release Spark 4.0.0 (RC1)

2025-02-19 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0. The vote is open until February 21 (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0 [ ] -1 Do not release this package because ...

Re: [VOTE] Release Apache Spark 3.5.5 deprecating `spark.databricks.*` configuration

2025-02-18 Thread Wenchen Fan
+1 On Wed, Feb 19, 2025 at 2:36 PM Sakthi wrote: > +1 (non-binding) > > On Tue, Feb 18, 2025 at 10:21 PM Yang Jie wrote: > >> +1 >> >> On 2025/02/19 05:57:53 Mark Hamstra wrote: >> > +1 >> > >> > On Tue, Feb 18, 2025 at 9:46 PM dongjoon.hyun >> wrote: >> > > >> > > Please vote to deprecate `sp

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-18 Thread Wenchen Fan
g > should prevent the issue from recurring. This is very much about the > one-time exception, not the rule. > > On Tue, Feb 18, 2025 at 9:30 PM Wenchen Fan wrote: > > > > Hi Dongjoon, > > > > If this is a policy issue that necessitates a breaking change, then >

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-18 Thread Wenchen Fan
lear, I'd like to emphasize the following: > - It's not about just a few lines of code. It's more like a policy-side > issue which we are supposed to keep in Apache Spark repository so far and > in the future. > - All exposed configurations should be considered as used

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-18 Thread Wenchen Fan
g Spark, High Performance Spark, etc.): >>>> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9> >>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau >>>> Pronouns: she/her >>>> >>>> >>>> On Tue, Feb 18, 202

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-17 Thread Wenchen Fan
It’s unfortunate that we missed identifying these issues during the code review. However, since they have already been released, I believe deprecating them is a better approach than removing them, as the latter would introduce a breaking change. Regarding Jungtaek’s PR

4.0.0 RC1 is coming

2025-02-16 Thread Wenchen Fan
Hi all, RC1 was scheduled for Feb 15, but I'll cut it on Feb 18 to have 3 working days during the vote period, due to Feb 15 and 16 being the weekend, and Feb 17 being a holiday in the US. The RC1 vote likely won't pass because of some ongoing work but I think it's better to kick off the release

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-02-13 Thread Wenchen Fan
ermine with static data, if output will >>> be deterministic? Expressions already have a deterministic flag. So when an >>> attribute is created from an alias, it will be possible to know if the attribute >>> is pointing to an indeterminate component. >>> >>> O

Re: [PROPOSAL] Unified PySpark-Pandas API to Bridge Data Engineering and ML Workflows

2025-02-10 Thread Wenchen Fan
Interesting, so this is PySpark on pandas which is the reverse of Koalas. If performance is the only problem, maybe we can improve local-mode Spark performance to be on par with these single-node engines. + @Hyukjin Kwon On Mon, Feb 10, 2025 at 8:40 PM José Müller wrote: > Hi Mitch, > > All yo

[VOTE][RESULT] Publish additional Spark distribution with Spark Connect enabled

2025-02-07 Thread Wenchen Fan
Hi all, The vote for "Publish additional Spark distribution with Spark Connect enabled" passes with 22 +1s (13 binding +1s) (* = binding) +1: - Mridul Muralidharan * - Hyukjin Kwon * - Jungtaek Lim - Xiao Li * - DB Tsai * - Sakthi - Gengliang Wang * - L. C. Hsieh * - Yang Jie * - Max Gekk * - Yum

Re: Docs look weird; can't build locally

2025-02-06 Thread Wenchen Fan
Hi Nicholas, I don't know how to set JVM memory for this bundle command, but this PR might help reduce memory usage and avoid the OOM issue. On Fri, Feb 7, 2025 at 12:07 AM Nicholas Chammas wrote: > Gengliang has the CSS/display issue fixed >

Re: ASF board report draft for February 2025

2025-02-05 Thread Wenchen Fan
Hi Angel, AFAIK many people rely on the Spark UI to debug/inspect their queries with the query plan tree and metrics, but you are right that plan string generation is expensive, and we shouldn't do it for every AQE plan change. Maybe we should do it only once to report the final plan for AQE? Let's

Re: [VOTE] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Wenchen Fan
5, 2025 at 8:42 AM Martin Grund >>> wrote: >>> >>>> +1 >>>> >>>> On Wed, Feb 5, 2025 at 17:15 bo yang wrote: >>>> >>>>> +1 (non-binding) >>>>> >>>>> On Wed, Feb 5, 2025 at 7:51 AM Jule

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Wenchen Fan
ld easily launch a remote session on another user's > connect server if it's just automatically launched with no authentication > enabled. > > On Wed, Feb 5, 2025 at 8:54 AM Wenchen Fan wrote: > >> Hi Adam, >> >> My bad, I should have provided more conte

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Wenchen Fan
n Connect >> >>>>> SPARK-50899 Support PrefixSpan on connect >> >>>>> SPARK-51060 Support `QuantileDiscretizer` on Connect >> >>>>> SPARK-50974 Add support foldCol for CrossValidator on connect >> >>>>> SPARK-5101

[VOTE] Publish additional Spark distribution with Spark Connect enabled

2025-02-04 Thread Wenchen Fan
xt 72 hours: [ ] +1: Accept the proposal [ ] +0 [ ] -1: I don’t think this is a good idea because … Best, Wenchen Fan

Re: [DISCUSS] Spark - How to improve our release processes

2025-02-04 Thread Wenchen Fan
w. On Tue, Feb 4, 2025 at 6:32 PM Nimrod Ofek wrote: > Hi all, > > I am trying to revive this thread - to work towards a better release > process, and making sure we have no conflicts in the used artifacts like > nicholas.cham...@gmail.com mentioned. > @Wenchen Fan - can you pleas

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-04 Thread Wenchen Fan
y? > > Best regards, > Dongjoon > > On Mon, Feb 3, 2025 at 23:31 Wenchen Fan wrote: > >> Hi all, >> >> There is partial agreement and consensus that Spark Connect is crucial >> for the future stability of Spark APIs for both end users and developers. >>

[DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-03 Thread Wenchen Fan
Hi all, There is partial agreement and consensus that Spark Connect is crucial for the future stability of Spark APIs for both end users and developers. At the same time, a couple of PMC members raised concerns about making Spark Connect the default in the upcoming Spark 4.0 release. I’m proposing

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-01-26 Thread Wenchen Fan
It looks like a hard problem to statically analyze the query plan and decide whether a Spark stage is deterministic or not. When I added RDD DeterministicLevel, I thought it was not a hard problem for the callers to specify it, but it seems I was wrong. Maybe we should do it at runtime: if Spark retr

Re: [DISCUSS] Ongoing projects for Spark 4.0

2025-01-26 Thread Wenchen Fan
k/pull/49518 > https://github.com/apache/spark/pull/49571 > > 2025/01/15 13:41:07 Wenchen Fan wrote: > > Hi all, > > > > We have cut the "branch-4.0" and I'm sending this email to collect the > > information for ongoing projects targeting Spark 4.0. Pl

Re: A documentation change is a user-facing change

2025-01-16 Thread Wenchen Fan
+1 to update the PR template. I think the intent is to ask PR authors to call out all the user-facing changes that need attention from the end users, such as new features and behavior changes, but doc change is clearly not one of them. On Fri, Jan 17, 2025 at 7:10 AM Gengliang Wang wrote: > Than

[DISCUSS] Ongoing projects for Spark 4.0

2025-01-15 Thread Wenchen Fan
Hi all, We have cut the "branch-4.0" and I'm sending this email to collect the information for ongoing projects targeting Spark 4.0. Please reply to this email to share the project progress with the community. Note that the scheduled code freeze date is Feb 1, and the RC1 cut date is Feb 15. Thanks

Re: [REMINDER] 4.0 branch cut is coming

2025-01-15 Thread Wenchen Fan
"branch-4.0" has been cut. Please merge commits to this branch as well if you think it's necessary before Feb 1. On Wed, Jan 15, 2025 at 4:37 PM Wenchen Fan wrote: > Yes, I'll start a new thread this week to collect the ongoing projects and > track the progress. We ca

Re: [REMINDER] 4.0 branch cut is coming

2025-01-15 Thread Wenchen Fan
national holiday of both China and South Korea. Shall we collect and > check the plan of ongoing projects to see whether they can make it in time > in advance, so that we could determine the realistic date of the QA period > rather than an arbitrary one? > > On Wed, Jan 15, 2025 at 5

Re: [REMINDER] 4.0 branch cut is coming

2025-01-15 Thread Wenchen Fan
s it still bother you for some reason? > Are you going to delete `branch-4.0` created by me? > > Sincerely, > Dongjoon. > > > > On Tue, Jan 14, 2025 at 10:55 PM Wenchen Fan wrote: > >> I don't have any specific PRs in mind, as I review PRs almost daily: PR

Re: [REMINDER] 4.0 branch cut is coming

2025-01-14 Thread Wenchen Fan
ell what is going on there. > Which PRs are those? > > > I'm still reviewing and may merge PRs during the daytime. > > Let's see what your real concern is and let us help you. > > I really want to understand how those PRs are blocked or interrupted by > `branch cut`. >

Re: [REMINDER] 4.0 branch cut is coming

2025-01-14 Thread Wenchen Fan
formed. > ``` > > BTW, after creating the `branch-4.0`, the following PR will be merged > in order to differentiate `master` from `branch-4.0`. > > [SPARK-50822][BUILD] Setting version to 4.1.0-SNAPSHOT > https://github.com/apache/spark/pull/49495 > > Best regards, >

Re: [REMINDER] 4.0 branch cut is coming

2025-01-14 Thread Wenchen Fan
relatively more commits are coming in daytime in PST. >>> >>> On Wed, 15 Jan 2025 at 12:12, Hyukjin Kwon wrote: >>> >>>> Yeah, I agree that cutting in the middle of week, e.g., commits are >>>> being merged, it might cause some confusions which branch

Re: [REMINDER] 4.0 branch cut is coming

2025-01-14 Thread Wenchen Fan
) and > help the on-going PR reviews. > > As you mentioned, we don't block feature backportings. There is no reason > to break the community schedule. > > Sincerely, > Dongjoon > > On Tue, Jan 14, 2025 at 17:52 Wenchen Fan wrote: > >> Hi all, >> >

[REMINDER] 4.0 branch cut is coming

2025-01-14 Thread Wenchen Fan
Hi all, According to the schedule, the 4.0 branch cut is set for January 15. I plan to cut the branch at the end of this week so that folks can have a few more working days to complete the ongoing work. Note: While we can still merge PRs to the 4.

[VOTE][RESULT] Use plain text logs by default

2025-01-13 Thread Wenchen Fan
Hi all, The vote for "Use plain text logs by default" passes with 19 +1s (8 binding +1s) (* = binding) +1: - beliefer - Yuming Wang * - Hyukjin Kwon * - Cheng Pan - Yang Jie - Jungtaek Lim - Mridul Muralidharan * - Gengliang Wang * - L. C. Hsieh - Allison Wang - Peter Toth - DB Tsai * - Denny Lee

[VOTE] Use plain text logs by default

2025-01-08 Thread Wenchen Fan
for the next 72 hours: [ ] +1: Accept the proposal [ ] +0 [ ] -1: I don’t think this is a good idea because … Bests, Wenchen Fan [1] https://lists.apache.org/thread/4fnlnvhsqym72k53jw8cjhcdjbhpmm95

Re: [VOTE] Release Spark 3.5.4 (RC3)

2024-12-17 Thread Wenchen Fan
+1 On Wed, Dec 18, 2024 at 12:51 PM Cheng Pan wrote: > +1 > > Thanks, > Cheng Pan > > > > On Dec 17, 2024, at 17:23, 杨杰 wrote: > > Please vote on releasing the following candidate as Apache Spark version > 3.5.4. > > The vote is open until Dec 20, 10:00:00 UTC and passes if a majority +1 > PMC

Re: [DISCUSS] Release Apache Spark 3.5.4

2024-12-04 Thread Wenchen Fan
+1 On Wed, Dec 4, 2024 at 3:41 PM Mridul Muralidharan wrote: > > +1 for 3.5.4 > > Regards, > Mridul > > > On Wed, Dec 4, 2024 at 1:12 AM Dongjoon Hyun > wrote: > >> +1 for releasing 3.5.4. >> >> Thank you so much for the heads-up and volunteering as the release >> manager, Jie. >> >> I can help
