Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-10 Thread Wenchen Fan
+1 On Tue, Jun 10, 2025 at 11:40 AM Herman van Hovell wrote: > +1 > > On Tue, Jun 10, 2025 at 2:04 PM Rozov, Vlad > wrote: > >> +1 (non-binding) >> >> Thank you, >> >> Vlad >> >> On Jun 10, 2025, at 10:44 AM, Sakthi wrote: >> >> +1 (non-binding) >> >> On Mon, Jun 9, 2025 at 8:28 PM bo yang wr

Re: [DISCUSS] Automation of RC email

2025-06-05 Thread Wenchen Fan
+1 for email automation! On Thu, Jun 5, 2025 at 8:22 AM Yuanjian Li wrote: > +1 for option 1. > > Seems the only downside of option 1 is that some RC numbers may be > non-sequential. > > On Thu, Jun 5, 2025 at 07:57, Dongjoon Hyun wrote: > >> +1 for the proposal, Hyukjin. Thank you for the whole and seamless

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Wenchen Fan
Mon 23:00 wrote: >>> >>>> +1 (non-binding) >>>> >>>> On Mon, Jun 2, 2025 at 7:34 AM Chao Sun wrote: >>>> >>>>> +1 >>>>> >>>>> On Mon, Jun 2, 2025 at 7:31 AM Jungtaek Lim < >>>>> kabhwan

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Wenchen Fan
+1 On Mon, Jun 2, 2025 at 8:55 PM Peter Toth wrote: > +1 > > On Mon, Jun 2, 2025 at 2:33 PM xianjin wrote: > >> +1. >> Sent from my iPhone >> >> On Jun 2, 2025, at 12:50 PM, DB Tsai wrote: >> >> +1 looking forward to seeing real-time mode. >> Sent from my iPhone >> >> On Jun 1, 2025, at 9:47 

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-01 Thread Wenchen Fan
+1 On Mon, Jun 2, 2025 at 9:55 AM Yuanjian Li wrote: > +1 > > On Sun, Jun 1, 2025 at 18:30 DB Tsai wrote: > >> +1 >> >> Sent from my iPhone >> >> > On Jun 1, 2025, at 2:32 AM, Yang Jie wrote: >> > >> > +1 >> > >> >> On 2025/06/01 08:09:38 Peter Toth wrote: >> >> +1 >> >> >> >>> On Sun, Jun 1,

Re: [VOTE] Release Apache Spark K8s Operator 0.3.0 (RC1)

2025-06-01 Thread Wenchen Fan
+1 On Mon, Jun 2, 2025 at 9:30 AM DB Tsai wrote: > +1 > > Sent from my iPhone > > > On Jun 1, 2025, at 2:32 AM, Yang Jie wrote: > > > > +1 > > > >> On 2025/06/01 08:10:10 Peter Toth wrote: > >> +1 > >> > >>> On Sun, Jun 1, 2025 at 9:01 AM L. C. Hsieh wrote: > >>> > >>> +1 > >>> > >>> On Sat,

Re: [DISCUSS][MINOR] Fix broken link in spark-website for SS Programming Guide

2025-05-30 Thread Wenchen Fan
+1 to fix this issue immediately. On Fri, May 30, 2025 at 3:16 PM Jerry Peng wrote: > +1 for fixing this immediately. > > Anish, thanks for pointing this issue out! > > On Fri, May 30, 2025 at 12:12 AM Jungtaek Lim < > kabhwan.opensou...@gmail.com> wrote: > >> I’m +1 to fix this in website for 4

[ANNOUNCE] Apache Spark 4.0.0 released

2025-05-28 Thread Wenchen Fan
k>, and Maven Central Repository. To view the release notes: https://spark.apache.org/releases/spark-release-4-0-0.html We would like to acknowledge all community members for contributing to this release. This release would not have been possible without you. Best, Wenchen

Re: [VOTE] Release Spark 3.5.6 (RC1)

2025-05-27 Thread Wenchen Fan
The current policy is that we only backport bug fixes and security fixes. If the CVEs and bug fixes from Parquet 1.13.1 to 1.15.2 affect Spark, we can backport the Parquet version upgrade. On Wed, May 28, 2025 at 10:47 AM Rozov, Vlad wrote: > I’ll go with the community vote. > > My take: > > - t

[REVIEW NEEDED] Spark 4.0.0 release notes

2025-05-26 Thread Wenchen Fan
Hi all, I'm trying my best to mention all the major changes in the Spark 4.0 release notes: https://github.com/apache/spark-website/pull/608 . This is a huge release, and we encourage you to review the changes and help us identify any missed items. Thanks, Wenchen

Re: [VOTE] Release Spark 3.5.6 (RC1)

2025-05-26 Thread Wenchen Fan
+1. When this release is out, let's also update the release process document to introduce the new way of making releases with GitHub Action jobs. On Tue, May 27, 2025 at 6:22 AM Dongjoon Hyun wrote: > +1 from my side. > > Thank you, Hyukjin. > > Dongjoon > > On 2025/05/26 22:19:22 Hyukjin Kwon w

Re: [VOTE][RESULT] Release Spark 4.0.0 (RC7)

2025-05-23 Thread Wenchen Fan
Hi all, I've created a draft for the Spark 4.0.0 release notes. Please take a look, and you can leave PR comments directly: https://github.com/apache/spark-website/pull/608 Thanks, Wenchen On Thu, May 22, 2025 at 9:38 PM Wenchen Fan wrote: > I'll finalize the release (publis

Re: [VOTE][RESULT] Release Spark 4.0.0 (RC7)

2025-05-22 Thread Wenchen Fan
I'll finalize the release (publishing, release notes, etc.) in the next few days. Big thanks to everyone who contributed to this huge release! On Thu, May 22, 2025 at 9:35 PM Wenchen Fan wrote: > The vote passes with 19 +1s (11 binding +1s) and no -1s. > Thanks to all who helped wi

[VOTE][RESULT] Release Spark 4.0.0 (RC7)

2025-05-22 Thread Wenchen Fan
The vote passes with 19 +1s (11 binding +1s) and no -1s. Thanks to all who helped with the vote! (* = binding) +1: Wenchen Fan (*) Hyukjin Kwon (*) Ruifeng Zheng (*) Jungtaek Lim Gengliang Wang (*) Dongjoon Hyun (*) Jules Damji Rozov, Vlad Denny Lee Szehon Ho Yuming Wang (*) Cheng Pan Yang Jie

Re: [VOTE] New Spark Connect Client Repository for Rust

2025-05-20 Thread Wenchen Fan
+1 On Tue, May 20, 2025 at 2:34 PM Jungtaek Lim wrote: > Sounds great, thanks for the clarification. > > +1 (non-binding) > > On Tue, May 20, 2025 at 3:20 PM Renjie Liu > wrote: > >> Hi, Jungtaek, Yuanjian: >> >> Yes, I'm more than happy to contribute to this project! >> >> On Tue, May 20, 2025

Re: [DISCUSS] Apache Spark 3.5.6 (automated release)

2025-05-19 Thread Wenchen Fan
+1, look forward to Spark's first automated release! On Tue, May 20, 2025 at 12:55 PM Yuming Wang wrote: > +1. > > On Tue, May 20, 2025 at 12:51 PM Hyukjin Kwon > wrote: > >> Hi, >> >> I am testing automatic release via Github Actions ( >> https://github.com/apache/spark/pull/50911), and it's a

Re: [VOTE] Release Spark 4.0.0 (RC7)

2025-05-19 Thread Wenchen Fan
Same as before, I'll start with my own +1. On Mon, May 19, 2025 at 8:25 PM Wenchen Fan wrote: > Please vote on releasing the following candidate as Apache Spark version > 4.0.0. > > The vote is open until May 22 (PST) and passes if a majority +1 PMC votes > are cast, wi

[VOTE] Release Spark 4.0.0 (RC7)

2025-05-19 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0. The vote is open until May 22 (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0 [ ] -1 Do not release this package because ... To le
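The passing rule quoted in this vote email can be sketched in a few lines of Python. This is a minimal illustration, not the project's tooling: the `Vote` type and `vote_passes` function are hypothetical names, and it assumes that "majority" means binding (PMC) +1 votes outnumber binding -1 votes, with at least three binding +1 votes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    """One vote on a release candidate (hypothetical type for illustration)."""
    voter: str
    value: int      # +1, 0, or -1
    binding: bool   # True when cast by a PMC member


def vote_passes(votes, min_binding_plus_ones=3):
    """Return True if the binding votes satisfy the assumed passing rule.

    Assumption: only binding votes decide the outcome; non-binding votes
    are recorded but not counted.
    """
    binding = [v for v in votes if v.binding]
    plus = sum(1 for v in binding if v.value == 1)
    minus = sum(1 for v in binding if v.value == -1)
    return plus >= min_binding_plus_ones and plus > minus
```

Under this reading, a tally such as "19 +1s (11 binding +1s) and no -1s" passes, while a vote with many non-binding +1s but only two binding +1s does not.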

Re: [VOTE] Release Apache Spark Connect Swift Client 0.2.0 (RC1)

2025-05-18 Thread Wenchen Fan
+1 On Mon, May 19, 2025 at 4:48 AM Jules Damji wrote: > + 1 (non-binding) > — > Sent from my iPhone > Pardon the dumb thumb typos :) > > On May 17, 2025, at 4:32 PM, Zhou Jiang wrote: > > +1 (non-binding) > > On May 17, 2025, at 1

Re: [VOTE] Release Apache Spark K8s Operator 0.2.0 (RC1)

2025-05-18 Thread Wenchen Fan
+1 On Mon, May 19, 2025 at 7:07 AM Liu Cao wrote: > +1 (non-binding) > > On Mon, May 19, 2025 at 03:07 Rozov, Vlad > wrote: > >> +1 (non-binding) >> >> Vlad >> >> On May 18, 2025, at 7:02 AM, Peter Toth wrote: >> >> +1 >> >> On Sun, May 18, 2025 at 1:29 AM Hyukjin Kwon >> wrote: >> >>> +1 >>>

Re: [VOTE] Release Spark 4.0.0 (RC6)

2025-05-17 Thread Wenchen Fan
Thanks for > reporting the issue! > > On Sat, May 17, 2025 at 3:05 AM Anish Shrigondekar > wrote: > >> Hi Adam, >> >> Thanks for reporting the issue. @Eric Marnadi >> has a PR for the fix here - >> https://github.com/apache/spark/pull/50924 >> >> @We

Re: [DISCUSS][SPARK SQL] SPARK-51710: Using Dataframe.dropDuplicates with an empty array as argument behaves "unexpectedly"

2025-05-14 Thread Wenchen Fan
So you are basically saying df.dropDuplicates(Seq.empty) should be the same as df.dropDuplicates(all_columns). I think this is a reasonable change, as the previous behavior, which always returns the first row, doesn't make sense. For safety, we can add a legacy config for fallback and mention it in t
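The difference between the two behaviors discussed here can be modeled without Spark. The following is a minimal pure-Python sketch, not Spark's implementation: `drop_duplicates` and the `empty_means_all_columns` flag are hypothetical names, rows are plain dicts, and "first row" means first in list order (Spark itself gives no such ordering guarantee).

```python
def drop_duplicates(rows, subset, empty_means_all_columns=True):
    """Keep the first row seen for each distinct key built from `subset`.

    rows:   list of dicts that share the same keys (hashable values assumed)
    subset: list of column names to deduplicate on
    empty_means_all_columns: when True, an empty subset is treated as
        "all columns" (the proposed behavior); when False, the empty key
        matches every row, so only the first row survives (the previous
        behavior described in the thread).
    """
    if not subset and empty_means_all_columns and rows:
        subset = list(rows[0].keys())
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[c] for c in subset)  # empty subset -> key is ()
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept
```

With the flag off, every row maps to the same empty key, which is why the old behavior collapses the whole DataFrame to a single row.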

Re: [VOTE] Release Spark 4.0.0 (RC6)

2025-05-13 Thread Wenchen Fan
Same as before, I'll start with my own +1. On Wed, May 14, 2025 at 12:28 AM Wenchen Fan wrote: > Please vote on releasing the following candidate as Apache Spark version > 4.0.0. > > The vote is open until May 16 (PST) and passes if a majority +1 PMC votes > are cast, wi

[VOTE] Release Spark 4.0.0 (RC6)

2025-05-13 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0. The vote is open until May 16 (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0 [ ] -1 Do not release this package because ... To le

Re: Re: [VOTE] Release Spark 4.0.0 (RC5)

2025-05-13 Thread Wenchen Fan
hecksum, ran basic test on >>>> spark-4.0.0-bin-hadoop3. >>>> >>>> Thanks >>>> Szehon >>>> >>>> On Mon, May 12, 2025 at 9:02 PM Sakthi wrote: >>>> >>>>> +1 (non-binding) >>>>> >>

Re: [VOTE] Release Spark 4.0.0 (RC5)

2025-05-12 Thread Wenchen Fan
I'll start with my own +1. All the known blockers are fixed, and I verified that the new Spark Connect distribution works as expected. On Fri, May 9, 2025 at 8:16 PM Wenchen Fan wrote: > Please vote on releasing the following candidate as Apache Spark version > 4.0.0. > >

[VOTE] Release Spark 4.0.0 (RC5)

2025-05-09 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0. The vote is open until May 15 (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0 [ ] -1 Do not release this package because ... To le

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-06 Thread Wenchen Fan
Li wrote: >>> >>>> +1 >>>> >>>> On Mon, May 5, 2025 at 18:35 Yuming Wang wrote: >>>> >>>>> +1 >>>>> >>>>> On Tue, May 6, 2025 at 9:12 AM Denny Lee >>>>> wrote: >>

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-05 Thread Wenchen Fan
+1 On Tue, May 6, 2025 at 3:40 AM Reynold Xin wrote: > +1 > > > On Mon, May 5, 2025 at 12:37 PM Bjørn Jørgensen > wrote: > >> +1 >> >> On Mon, May 5, 2025 at 21:28, Milan Stefanovic < >> stefanovic.mila...@gmail.com> wrote: >> >>> +1 (non-binding) >>> >>> Thanks, >>> Milan >>> >>> On Mon, 5 May 20

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-05 Thread Wenchen Fan
+1, thanks! On Tue, May 6, 2025 at 2:21 AM DB Tsai wrote: > +1 > > On May 5, 2025, at 1:10 AM, Gabor Somogyi > wrote: > >  > +1 (non-binding) > > G > > > On Mon, May 5, 2025 at 8:59 AM huaxin gao wrote: > >> +1 Thanks Dongjoon. >> >> On Sun, May 4, 2025 at 7:36 PM Rozov, Vlad >> wrote: >> >>

Re: Issue with Spark 4.0.0rc4 and ~/.ivy2.5.2

2025-04-28 Thread Wenchen Fan
Hi Jacek, Thanks for the confirmation! Let's change the wording first, and open a JIRA ticket for the relative path support. Wenchen On Tue, Apr 29, 2025 at 2:41 AM Jacek Laskowski wrote: > Hi Wenchen, > > Looks like it didn't work in 3.5 either. > > ❯ ./bin/spark-s

Re: Issue with Spark 4.0.0rc4 and ~/.ivy2.5.2

2025-04-27 Thread Wenchen Fan
Hi Jacek, Thanks for reporting the issue! Did you hit the same problem when you set the `spark.jars.ivy` config with Spark 3.5? If this config never worked with a relative path, we should change the wording in the migration guide. Thanks, Wenchen On Sun, Apr 27, 2025 at 10:27 PM Jacek Laskowski

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-22 Thread Wenchen Fan
> >> Vlad >> >> On Apr 21, 2025, at 1:59 AM, Cheng Pan wrote: >> >> The deadlock is introduced by PARQUET-2432(1.14.0), if we decide to >> downgrade, the latest workable version is Parquet 1.13.1. >> >> Thanks, >> Cheng Pan >> >> >> >>

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-21 Thread Wenchen Fan
>> >>> Once merged, this should be included in the next RC. >>> >>> On Thu, Apr 10, 2025 at 5:21 PM Wenchen Fan wrote: >>> >>>> Please vote on releasing the following candidate as Apache Spark >>>> version 4.0.0. >>>> >>

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-14 Thread Wenchen Fan
. Thanks, Wenchen On Tue, Apr 15, 2025 at 7:24 AM Yuming Wang wrote: > This release uses Parquet 1.15.1. It seems Parquet 1.15.1 may cause > deadlock. > > > Found one Java-level deadlock: > > = > > "Executor 566 task launch worker for task

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-13 Thread Wenchen Fan
@Hyukjin Kwon is looking into it now and will share the findings later. Please let me know if you find any other issues with RC4, either functionality issues with Spark itself, or integration issues with downstream libraries. Thanks! Wenchen On Thu, Apr 10, 2025 at 11:21 PM Wenchen Fan wrote:

[VOTE] Release Spark 4.0.0 (RC4)

2025-04-10 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0. The vote is open until April 15 (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0 [ ] -1 Do not release this package because ... To

Re: [VOTE] SPIP: Declarative Pipelines

2025-04-09 Thread Wenchen Fan
+1 (binding) On Thu, Apr 10, 2025 at 7:30 AM Szehon Ho wrote: > +1 (non-binding) > > Thanks > Szehon > > On Wed, Apr 9, 2025 at 3:42 PM Hyukjin Kwon wrote: > >> I will shepherd. >> >> On Thu, 10 Apr 2025 at 07:28, Anton Okolnychyi >> wrote: >> >>> +1 (non-binding) >>> >>> - Anton >>> >>> Wed, 9

Re: [VOTE] SPIP: Constraints in DSv2

2025-04-05 Thread Wenchen Fan
+1 As Gengliang explained, the API allows the connectors to request Spark to perform data validations, but connectors can also choose to do validation by themselves. I think it's a reasonable design as not all connectors have the ability to do data validation by themselves, such as file formats th

Re: [VOTE] SPIP: Constraints in DSv2

2025-04-05 Thread Wenchen Fan
framework, such as spark-expectation <https://engineering.nike.com/spark-expectations/v2.1.1/>, but it's an orthogonal topic. Table Constraint itself is a standard SQL feature and many databases support it. I think it's reasonable to make Spark support it as well. Thanks, Wenchen On W

Re: [VOTE] Release Spark 4.0.0 (RC3)

2025-03-31 Thread Wenchen Fan
/spark/pull/50197 is merged, which updates the to-be-released stored procedure DS v2 API. Thanks, Wenchen On Mon, Mar 24, 2025 at 7:36 PM Hyukjin Kwon wrote: > -1. > > Scala and PySpark shell are broken by > https://github.com/apache/spark/pull/49971. Reverted it for now. > >

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-31 Thread Wenchen Fan
geospatial processing in Spark, ultimately benefiting the entire ecosystem. Let’s collaborate closely to define a standard protocol for geospatial processing in Spark—one that serves all stakeholders. Looking forward to working together! Thanks, Wenchen On Mon, Mar 31, 2025 at 3:07 AM Szehon Ho wrote

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Wenchen Fan
inary/text format to use? how to represent geo values in Java? etc.) Thanks, Wenchen On Sat, Mar 29, 2025 at 5:28 AM Jia Yu wrote: > Dear Menelaos, > > Thanks for bringing this up again. I’ve seen similar proposals come up on > the mailing list before, and I’d like to offer some tho

Re: [VOTE] SPIP: Support NanoSecond Timestamps

2025-03-27 Thread Wenchen Fan
Maybe we should discuss the key issues on the dev list as it's easy to lose track of Google Doc comments. I think all the proposals for adding new data types need to prove that the new data type is common/standard in the ecosystem. This means 3 things: - it has common/standard semantics. TIMESTAMP

Re: Requesting advice, thought

2025-03-27 Thread Wenchen Fan
The file source in Spark has not been migrated to DS v2 yet and uses dedicated catalyst rules to do runtime filtering, e.g. PartitionPruning and PlanDynamicPruningFilters On Thu, Mar 27, 2025 at 6:53 PM Asif Shahid wrote: > Hi Experts, > Could you please allow me to pick your brain on the follo

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Wenchen Fan
A slightly off-topic but related question: It feels fragile to test with SBT while publishing the release with Maven. How did we end up in this situation? Moreover, since most Spark developers use SBT for their daily work, it becomes even harder to catch issues with the Maven build. On Thu, Mar 27

Re: [DISCUSS] Upgrade Hive compile time dependency to 4.0

2025-03-25 Thread Wenchen Fan
I agree, 4.0 is already in the RC stage and I think it's too late to do such a big version bump for the Hive dependency. We definitely need to do this upgrade and thanks for working on it! On Mon, Mar 24, 2025 at 1:31 PM Ángel Álvarez Pascua < angel.alvarez.pas...@gmail.com> wrote: > That's grea

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Wenchen Fan
rule to make sure this won't happen again, but for existing problematic tests, I don't think we need to block any release if the current solution resolves the ASF policy issue. We should create JIRA tickets and welcome anyone to fix them, but they don't have to be blockers. Thanks,

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-24 Thread Wenchen Fan
-disabling commit so they can help report any test failures. That said, since these tests are quite old and stable, failures are unlikely. Thanks, Wenchen On Thu, Mar 13, 2025 at 12:15 AM Rozov, Vlad wrote: > There is a difference between technical debt and legal issue. ASF may > request t

[VOTE] Release Spark 4.0.0 (RC3)

2025-03-20 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0. The vote is open until March 25 (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0 [ ] -1 Do not release this package because ... To

Re: [RESULT][VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-18 Thread Wenchen Fan
>> +1: >> - Mark Hamstra * >> - Jungtaek Lim >> - Wenchen Fan * >> - Reynold Xin * >> - Yuanjian Li * >> >> -1: >> - Holden Karau * >> - Hyukjin Kwon * >> - Dongjoon Hyun * >> >> Thanks. >> >>

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-17 Thread Wenchen Fan
it. Of course, none of us want to keep the misnamed configuration in 4.0.0, and it’s clear to me that applying the “configuration deprecation” approach from 3.5.5 to 4.0.0 is the best path forward. I don’t believe Dongjoon’s veto has valid technical justification, so I’m +1 on this vote. Thanks, W

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Wenchen Fan
36 AM Wenchen Fan wrote: > I've created the revert PR for branch-4.0: > https://github.com/apache/spark/pull/50291 . We can merge PRs with lazy > consensus but it's clear that this breaking change PR has failed to achieve > consensus. > > I hope we now have a clear founda

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Wenchen Fan
I've created the revert PR for branch-4.0: https://github.com/apache/spark/pull/50291 . We can merge PRs with lazy consensus but it's clear that this breaking change PR has failed to achieve consensus. I hope we now have a clear foundation for discussing solutions. As it stands, the misnamed confi

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Wenchen Fan
y. On Mon, Mar 17, 2025 at 8:49 AM Wenchen Fan wrote: > Before I cast my vote here, I'd like to highlight one thing: As the > release manager of Apache Spark 4.0.0, I was not notified about the > breaking change of renaming an already-released configuration: > https://github.com/apac

Re: Re: [Discuss] SPIP: Support NanoSecond Timestamps

2025-03-15 Thread Wenchen Fan
In general, I think it's good for Spark to support the common data types in the ecosystem, as it's the only way to fully integrate with the ecosystem. So +1. On Fri, Mar 14, 2025 at 8:56 AM 谭琦 wrote: > Updated. Thanks. > > On 2025/03/13 23:56:20 Jungtaek Lim wrote: > > Hi, would you mind allowin

Re: [VOTE][RESULT] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

2025-03-13 Thread Wenchen Fan
f this. > > > > (* = binding) > > +1: > > - Sean R. Owen * > > - Jungtaek Lim > > - Nicholas Chammas > > - Wenchen Fan * > > - Adam Binford > > - Russell Jurney > > - Yang Jie * > > > > -1: > > - Dongjoon Hyun * > > > > Thanks, > > Jungtaek Lim (HeartSaVioR)

Re: [Discuss] SPIP: Support NanoSecond Timestamps

2025-03-13 Thread Wenchen Fan
Will this nanosecond timestamp be a fixed-size (10 bytes) binary in UnsafeRow and ColumnVector? On Thu, Mar 13, 2025 at 4:57 PM Jungtaek Lim wrote: > Hi, would you mind allowing comments on the doc? Thanks! > > On Fri, Mar 14, 2025 at 8:50 AM Qi Tan wrote: > >> Hello everybody, >> >> I would li

Re: [DISCUSS] New Spark Connect Client repository for Swift language

2025-03-11 Thread Wenchen Fan
+1 On Mon, Mar 10, 2025 at 8:02 AM L. C. Hsieh wrote: > +1 > > Thanks Dongjoon for contributing to Swift implementation. > > On Mon, Mar 10, 2025 at 7:18 AM Hyukjin Kwon wrote: > > > > +1 > > On Mon, Mar 10, 2025 at 6:48 AM Yang Jie wrote: > >> > >> Great! Really happy to see that spark-connec

Re: [VOTE] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

2025-03-10 Thread Wenchen Fan
follow a policy, let’s correct it, but can someone point to the specific policy we’re violating? If the vote is about adding migration logic to ease the upgrade from 3.5.4 to 4.0.0, then +1, why not? Thanks, Wenchen On Mon, Mar 10, 2025 at 8:49 PM Jungtaek Lim wrote: > Well said, Sean. So

Re: [VOTE] Release Spark 4.0.0 (RC2)

2025-03-06 Thread Wenchen Fan
gin >> >> >> 2. Similarly, installing the tarballs of either distribution and launching >> spark-shell goes into a loop and is terminated by the shutdown hook. >> >> >> Thank you Wenchen for leading these onerous release manager efforts, but >> sl

Re: [DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+

2025-03-04 Thread Wenchen Fan
Shall we open an official vote for it? We can put more details on it so that people can vote: 1. how does it break user workloads without this migration code? 2. what is the Apache policy for leaked vendor names in the codebase? I think this is not the only one, we also mentioned `com.databricks.sp

Re: [VOTE] Release Spark 4.0.0 (RC2)

2025-03-03 Thread Wenchen Fan
-1 > > https://github.com/apache/spark/pull/50112 should be merged before > release. > > > > At 2025-03-01 15:25:06, "Wenchen Fan" wrote: > > Please vote on releasing the following candidate as Apache Spark version > 4.0.0. > > The vote is open until Ma

[VOTE] Release Spark 4.0.0 (RC2)

2025-02-28 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0. The vote is open until March 5 (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0 [ ] -1 Do not release this package because ... To l

Re: [VOTE] Release Spark 4.0.0 (RC1)

2025-02-23 Thread Wenchen Fan
ckers. >>>>> >>>>> I know RC1 is mostly for everyone to audit the release artifacts >>>>> earlier before everything is ready, but just wanted to make clear that we >>>>> expect to have another RC. >>>>> >>>>> On

Re: [VOTE] SPIP: Add the TIME data type

2025-02-23 Thread Wenchen Fan
+1 On Mon, Feb 24, 2025 at 7:51 AM John Zhuge wrote: > +1 (non-binding) > > John Zhuge > > > On Sun, Feb 23, 2025 at 2:37 PM huaxin gao wrote: > >> +1 >> >> On Sun, Feb 23, 2025 at 1:51 PM serge rielau.com >> wrote: >> >>> +1 it’s abt time. >>> Sent from my iPhone >>> >>> > On Feb 23, 2025, a

Re: [VOTE] Release Spark 3.5.5 (RC1)

2025-02-23 Thread Wenchen Fan
-0 as I just found a long-standing correctness bug: https://github.com/apache/spark/pull/50040 It's not a regression in 3.5 so technically it's not a release blocker, but it's better to include it as we are just about to release 3.5.5. On Mon, Feb 24, 2025 at 9:11 AM Mich Talebzadeh wrote: > +1

[VOTE] Release Spark 4.0.0 (RC1)

2025-02-19 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0. The vote is open until February 21 (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0 [ ] -1 Do not release this package because ...

Re: [VOTE] Release Apache Spark 3.5.5 deprecating `spark.databricks.*` configuration

2025-02-18 Thread Wenchen Fan
+1 On Wed, Feb 19, 2025 at 2:36 PM Sakthi wrote: > +1 (non-binding) > > On Tue, Feb 18, 2025 at 10:21 PM Yang Jie wrote: > >> +1 >> >> On 2025/02/19 05:57:53 Mark Hamstra wrote: >> > +1 >> > >> > On Tue, Feb 18, 2025 at 9:46 PM dongjoon.hyun >> wrote: >> > > >> > > Please vote to deprecate `sp

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-18 Thread Wenchen Fan
is a one-time exception, we are not changing the rule here. I'm simply suggesting that we have more discussions about breaking changes on the dev list in the future, as exceptions may arise. I hope I’ve made myself clear this time. Thanks, Wenchen On Wed, Feb 19, 2025 at 1:59 PM Mark Ha

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-18 Thread Wenchen Fan
ilding consensus. Thanks to Dongjoon for starting this discussion, and I encourage all of us to do it for other breaking changes as well. Thanks, Wenchen On Wed, Feb 19, 2025 at 12:56 PM Dongjoon Hyun wrote: > I have different perspectives from Wenchen's opinion in three ways. > > > I

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-18 Thread Wenchen Fan
create a sense of fairness among vendors. >>>> >>>> Twitter: https://twitter.com/holdenkarau >>>> Fight Health Insurance: https://www.fighthealthinsurance.com/ >>>> <https://www.fighthealthinsurance.com/?q=hk_email> >>>> Books (Learnin

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-17 Thread Wenchen Fan
It’s unfortunate that we missed identifying these issues during the code review. However, since they have already been released, I believe deprecating them is a better approach than removing them, as the latter would introduce a breaking change. Regarding Jungtaek’s PR

4.0.0 RC1 is coming

2025-02-16 Thread Wenchen Fan
b.com/apache/spark/pull/49573>) or revert the codegen support of from_json. Please reply to this email if you have other ongoing work to add to this list. Thanks, Wenchen

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-02-13 Thread Wenchen Fan
ermine with static data , if output will >>> be deterministic ?. Expressions already have deterministic flag. So when an >>> attribute is created from alias, it will be possible to know if attribute >>> is pointing to an inDeterminate component. >>> >>> O

Re: [PROPOSAL] Unified PySpark-Pandas API to Bridge Data Engineering and ML Workflows

2025-02-10 Thread Wenchen Fan
Interesting, so this is PySpark on pandas which is the reverse of Koalas. If performance is the only problem, maybe we can improve local-mode Spark performance to be on par with these single-node engines. + @Hyukjin Kwon On Mon, Feb 10, 2025 at 8:40 PM José Müller wrote: > Hi Mitch, > > All yo

[VOTE][RESULT] Publish additional Spark distribution with Spark Connect enabled

2025-02-07 Thread Wenchen Fan
Hi all, The vote for "Publish additional Spark distribution with Spark Connect enabled" passes with 22 +1s (13 binding +1s) (* = binding) +1: - Mridul Muralidharan * - Hyukjin Kwon * - Jungtaek Lim - Xiao Li * - DB Tsai * - Sakthi - Gengliang Wang * - L. C. Hsieh * - Yang Jie * - Max Gekk * - Yum

Re: Docs look weird; can't build locally

2025-02-06 Thread Wenchen Fan
Hi Nicholas, I don't know how to set JVM memory for this bundle command, but this PR might help reduce memory usage and avoid the OOM issue. On Fri, Feb 7, 2025 at 12:07 AM Nicholas Chammas wrote: > Gengliang has the CSS/display issue fixed >

Re: ASF board report draft for February 2025

2025-02-05 Thread Wenchen Fan
Hi Angel, AFAIK many people rely on the Spark UI to debug/inspect their queries with the query plan tree and metrics, but you are right that plan string generation is expensive, and we shouldn't do it for every AQE plan change. Maybe we should do it only once to report the final plan for AQE? Let's

Re: [VOTE] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Wenchen Fan
5, 2025 at 8:42 AM Martin Grund >>> wrote: >>> >>>> +1 >>>> >>>> On Wed, Feb 5, 2025 at 17:15 bo yang wrote: >>>> >>>>> +1 (non-binding) >>>>> >>>>> On Wed, Feb 5, 2025 at 7:51 AM Jule

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Wenchen Fan
ld easily launch a remote session on another user's > connect server if it's just automatically launched with no authentication > enabled. > > On Wed, Feb 5, 2025 at 8:54 AM Wenchen Fan wrote: > >> Hi Adam, >> >> My bad, I should have provided more conte

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Wenchen Fan
n Connect >> >>>>> SPARK-50899 Support PrefixSpan on connect >> >>>>> SPARK-51060 Support `QuantileDiscretizer` on Connect >> >>>>> SPARK-50974 Add support foldCol for CrossValidator on connect >> >>>>> SPARK-5101

[VOTE] Publish additional Spark distribution with Spark Connect enabled

2025-02-04 Thread Wenchen Fan
xt 72 hours: [ ] +1: Accept the proposal [ ] +0 [ ] -1: I don’t think this is a good idea because … Best, Wenchen Fan

Re: [DISCUSS] Spark - How to improve our release processes

2025-02-04 Thread Wenchen Fan
w. On Tue, Feb 4, 2025 at 6:32 PM Nimrod Ofek wrote: > Hi all, > > I am trying to revive this thread - to work towards a better release > process, and making sure we have no conflicts in the used artifacts like > nicholas.cham...@gmail.com mentioned. > @Wenchen Fan - can you pleas

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-04 Thread Wenchen Fan
. Thanks, Wenchen On Tue, Feb 4, 2025 at 4:10 PM Dongjoon Hyun wrote: > Hi, Wenchen. > > I'm wondering if this implies any delay of the existing QA and RC1 > schedule or not. > > If then, why don't we schedule this new alternative proposal on Spark 4.1 > properl

[DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-03 Thread Wenchen Fan
configure it manually. Looking forward to hearing your thoughts! Thanks, Wenchen

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-01-26 Thread Wenchen Fan
It looks like a hard problem to statically analyze the query plan and decide whether a Spark stage is deterministic or not. When I added RDD DeterministicLevel, I thought it was not a hard problem for the callers to specify it, but seems I was wrong. Maybe we should do it at runtime: if Spark retr

Re: [DISCUSS] Ongoing projects for Spark 4.0

2025-01-26 Thread Wenchen Fan
o 4.1 or have a simpler working version in 4.0. For other new projects, please do not merge them to branch-4.0. Thanks, Wenchen On Wed, Jan 22, 2025 at 7:03 PM Milan Cupac wrote: > I am working on recursive CTEs. > > Two final PRs should be merged soon: > https://github.com/apache/spar

Re: A documentation change is a user-facing change

2025-01-16 Thread Wenchen Fan
+1 to update the PR template. I think the intent is to ask PR authors to call out all the user-facing changes that need attention from the end users, such as new features and behavior changes, but doc change is clearly not one of them. On Fri, Jan 17, 2025 at 7:10 AM Gengliang Wang wrote: > Than

[DISCUSS] Ongoing projects for Spark 4.0

2025-01-15 Thread Wenchen Fan
date is Feb 15. Thanks, Wenchen

Re: [REMINDER] 4.0 branch cut is coming

2025-01-15 Thread Wenchen Fan
"branch-4.0" has been cut. Please merge commits to this branch as well if you think it's necessary before Feb 1. On Wed, Jan 15, 2025 at 4:37 PM Wenchen Fan wrote: > Yes, I'll start a new thread this week to collect the ongoing projects and > track the progress. We ca

Re: [REMINDER] 4.0 branch cut is coming

2025-01-15 Thread Wenchen Fan
national holiday of both China and South Korea. Shall we collect and > check the plan of ongoing projects to see whether they can make it in time > in advance, so that we could determine the realistic date of the QA period > rather than an arbitrary one? > > On Wed, Jan 15, 2025 at 5

Re: [REMINDER] 4.0 branch cut is coming

2025-01-15 Thread Wenchen Fan
n merging PRs if you want to backport it to 4.0. Thanks, Wenchen On Wed, Jan 15, 2025 at 3:44 PM Dongjoon Hyun wrote: > Let me enumerate some questions. > > 1. Since you have no open list, do you have some internal PRs which are > planned to be open the rest of Today? > >

Re: [REMINDER] 4.0 branch cut is coming

2025-01-14 Thread Wenchen Fan
ell what is going on there. > Which PRs are those? > > > I'm still reviewing and may merge PRs during the daytime. > > Let's see what your real concern is and let us help you. > > I really want to understand how those PRs are blocked or interrupted by > `branch cut`. >

Re: [REMINDER] 4.0 branch cut is coming

2025-01-14 Thread Wenchen Fan
formed. > ``` > > BTW, after creating the `branch-4.0`, the following PR will be merged > in order to differentiate `master` from `branch-4.0`. > > [SPARK-50822][BUILD] Setting version to 4.1.0-SNAPSHOT > https://github.com/apache/spark/pull/49495 > > Best regards, >

Re: [REMINDER] 4.0 branch cut is coming

2025-01-14 Thread Wenchen Fan
n Wed, Jan 15, 2025 at 11:56 AM Hyukjin Kwon wrote: > I don't have any examples :-). Just wanted to make sure. +1 for cutting > soon as scheduled. > > On Wed, 15 Jan 2025 at 12:55, Dongjoon Hyun > wrote: > >> Thank you, Wenchen and Hyukjin. >> >> BTW, if

Re: [REMINDER] 4.0 branch cut is coming

2025-01-14 Thread Wenchen Fan
either way. On Wed, Jan 15, 2025 at 10:40 AM Dongjoon Hyun wrote: > Hi, Wenchen. > > I'd not delay the branch cut. We had better stick to our plan and those > week days to stabilize branch-4.0. > > If you are busy, I'm going to cut the branch on schedule (2015-01-15

[REMINDER] 4.0 branch cut is coming

2025-01-14 Thread Wenchen Fan
to the 4.0 branch after the cut, code freeze will begin on February 1. After that date, only bug fixes can be backported. Thanks, Wenchen

[VOTE][RESULT] Use plain text logs by default

2025-01-13 Thread Wenchen Fan
Hi all, The vote for "Use plain text logs by default" passes with 19 +1s (8 binding +1s) (* = binding) +1: - beliefer - Yuming Wang * - Hyukjin Kwon * - Cheng Pan - Yang Jie - Jungtaek Lim - Mridul Muralidharan * - Gengliang Wang * - L. C. Hsieh - Allison Wang - Peter Toth - DB Tsai * - Denny Lee

[VOTE] Use plain text logs by default

2025-01-08 Thread Wenchen Fan
for the next 72 hours: [ ] +1: Accept the proposal [ ] +0 [ ] -1: I don’t think this is a good idea because … Best, Wenchen Fan [1] https://lists.apache.org/thread/4fnlnvhsqym72k53jw8cjhcdjbhpmm95
