Re: Seek for consensus on landing Spark Connect implementation for transformWithState in Spark 4.0.0

2025-03-04 Thread Herman van Hovell
+1 On Tue, Mar 4, 2025 at 2:07 AM Anish Shrigondekar wrote: > +1 - Would be great to get this into the Spark 4.0 release. > > Thanks, > Anish > > On Mon, Mar 3, 2025 at 9:35 PM Jungtaek Lim > wrote: > >> Hi dev, >> >> We are going to introduce a new API named `transformWithState` for >> streami

Re: [VOTE] Release Spark 3.5.5 (RC1)

2025-02-25 Thread Herman van Hovell
https://github.com/apache/spark/blob/master/sql/connect/client/jvm/src/test/resources/StubClassDummyUdf.scala (commit 408c3df) is the source code for the following jars: - ./connector/connect/server/src/test/resources/udf_noA.jar - ./connector/connect/client/jvm/src/test/resources/udf2.13.ja

Re: [Connect] Spark connect documentation clarification request

2025-02-03 Thread Herman van Hovell
Hi Nimrod, We are working on this as we speak. There is already a PR out for the extensions use case: https://github.com/apache/spark/pull/49604 Kind regards, Herman On Mon, Feb 3, 2025 at 10:10 AM Nimrod Ofek wrote: > Hi, > > In https://spark.apache.org/spark-connect/ - at the bottom it says

Re: FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-28 Thread Herman van Hovell
There are many factors: - Typically it is a race between multiple PRs, where they all pass CI without the other changes, and get merged at the same time. - Differences between (the nightly job and the PR job) environments (e.g. size of the machine) can also cause these issues. - In

Re: PR review

2024-12-30 Thread Herman van Hovell
What do you need to have reviewed? On Mon, Dec 30, 2024 at 3:48 PM Rozov, Vlad wrote: > Hi, > > How can I request PR review? Sorry if this was already discussed on the > list or is available in the archive or spark.apache.org. > > Thank you, > > Vlad >

Re: [External Mail] Re: Spark Connect the default API in Spark 4.0

2024-11-28 Thread Herman van Hovell
tures of Spark Connect >>>>>> and >>>>>> support workloads that we previously thought could not be supported >>>>>> easily. >>>>>> >>>>>> https://github.com/apache/spark/pull/48791 >>>>>> &

Spark Connect the default API in Spark 4.0

2024-11-25 Thread Herman van Hovell
Hi All, I would like to start a discussion on "Spark Connect the default API in Spark 4.0". The rationale for this change is that Spark Connect brings a lot of improvements with respect to simplicity, stability, isolation, upgradability, and extensibility (all detailed in the SPIP). In a nutshell

Re: [VOTE] Officialy Deprecate GraphX in Spark 4

2024-09-30 Thread Herman van Hovell
+1 On Mon, Sep 30, 2024 at 12:21 PM Dongjoon Hyun wrote: > +1 > > Thank you, Holden. > > Dongjoon. > > On 2024/09/30 18:01:17 Holden Karau wrote: > > I think it has been de-facto deprecated, we haven’t updated it > meaningfully > > in several years. I think removing the API would be excessive bu

Re: [VOTE] Single-pass Analyzer for Catalyst

2024-09-30 Thread Herman van Hovell
+1 On Mon, Sep 30, 2024 at 8:29 AM Reynold Xin wrote: > +1 > > On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev > wrote: > >> Hi all, >> >> I’d like to start a vote for a single-pass Analyzer for the Catalyst >> project. This project will introduce a new analysis framework to the >> Catalyst, w

Re: [DISCUSS] Creating `branch-4.0` and Feature Freeze for Apache Spark 4.0

2024-09-26 Thread Herman van Hovell
Hi, Can we push back the dates by at least 2 months? We are working on unifying the Connect and Classic Scala interface, and I would like to avoid rushing things. Kind regards, Herman On Thu, Sep 26, 2024 at 3:19 PM Dongjoon Hyun wrote: > Hi, All. > > We've delivered two preview releases for

Re: [VOTE] Release Apache Spark 3.5.3 (RC3)

2024-09-11 Thread Herman van Hovell
+1 On Wed, Sep 11, 2024 at 3:30 AM Kent Yao wrote: > +1, thank you, Haejoon > Kent > > On 2024/09/11 06:12:19 Gengliang Wang wrote: > > +1 > > > > On Mon, Sep 9, 2024 at 6:01 PM Wenchen Fan wrote: > > > > > +1 > > > > > > On Tue, Sep 10, 2024 at 7:42 AM Rui Wang .invalid> > > > wrote: > > > >

Re: [VOTE] Deprecate SparkR

2024-08-21 Thread Herman van Hovell
+1 On Wed, Aug 21, 2024 at 2:55 PM Martin Grund wrote: > +1 > > On Wed, Aug 21, 2024 at 20:26 Xiangrui Meng wrote: > >> +1 >> >> On Wed, Aug 21, 2024, 10:24 AM Mridul Muralidharan >> wrote: >> >>> +1 >>> >>> >>> Regards, >>> Mridul >>> >>> >>> On Wed, Aug 21, 2024 at 11:46 AM Reynold Xin >>>

Re: [External Mail] Re: [DISCUSS] Deprecating SparkR

2024-08-21 Thread Herman van Hovell
+1 Let's start a vote? On Fri, Aug 16, 2024 at 2:05 AM yangjie01 wrote: > +1 > Original message > From: Jungtaek Lim > Time: 2024-08-16 09:06:52 > Subject: [External Mail] Re: [DISCUSS] Deprecating SparkR > To: Wenchen Fan; > Cc: L. C. Hsieh;Dongjoon > Hyun;Holden > Karau;Xiao Li;Hyukjin Kwon< > gurwls...@a

Re: [Spark SQL] A single-pass resolution approach for the Catalyst Analyzer

2024-08-14 Thread Herman van Hovell
+1(000) on this! This should massively reduce allocations done in the analyzer, and it is much more efficient. I also can't count the times that I had to increase the number of iterations. This sounds like a no-brainer to me. I do have two questions: - How do we ensure that we don't accidenta

Re: [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread Herman van Hovell
+1 On Mon, Jul 22, 2024 at 8:56 PM Wenchen Fan wrote: > +1 > > On Tue, Jul 23, 2024 at 8:40 AM Xinrong Meng wrote: > >> +1 >> >> Thank you @Hyukjin Kwon ! >> >> On Mon, Jul 22, 2024 at 5:20 PM Gengliang Wang wrote: >> >>> +1 >>> >>> On Mon, Jul 22, 2024 at 5:19 PM Hyukjin Kwon >>> wrote: >>>

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-05 Thread Herman van Hovell
+1 On Fri, Jul 5, 2024 at 1:52 AM Hyukjin Kwon wrote: > (I will leave this vote open till 10th July, considering that it's holiday > season in the US) > > On Thu, 4 Jul 2024 at 23:39, Peter Toth wrote: > >> +1 >> >> John Zhuge wrote (on Thu, Jul 4, 2024, >> 5:38): >> >>> +1 >>> >>> >>> Jo

Re: How do you debug a code-generated aggregate?

2024-02-12 Thread Herman van Hovell
There is no really easy way of getting the state of the aggregation buffer, unless you are willing to modify the code generation and sprinkle in some logging. What I would start with is dumping the generated code by calling explain('codegen') on the DataFrame. That helped me to find similar issues
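The first step suggested above, dumping the generated code, can be sketched in PySpark as follows. This is a minimal sketch under stated assumptions, not the method from the thread: it assumes PySpark 3.0 or later, where `DataFrame.explain` accepts `mode="codegen"`, and it assumes the dump is printed through Python's stdout so that it can be captured. The helper names `capture_codegen` and `lines_mentioning`, and the `"agg_"` search string, are hypothetical and only there for illustration.

```python
import contextlib
import io


def capture_codegen(df):
    """Return whatever df.explain(mode="codegen") prints, as a string.

    Assumption: `df` behaves like a PySpark DataFrame (3.0+) whose
    explain() writes the generated code to Python's stdout.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        df.explain(mode="codegen")
    return buffer.getvalue()


def lines_mentioning(text, needle):
    """Return the lines of a codegen dump that contain `needle`."""
    return [line for line in text.splitlines() if needle in line]


# Hypothetical usage against a real session (not executed here):
#   df = spark.range(10).groupBy().sum("id")
#   dump = capture_codegen(df)
#   print("\n".join(lines_mentioning(dump, "agg_")))
```

Capturing the dump as a string makes it easy to search for the aggregation buffer variables before deciding where to add logging in the code generator.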

Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread Herman van Hovell
+1 On Tue, Sep 26, 2023 at 10:39 AM yangjie01 wrote: > +1 > > > > *From**: *Yikun Jiang > *Date**: *Tuesday, September 26, 2023 18:06 > *To**: *dev > *Cc**: *Hyukjin Kwon , Ruifeng Zheng < > ruife...@apache.org> > *Subject**: *Re: [VOTE] Updating documentation hosted for EOL and maintenance > releases > > > > +1,

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-06 Thread Herman van Hovell
Tested connect, and everything looks good. +1 On Wed, Sep 6, 2023 at 8:11 AM Yuanjian Li wrote: > Please vote on releasing the following candidate(RC4) as Apache Spark > version 3.5.0. > > The vote is open until 11:59pm Pacific time Sep 8th and passes if a > majority +1 PMC votes are cast, with

Re: [Reminder] Spark 3.5 Branch Cut

2023-07-16 Thread Herman van Hovell
Hi Yuanjian, For the ongoing encoder work for the connect scala client I'd like to get the following tickets in: - SPARK-44396 : Direct Arrow Deserialization - SPARK-9 : Upcasti

Re: [VOTE][RESULT] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-19 Thread Herman van Hovell
Dongjoon, I am not sure if I follow the line of thought here. Multiple people have asked for clarification on what Spark 4.0 would mean (Holden, Mridul, Jia & Xiao). You can - for the record - also add me to this list. However, you choose to single out Xiao because he asks this questi

Re: [VOTE] Release Apache Spark 3.4.0 (RC5)

2023-03-30 Thread Herman van Hovell
+1 On Thu, Mar 30, 2023 at 11:05 PM Sean Owen wrote: > +1 same result from me as last time. > > On Thu, Mar 30, 2023 at 3:21 AM Xinrong Meng > wrote: > >> Please vote on releasing the following candidate(RC5) as Apache Spark >> version 3.4.0. >> >> The vote is open until 11:59pm Pacific time *A

Re: Ammonite as REPL for Spark Connect

2023-03-23 Thread Herman van Hovell
ything else I am missing ? > > Regards, > Mridul > > > > On Wed, Mar 22, 2023 at 6:58 PM Herman van Hovell > wrote: > >> Ammonite is maintained externally by Li Haoyi et al. We are including it >> as a 'provided' dependency. The integration bits and pieces

Re: Ammonite as REPL for Spark Connect

2023-03-22 Thread Herman van Hovell
uded into Apache Spark ? > > Regards , > Mridul > > > > On Wed, Mar 22, 2023 at 6:50 PM Herman van Hovell > wrote: > >> Hi All, >> >> For Spark Connect Scala Client we are working on making the REPL >> experience a bit nicer <https://github.com/a

Ammonite as REPL for Spark Connect

2023-03-22 Thread Herman van Hovell
Hi All, For Spark Connect Scala Client we are working on making the REPL experience a bit nicer. In a nutshell we want to give users a turnkey Scala REPL that works even if you don't have a Spark distribution on your machine (through coursier

Re: [Question] Can't start Spark Connect

2023-03-08 Thread Herman van Hovell
Hi Jia, How are you building connect? Kind regards, Herman On Wed, Mar 8, 2023 at 8:48 AM Jia Fan wrote: > Thanks for reply, > I had done clean build with maven few times. But always report > > /Users/xxx/Code/spark/core/target/generated-sources/org/apache/spark/status/protobuf/StoreTypes.

Re: [VOTE] Release Apache Spark 3.4.0 (RC1)

2023-02-22 Thread Herman van Hovell
Hi All, Thanks for testing the 3.4.0 RC! I apologize for the maven testing failures for the Spark Connect Scala Client. We will try to get those sorted as soon as possible. This is an artifact of having multiple build systems, and only running CI for one (SBT). That, however, is a debate for anot

Re: Depolying stage-level scheduling for Spark SQL

2022-09-29 Thread Herman van Hovell
I think issue 2 is caused by adaptive query execution. This will break apart queries into multiple jobs; each subsequent job will generate an RDD that is based on previous ones. As for 1, I am not sure how much you want to expose to an end user here. SQL is declarative, and it does not specify how

Re: Why are hash functions seeded with 42?

2022-09-26 Thread Herman van Hovell
Sorry about that, it made me laugh 6 years ago, I didn't expect this to come back and haunt me :)... There are ways out of this, none of them are particularly appealing: - Add a SQL conf to make the value configurable. - Add a seed parameter to the function. I am not sure if we can make this work
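The second option listed above, a seed parameter on the function, can be illustrated with a small pure-Python sketch. It implements the standard MurmurHash3 x86_32 algorithm with the seed exposed as an argument that defaults to 42. This is an assumption-laden illustration only: the function name `murmur3_32` is hypothetical, and Spark's own hash expressions may process input differently (for example in how trailing bytes are handled), so the values produced here should not be expected to match Spark's output.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_k(k):
    """Scramble one (possibly partial) little-endian block."""
    k = (k * 0xCC9E2D51) & MASK32
    k = _rotl32(k, 15)
    return (k * 0x1B873593) & MASK32


def murmur3_32(data: bytes, seed: int = 42) -> int:
    """Standard MurmurHash3 x86_32; the seed defaults to 42 but is overridable."""
    h = seed & MASK32
    n_blocks = len(data) // 4

    # Body: consume the input four bytes at a time.
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        h ^= _mix_k(k)
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Tail: the remaining 1 to 3 bytes, if any.
    tail = data[4 * n_blocks:]
    if tail:
        h ^= _mix_k(int.from_bytes(tail, "little"))

    # Finalization: fold in the length and avalanche the bits.
    h ^= len(data) & MASK32
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h
```

Keeping 42 as the default preserves existing results for callers that never pass a seed, while callers that need a different hash family can supply their own value.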

[VOTE][RESULT] SPIP: Spark Connect

2022-06-16 Thread Herman van Hovell
The vote passes with 17 +1s (10 binding +1s). +1: Herman van Hovell* Matei Zaharia* Yuming Wang Hyukjin Kwon* Chao Sun L.C. Hsieh* Huaxin Gao Ruifeng Zheng Wenchen Fan* Believer Xiao Li* Reynold Xin* Dongjoon Hyun* Gengliang Wang Yikun Jiang Tom Graves* Holden Karau* 0: None (Tom has voiced

Re: [VOTE][SPIP] Spark Connect

2022-06-13 Thread Herman van Hovell
Let me kick off the voting... +1 On Mon, Jun 13, 2022 at 2:02 PM Herman van Hovell wrote: > Hi all, > > I’d like to start a vote for SPIP: "Spark Connect" > > The goal of the SPIP is to introduce a Dataframe based client/server API > for Spark > > Pl

[VOTE][SPIP] Spark Connect

2022-06-13 Thread Herman van Hovell
Hi all, I’d like to start a vote for SPIP: "Spark Connect" The goal of the SPIP is to introduce a Dataframe based client/server API for Spark Please also refer to: - Previous discussion in dev mailing list: [DISCUSS] SPIP: Spark Connect - A client and server interface for Apache Spark.

Re: [VOTE] Release Spark 3.3.0 (RC6)

2022-06-13 Thread Herman van Hovell
+1 On Mon, Jun 13, 2022 at 12:53 PM Wenchen Fan wrote: > +1, tests are all green and there are no more blocker issues AFAIK. > > On Fri, Jun 10, 2022 at 12:27 PM Maxim Gekk > wrote: > >> Please vote on releasing the following candidate as >> Apache Spark version 3.3.0. >> >> The vote is open un

Re: [VOTE] Release Spark 2.4.8 (RC2)

2021-04-13 Thread Herman van Hovell
+1 On Tue, Apr 13, 2021 at 2:40 AM sarutak wrote: > +1 (non-binding) > > > +1 > > > > On Tue, 13 Apr 2021, 02:58 Sean Owen, wrote: > > > >> +1 same result as last RC for me. > >> > >> On Mon, Apr 12, 2021, 12:53 AM Liang-Chi Hsieh > >> wrote: > >> > >>> Please vote on releasing the following c

Re: [VOTE] Release Spark 3.1.1 (RC3)

2021-02-22 Thread Herman van Hovell
+1 On Mon, Feb 22, 2021 at 12:59 PM Jungtaek Lim wrote: > +1 (non-binding) > > Verified signatures. Only a few commits added after RC2 which don't seem > to change the SS behavior, so I'd carry over my +1 from RC2. > > On Mon, Feb 22, 2021 at 3:57 PM Hyukjin Kwon wrote: > >> Starting with my +1

Re: [VOTE] Release Spark 3.0.2 (RC1)

2021-02-16 Thread Herman van Hovell
+1 On Tue, Feb 16, 2021 at 11:08 AM Hyukjin Kwon wrote: > +1 > > On Tue, Feb 16, 2021 at 5:10 PM, Prashant Sharma wrote: > >> +1 >> >> On Tue, Feb 16, 2021 at 1:22 PM Dongjoon Hyun >> wrote: >> >>> Please vote on releasing the following candidate as Apache Spark version >>> 3.0.2. >>> >>> The vote is

Re: [VOTE] Standardize Spark Exception Messages SPIP

2020-11-09 Thread Herman van Hovell
+1 On Mon, Nov 9, 2020 at 2:06 AM Takeshi Yamamuro wrote: > +1 > > On Thu, Nov 5, 2020 at 3:41 AM Xinyi Yu wrote: > >> Hi all, >> >> We had the discussion of SPIP: Standardize Spark Exception Messages at >> >> http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-SPIP-Standardize-Sp

Re: Welcoming some new Apache Spark committers

2020-07-15 Thread Herman van Hovell
Congratulations! On Wed, Jul 15, 2020 at 9:00 AM angers.zhu wrote: > Congratulations ! > > angers.zhu > angers@gmail.com > >