Re: [DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+

2025-03-04 Thread Dongjoon Hyun
Technically, there is no agreement here. In other words, we are in the same situation as in the initial discussion thread, where we couldn't build a community consensus on this. > I will consider this as "lazy consensus" if there are no objections > for 3 days from initiation of the thread. If you ne

Re: [DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+

2025-03-04 Thread Jungtaek Lim
Let's not start a VOTE right now. Instead, let me lay out the options and the pros/cons of each, so that people can choose between them. Option 1 (Current proposal): retain the migration logic in Spark 4.0 (and possibly later minor versions, to be decided), which contains the problematic config

Re: [DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+

2025-03-04 Thread Wenchen Fan
Shall we open an official vote for it? We can add more details so that people can vote: 1. How does removing this migration code break user workloads? 2. What is the Apache policy on leaked vendor names in the codebase? I think this is not the only case; we also mentioned `com.databricks.sp

Re: [DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+

2025-03-04 Thread Jungtaek Lim
One major question: how do you believe we can enforce an upgrade path on users? I have seen a bunch of cases where users upgrade 2-3 minor versions at once. Do you really believe we can just break their queries? What data backs up your claim? I think we agree to disagree. I really don't

Re: [DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+

2025-03-04 Thread Jungtaek Lim
Bumping this. Again, this is a blocker for Spark 4.0.0. I will consider this as "lazy consensus" if there are no objections for 3 days from initiation of the thread. On Tue, Mar 4, 2025 at 2:15 PM Jungtaek Lim wrote: > Hi dev, > > This is a spin-off of the original thread "Deprecating and bann

Re: Seek for consensus on landing Spark Connect implementation for transformWithState in Spark 4.0.0

2025-03-04 Thread Jungtaek Lim
Thanks for the input. I hear the concern, and I don't feel strongly enough to debate it further. Let's punt this to 4.1. We'll need to document that TWS is not supported in Spark Connect. I'll work with the team to figure out where to document this properly. Withdrawing this proposal. On Wed, Mar 5, 2025 at 1:24

Re: Seek for consensus on landing Spark Connect implementation for transformWithState in Spark 4.0.0

2025-03-04 Thread Holden Karau
I share the same concern; adding new features at this stage feels risky and is likely to drag out an already fairly late release.

Re: Seek for consensus on landing Spark Connect implementation for transformWithState in Spark 4.0.0

2025-03-04 Thread Jungtaek Lim
Thanks for the input. I have probably already made every argument I can to persuade you, so I'll count this as a -1, assuming you've read through it. I'd argue that this is not a random backport (it is one of the top tracked projects in Spark 4.0), but of course I hear the concern for any reason

Re: Seek for consensus on landing Spark Connect implementation for transformWithState in Spark 4.0.0

2025-03-04 Thread Dongjoon Hyun
Thank you for initiating this. BTW, RC failures are irrelevant to the request to backport a new feature. So, in principle, I'm -1 on this late arrival because it could set a bad precedent that opens the door to arbitrary backports and delays. However, I'll follow a broader community consensus

Re: Seek for consensus on landing Spark Connect implementation for transformWithState in Spark 4.0.0

2025-03-04 Thread Jungtaek Lim
Thank you for understanding. Actually, I'm dealing with a blocker for Spark 4.0.0 (so the RC will keep failing until I address it); you may want to join the discussion to help unblock me. https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr For sure, we will work with Wenchen to get the final si

Re: Seek for consensus on landing Spark Connect implementation for transformWithState in Spark 4.0.0

2025-03-04 Thread Mridul Muralidharan
Hi Jungtaek, It is fairly irregular to make feature updates this late, but given that RC2 appears to have failed, you should get a sign-off from the release manager in particular, whose life this will make difficult :-) I don't have strong objections if the RM is fine absorbing the lo

Re: Seek for consensus on landing Spark Connect implementation for transformWithState in Spark 4.0.0

2025-03-04 Thread Mich Talebzadeh
Sure, we can leave it as it is. No big deal. On Tue, 4 Mar 2025 at 23:29, Jungtaek Lim wrote: > Thanks for catching thi

Re: Seek for consensus on landing Spark Connect implementation for transformWithState in Spark 4.0.0

2025-03-04 Thread Jungtaek Lim
Thanks for catching this (unfortunately it is already checked in, so changing the PR description won't change the commit message). That said, let's focus on the discussion rather than nitpicking. If we are really concerned, I can modify the commit message when we decide to push this to Spark

Re: [VOTE] Release Spark 4.0.0 (RC2)

2025-03-04 Thread Jules Damji
Please disregard my last message. I had neglected to set SPARK_REMOTE, which is needed to get pyspark working correctly. Cheers Jules > On Mar 4, 2025, at 2:24 PM, Chris Nauroth wrote: > > -1 (non-binding) > > I think I found some missing license information in the binary distribution. > We may want to include this

Re: [VOTE] Release Spark 4.0.0 (RC2)

2025-03-04 Thread Jules Damji
-1 (non-binding) I ran into a number of installation and launch problems. Maybe it's my environment, even though I removed all old binaries and packages. 1. Pip-installing pyspark 4.0.0 and pyspark-connect-4.0 from the .tz files worked, but launching pyspark results in 25/03/04 14:00:26 ERROR Spa

Re: [VOTE] Release Spark 4.0.0 (RC2)

2025-03-04 Thread Chris Nauroth
-1 (non-binding) I think I found some missing license information in the binary distribution. We may want to include this in the next RC: https://github.com/apache/spark/pull/50158 Thank you for putting together this RC, Wenchen. Chris Nauroth On Mon, Mar 3, 2025 at 6:10 AM Wenchen Fan wrote

Re: Seek for consensus on landing Spark Connect implementation for transformWithState in Spark 4.0.0

2025-03-04 Thread Mich Talebzadeh
Thanks, I read the PySpark PR. I suggest this for "Why are the changes needed?": "As Spark Connect is becoming the default *API* in Spark 4.0, we need to add Connect support for TWS in Python." Why: saying "As Spark Connect is becoming the default *API* in Spark 4.0" more accurately reflects that Spar

Re: Seek for consensus on landing Spark Connect implementation for transformWithState in Spark 4.0.0

2025-03-04 Thread Jungtaek Lim
Hi, Here are the PRs we are seeking consensus to merge for 4.0. PySpark: https://github.com/apache/spark/pull/49560 Scala: https://github.com/apache/spark/pull/49488 Thanks, Jungtaek Lim (HeartSaVioR) On Tue, Mar 4, 2025 at 11:06 PM Mich Talebzadeh wrote: > Thanks. > > Can you point to a li

Re: Seek for consensus on landing Spark Connect implementation for transformWithState in Spark 4.0.0

2025-03-04 Thread Jungtaek Lim
Hi Mridul, If your concern is just that it's a bit late, I'd like to persuade you otherwise, for the following reasons: 1. The change only brings Spark Connect to parity, so it is low risk and has no chance of breaking anything else. If it breaks, it only breaks the TWS + Spark Connect combination. For re

Re: Seek for consensus on landing Spark Connect implementation for transformWithState in Spark 4.0.0

2025-03-04 Thread Mridul Muralidharan
Hi Jungtaek, We are already at RC2 for 4.0, right? A bit too late for this IMO; we can always introduce it in 4.1. Regards, Mridul On Tue, Mar 4, 2025 at 7:22 AM Herman van Hovell wrote: > +1 > > On Tue, Mar 4, 2025 at 2:07 AM Anish Shrigondekar > wrote: > >> +1 - Would be great to get t

Re: Seek for consensus on landing Spark Connect implementation for transformWithState in Spark 4.0.0

2025-03-04 Thread Mich Talebzadeh
Thanks. Can you point to a link or any further documentation, please? On Tue, 4 Mar 2025 at 13:22, Herman van Hovel

Re: Seek for consensus on landing Spark Connect implementation for transformWithState in Spark 4.0.0

2025-03-04 Thread Herman van Hovell
+1 On Tue, Mar 4, 2025 at 2:07 AM Anish Shrigondekar wrote: > +1 - Would be great to get this into the Spark 4.0 release. > > Thanks, > Anish > > On Mon, Mar 3, 2025 at 9:35 PM Jungtaek Lim > wrote: > >> Hi dev, >> >> We are going to introduce a new API named `transformWithState` for >> streami