Re: [VOTE] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

2025-03-10 Thread Yang Jie
To Sean, you're right, I'm very sorry. From the perspective of compatibility and migratability, I think we should migrate this logic to 4.0.0 and keep it in the codebase for a longer time (or permanently), because we can't predict which version users of 3.5.4 will choose next. I don't

Re: [DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+

2025-03-10 Thread Andrew Melo
Hi Jungtaek, I've read the discussion, which is why I replied with my questions (which you neglected to answer). Your deflection and lack of response to direct questions should be (IMO) disqualifying. So, again: To put it into less complicated words - presumably the people using the databricks.*

Re: [VOTE] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

2025-03-10 Thread Wenchen Fan
Guys, let’s be honest about what we’re discussing here. If this is a migration issue, why would we even need a vote? We’ve been consistently adding configurations to restore legacy behavior instead of removing them because we understand the challenges of upgrading Spark versions. Our goal has alwa

Re: [DISCUSS] New Spark Connect Client repository for Swift language

2025-03-10 Thread Jules Damji
+ 1 (non-binding) Generally speaking, it’s a good idea to separate repositories for all Spark Connect clients under Spark. - better organization - better visibility - easier for contribution - better for growth & extension of Spark Connect ecosystem Cheers Jules — Sent from my iPhone Pardon the

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-10 Thread Hyukjin Kwon
If we should fix, let's make sure we don't just disable the tests - we will create another set of technical debt. On Thu, 27 Feb 2025 at 09:11, Rozov, Vlad wrote: > I’ll look into the JIRA. Please assign it to me. > > Thank you, > > Vlad > > > On Feb 26, 2025, at 11:33 PM, Yang Jie wrote: > >

Re: [DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+

2025-03-10 Thread Nicholas Chammas
> On Mar 10, 2025, at 10:14 PM, Andrew Melo wrote: >> >> This config was released to "Apache" Spark 3.5.4, so this is NO LONGER just >> a problem with vendor distribution. The breakage will happen even if someone >> does not even know about Databricks Runtime at all and keeps using Apache >>

Re: [VOTE] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

2025-03-10 Thread Nicholas Chammas
I agree with Sean that this proposal does not seem to me as controversial as it has turned out so far. Jungtaek’s detailed breakdown on the other thread explains that this proposed change is mainly to benefit open source users o

Re: [VOTE] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

2025-03-10 Thread Jungtaek Lim
Well said, Sean. Sorry I made you keep around here since it might not be clearly stated. My bad. Yang, how could we ever tolerate the fact there are "other" occurrences of vendor names in the codebase? Please go and search "databricks" in the codebase and be surprised. If we believe that having v

Re: [DISCUSS] New Spark Connect Client repository for Swift language

2025-03-10 Thread huaxin gao
+1 On Mon, Mar 10, 2025 at 10:04 AM Denny Lee wrote: > +1 (non-binding) > > On Mon, Mar 10, 2025 at 9:47 AM Peter Toth wrote: > >> +1 >> >> On Mon, Mar 10, 2025 at 5:39 PM Kent Yao wrote: >> >>> +1 >>> >>> Kent >>> >>> On Monday, March 10, 2025, Max Gekk wrote: >>> +1 On Mon, Mar 10, 2025 at

Re: [DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+

2025-03-10 Thread Jungtaek Lim
Thanks for looking into the issue in depth. What you described is right. I also understand the concern why we keep the buggy behavior, but the QO issue is quite complicated and the most concerning part is that it's "selective". So if the query runs with QO's decision in "one way" in its lifecycle,

Re: [VOTE] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

2025-03-10 Thread Sean Owen
Doesn't the migration code 'clear' the debt? The proposal is not to continue to support the config. I feel like people are not quite understanding the change, and objecting to something that doesn't exist. It's a shame, as this seems like something not even worth discussing. I don't know why this t

Re: [VOTE] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

2025-03-10 Thread Yang Jie
-1 Remove migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.0 because I think this configuration was initially introduced accidentally in Spark 3.5.4, lacking a clear design intent. Although the immediate maintenance cost of retaining this configuration currently seems

Re: [DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+

2025-03-10 Thread Andrew Melo
Hi Jungtaek below On Mon, Mar 10, 2025 at 9:02 PM Jungtaek Lim wrote: > > Replied inline > > On Tue, Mar 11, 2025 at 10:39 AM Andrew Melo wrote: >> >> Hi Jungtaek, >> >> I've read the discussion, which is why I replied with my questions >> (which you neglected to answer). Your deflection and la

Re: [VOTE] Release Spark 4.0.0 (RC2)

2025-03-10 Thread Bobby
> I ran into an exception issue when playing around spark connect, more details can be found at https://issues.apache.org/jira/browse/SPARK-51451 > pyspark.errors.exceptions.connect.AnalysisException: [UNSUPPORTED_GENERATOR.NESTED_IN_EXPRESSIONS] The generator is not supported: nested in expressio

Re: [DISCUSS] New Spark Connect Client repository for Swift language

2025-03-10 Thread Jules Damji
+ 1 (non-binding) Generally speaking, it’s a good idea to separate repositories for all Spark Connect clients under Spark. - better organization - better visibility - easier for contribution - better for growth & extension of Spark Connect ecosystem Cheers Jules — Sent from my iPhone Pardon t

Re: [DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+

2025-03-10 Thread Adam Binford
I was very confused about this as well but I think I understand it more after reading through the PRs. Jungtaek let me know if this is correct, maybe it will help others understand. There was a bug where streaming queries could prune parts of the query that might have side effects, like stateful q

Re: [DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+

2025-03-10 Thread Jungtaek Lim
Replied inline On Tue, Mar 11, 2025 at 10:39 AM Andrew Melo wrote: > Hi Jungtaek, > > I've read the discussion, which is why I replied with my questions > (which you neglected to answer). Your deflection and lack of response > to direct questions should be (IMO) disqualifying. So, again: > > To

Re: [DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+

2025-03-10 Thread Jungtaek Lim
Please read through the explanation of how this impacts the OSS users in the other branch of this discussion. This happened in "Apache" Spark 3.5.4, and the migration logic has nothing to do with the vendor. This is primarily to not break users in "Apache" Spark 3.5.4 who are willing to upgrade dir
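
For readers new to the thread, the general shape of the "migration logic" under discussion can be sketched as below. This is a minimal Python illustration and not Spark's actual implementation: the key names, `RENAMED_KEYS`, and `migrate_conf` are hypothetical, and the sketch assumes that a query started on 3.5.4 persisted its configuration under the misnamed key, so a later version has to translate that key when it restores the saved configuration.

```python
# Hypothetical mapping from a misnamed key to its corrected name.
# Neither key is taken from the thread; both are placeholders.
RENAMED_KEYS = {
    "spark.databricks.example.flag": "spark.example.flag",
}


def migrate_conf(restored: dict[str, str]) -> dict[str, str]:
    """Return a copy of `restored` with misnamed keys moved to corrected names.

    If the corrected key is already present, its value wins and the
    misnamed entry is simply dropped.
    """
    migrated = dict(restored)
    for old_key, new_key in RENAMED_KEYS.items():
        if old_key in migrated:
            value = migrated.pop(old_key)
            # Only fill in the corrected key when it was not set explicitly.
            migrated.setdefault(new_key, value)
    return migrated


if __name__ == "__main__":
    saved = {"spark.databricks.example.flag": "false", "spark.other": "1"}
    print(migrate_conf(saved))
```

In this sketch the misnamed key is only read, never written back, which matches the point made in the vote thread that the proposal is to keep the migration code and not to keep supporting the incorrect config.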

Re: [DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+

2025-03-10 Thread Andrew Melo
Hello Jungtaek, I'm not implying that this improves the vendor's life. I'm just not understanding the issue -- the downstream people started a stream with a config option that the upstream people don't want to carry. If the affected users are using the downstream fork (which is how they got the opt

Re: [DISCUSS] New Spark Connect Client repository for Swift language

2025-03-10 Thread Mich Talebzadeh
Glad to see that eventually this repository is created now Dr Mich Talebzadeh, Architect | Data Science | Financial Crime | Forensic Analysis | GDPR On Mon, 10 Mar 2025 at 23:37, Dongjoon Hyun wrote: > T

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-10 Thread Jungtaek Lim
+1 to Hyukjin. If the test is effective, we should definitely retain the effectiveness of the test, unless we end up with the conclusion that there is no way to do that. On Tue, Mar 11, 2025 at 9:29 AM Hyukjin Kwon wrote: > If we should fix, let's make sure we don't just disable the tests - we >

Re: [DISCUSS] New Spark Connect Client repository for Swift language

2025-03-10 Thread DB Tsai
+1 It's exciting to see Apple developing a Spark Connect Swift client, showcasing Spark Connect as a truly language-agnostic protocol between the client and the Spark driver. Swift, known for its power and intuitiveness across iOS, iPadOS, macOS, tvOS, and watchOS, is expanding its role as a

Re: [PR] [SPARK-51458] Add GitHub Action job to check ASF license [spark-connect-swift]

2025-03-10 Thread via GitHub
dongjoon-hyun commented on PR #2: URL: https://github.com/apache/spark-connect-swift/pull/2#issuecomment-2712117717 Could you review this PR, @HyukjinKwon ? It seems that the GitHub Action is not triggered on this PR because this is the first PR. -- This is an automated message fr

[PR] [SPARK-51458] Add GitHub Action job to check ASF license [spark-connect-swift]

2025-03-10 Thread via GitHub
dongjoon-hyun opened a new pull request, #2: URL: https://github.com/apache/spark-connect-swift/pull/2 (no comment)

[PR] Initial Implementation [spark-connect-swift]

2025-03-10 Thread via GitHub
dongjoon-hyun opened a new pull request, #1: URL: https://github.com/apache/spark-connect-swift/pull/1 (no comment)

Re: [DISCUSS] New Spark Connect Client repository for Swift language

2025-03-10 Thread Jungtaek Lim
+1 (non-binding) Great to see the expansion of the Spark Connect ecosystem! On Tue, Mar 11, 2025 at 6:41 AM Rozov, Vlad wrote: > +1 (non-binding) > > Thank you, > > Vlad > > On Mar 9, 2025, at 3:30 PM, Dongjoon Hyun wrote: > > > Hi, All. > > I'd like to propose to add a new Apache Spark reposi

Re: [DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+

2025-03-10 Thread Jungtaek Lim
One thing I can correct immediately is, downstream does not have any impact at all from this. I believe I clarified that the config will not be modified by anyone, so downstream there is nothing to change. The problem is particular in OSS, downstream does not have any issue with this leak at all. (

Re: [DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+

2025-03-10 Thread Andrew Melo
Hello all As an outsider, I don't fully understand this discussion. This particular configuration option "leaked" into the open-source Spark distribution, and now there is a lot of discussion about how to mitigate existing workloads. But: presumably the people who are depending on this configurati

Re: [DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+

2025-03-10 Thread Adam Binford
As someone who has a lot of streams that have been restarted with 3.5.4, I would prefer not to have to restart everything with 3.5.5 but it's definitely doable. But my question is what is the actual behavior if the migration logic was removed? From a quick glance it seems like the incorrect config

Re: [DISCUSS] New Spark Connect Client repository for Swift language

2025-03-10 Thread Rozov, Vlad
+1 (non-binding) Thank you, Vlad On Mar 9, 2025, at 3:30 PM, Dongjoon Hyun wrote: Hi, All. I'd like to propose to add a new Apache Spark repository for `Spark Connect Client for Swift` in Apache Spark 4.1.0 timeframe. https://github.com/apache/spark-connect-swift To do this, I created an u

Re: [DISCUSS] New Spark Connect Client repository for Swift language

2025-03-10 Thread Dongjoon Hyun
Thank you, Liang-Chi, Wenchen, Max, Kent, Peter, Denny, Huaxin! Dongjoon. On 2025/03/10 17:14:27 huaxin gao wrote: > +1 > > On Mon, Mar 10, 2025 at 10:04 AM Denny Lee wrote: > > > +1 (non-binding) > > > > On Mon, Mar 10, 2025 at 9:47 AM Peter Toth wrote: > > > >> +1 > >> > >> On Mon, Mar 10,

Re: [DISCUSS] New Spark Connect Client repository for Swift language

2025-03-10 Thread Yang Jie
Great! Really happy to see that spark-connect supports more programming languages. On 2025/03/10 07:00:32 Martin Grund wrote: > Great work and proposal! > > I'm supportive. > > On Sun, Mar 9, 2025 at 23:31 Dongjoon Hyun wrote: > > > Hi, All. > > > > I'd like to propose to add a new Apache Sp

[VOTE] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

2025-03-10 Thread Jungtaek Lim
Hi dev, Please vote to retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x. - DISCUSSION: https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr ([DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+) Specifically, please

Re: [DISCUSS] New Spark Connect Client repository for Swift language

2025-03-10 Thread Martin Grund
Great work and proposal! I'm supportive. On Sun, Mar 9, 2025 at 23:31 Dongjoon Hyun wrote: > Hi, All. > > I'd like to propose to add a new Apache Spark repository for `Spark > Connect Client for Swift` in Apache Spark 4.1.0 timeframe. > > https://github.com/apache/spark-connect-swift > > To do