[VOTE] Release Apache Spark 3.5.5 deprecating `spark.databricks.*` configuration

2025-02-18 Thread dongjoon.hyun
Please vote to deprecate `spark.databricks.*` configuration at Apache Spark 3.5.5. This is a part of the following on-going discussion.
- DISCUSSION: https://lists.apache.org/thread/qwxb21g5xjl7xfp4rozqmg1g0ndfw2jd (Deprecating and banning `spark.databricks.*` config from Apache Spark repository)

Re: [VOTE] Release Apache Spark 3.5.5 deprecating `spark.databricks.*` configuration

2025-02-18 Thread Mich Talebzadeh
+1
Dr Mich Talebzadeh, Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
On Wed, 19 Feb 2025 at 06:51, Ángel wrote:
> +1 (non-binding)
>
> On Wed, 19 Feb 2025, 7:43, Wenchen Fan wrote:

Re: [VOTE] Release Apache Spark 3.5.5 deprecating `spark.databricks.*` configuration

2025-02-18 Thread Ángel
+1 (non-binding)
On Wed, 19 Feb 2025, 7:43, Wenchen Fan wrote:
> +1
>
> On Wed, Feb 19, 2025 at 2:36 PM Sakthi wrote:
>
>> +1 (non-binding)
>>
>> On Tue, Feb 18, 2025 at 10:21 PM Yang Jie wrote:
>>
>>> +1
>>>
>>> On 2025/02/19 05:57:53 Mark Hamstra wrote:
>>> > +1
>>> >
>>> > On Tue, Feb 18

Re: [VOTE] Release Apache Spark 3.5.5 deprecating `spark.databricks.*` configuration

2025-02-18 Thread Wenchen Fan
+1
On Wed, Feb 19, 2025 at 2:36 PM Sakthi wrote:
> +1 (non-binding)
>
> On Tue, Feb 18, 2025 at 10:21 PM Yang Jie wrote:
>
>> +1
>>
>> On 2025/02/19 05:57:53 Mark Hamstra wrote:
>> > +1
>> >
>> > On Tue, Feb 18, 2025 at 9:46 PM dongjoon.hyun wrote:
>> > >
>> > > Please vote to deprecate `sp

Re: [VOTE] Release Apache Spark 3.5.5 deprecating `spark.databricks.*` configuration

2025-02-18 Thread Sakthi
+1 (non-binding)
On Tue, Feb 18, 2025 at 10:21 PM Yang Jie wrote:
> +1
>
> On 2025/02/19 05:57:53 Mark Hamstra wrote:
> > +1
> >
> > On Tue, Feb 18, 2025 at 9:46 PM dongjoon.hyun wrote:
> > >
> > > Please vote to deprecate `spark.databricks.*` configuration at Apache Spark 3.5.5.
> > > This

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-18 Thread Mark Hamstra
This doesn't really have anything to do with a broader approach to breaking changes. Removing the mistake in 4.0.0 does not change our striving to avoid breaking APIs or silently changing behavior -- striving is not a guarantee. And the addition of check-in tooling should prevent the issue from rec
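As an illustration of the kind of check-in tooling mentioned in this message, the following is a minimal sketch of a scan that flags config keys under the `spark.databricks.` prefix in source text. It is not the check the Spark project actually uses; the function name `find_banned_keys`, the regular expression, and the sample key `spark.databricks.example.key` are all hypothetical.

```python
import re

# Hypothetical pattern: any config key under the vendor-specific prefix.
# The character class is an assumption about what a config key may contain.
BANNED_KEY = re.compile(r"spark\.databricks\.[A-Za-z0-9_.]+")


def find_banned_keys(text):
    """Return (line_number, key) pairs for each banned key found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in BANNED_KEY.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Made-up source snippet; the second key is invented for this example.
sample = (
    'val a = "spark.sql.shuffle.partitions"\n'
    'val b = "spark.databricks.example.key"\n'
)

for lineno, key in find_banned_keys(sample):
    print(f"line {lineno}: banned config key {key}")
```

A check like this could run over changed files before merge and fail when the returned list is non-empty.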

Re: [VOTE] Release Apache Spark 3.5.5 deprecating `spark.databricks.*` configuration

2025-02-18 Thread Yang Jie
+1
On 2025/02/19 05:57:53 Mark Hamstra wrote:
> +1
>
> On Tue, Feb 18, 2025 at 9:46 PM dongjoon.hyun wrote:
> >
> > Please vote to deprecate `spark.databricks.*` configuration at Apache Spark 3.5.5.
> > This is a part of the following on-going discussion.
> >
> > - DISCUSSION:
> > https://

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-18 Thread Wenchen Fan
Hi Mark, If I understand correctly, we are introducing a breaking change in 4.0 by removing configs because it is necessary. I’m not suggesting that we are violating the rule, just ensuring that there is consensus on this being a necessary breaking change, which it seems there is. And yes, this is

Re: [VOTE] Release Apache Spark 3.5.5 deprecating `spark.databricks.*` configuration

2025-02-18 Thread Mark Hamstra
+1
On Tue, Feb 18, 2025 at 9:46 PM dongjoon.hyun wrote:
>
> Please vote to deprecate `spark.databricks.*` configuration at Apache Spark 3.5.5.
> This is a part of the following on-going discussion.
>
> - DISCUSSION: https://lists.apache.org/thread/qwxb21g5xjl7xfp4rozqmg1g0ndfw2jd
> (Deprecat

Re: [VOTE] Release Apache Spark 3.5.5 deprecating `spark.databricks.*` configuration

2025-02-18 Thread Dongjoon Hyun
+1
If the vote passes, I'm going to roll the Apache Spark 3.5.5 RC1 next Monday after verifying the deprecation.
Dongjoon.
On 2025/02/19 05:45:31 "dongjoon.hyun" wrote:
> Please vote to deprecate `spark.databricks.*` configuration at Apache Spark 3.5.5.
> This is a part of the following on-go

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-18 Thread Wenchen Fan
Hi Dongjoon, If this is a policy issue that necessitates a breaking change, then sure, let’s proceed. I don’t have a strong opinion on this specific case, but I’m more concerned with the broader approach to breaking changes. I’m referencing this statement from the Spark Version Policy

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-18 Thread Dongjoon Hyun
Thank you for your opinions, Bjorn, Jungtaek, Wenchen, Holden, Mich, and Mark. At least, I believe we agree that we should provide a way to mitigate the Apache Spark 3.5.4 issue ASAP. To take real community action in order to prevent the further spread of `spark.databricks.*` configuration by Spar

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-18 Thread Dongjoon Hyun
I have different perspectives from Wenchen's opinion in three ways.
> I’d like to emphasize that a major version release is not a justification for unnecessary breaking changes.
> ...the period between 3.5.5 and 4.0.0 likely isn’t long enough.
First, it's an inevitably necessary change to prot

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-18 Thread Mark Hamstra
The issue is not how many lines of code it is, but rather how serious of an issue it is to have the databricks namespace in Apache code. It's not a large functional issue, but that doesn't mean that it is only a minor issue, nor do I think that I would characterize the removal of this clear error a

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-18 Thread Wenchen Fan
Hi all, I’d like to emphasize that a major version release is not a justification for unnecessary breaking changes. If we are confident that no one is using this configuration, we should clean it up in 3.5.5 as well. However, if there’s a possibility that users are already relying on it, then lega

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-18 Thread Mich Talebzadeh
Depends how you want to play this. As usual, a cost/benefit analysis will be useful.
*Immediate Removal in Spark 3.5.5*:
pros: Quickly removes the problematic configuration, reducing technical debt and potential issues.
cons: Users upgrading directly from earlier versions to Spark 3.5.5 or later wil

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-18 Thread Jungtaek Lim
Though if we are OK with making users read the migration guide to figure out the change when upgrading directly to Spark 4.0.0+, I agree this is also a valid approach.
On Wed, Feb 19, 2025 at 9:20 AM Jungtaek Lim wrote:
> The point is, why can't we remove it from Spark 3.5.5 a

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-18 Thread Jungtaek Lim
The point is, why can't we remove it from Spark 3.5.5 as well if we are planning to "remove" (not deprecate) at the very next minor release? The logic of migration just works without having the incorrect config key to be indicated with SQL config key. That said, the point we debate here is only v

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-18 Thread Holden Karau
I think that removing in 4 sounds reasonable to me as well. It’s important to create a sense of fairness among vendors.
Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/
Books (Learning Spark, H

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-18 Thread Dongjoon Hyun
I don't think there is a reason to keep it at 4.0.0 (and forever?) if we release Spark 3.5.5 with the proper deprecation. This is a big difference, Wenchen. And, the difference is the main reason why I initiated this thread to suggest removing 'spark.databricks.*' completely from Apache Spark 4