Re: [VOTE] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

Nicholas Chammas Mon, 10 Mar 2025 20:57:44 -0700

I agree with Sean: this proposal does not seem to me nearly as controversial
as the discussion so far has made it out to be.


Jungtaek’s detailed breakdown on the other thread 
<https://lists.apache.org/thread/zlhgr1mx0q520odvpnmnzwd8mp9x6bpl> explains 
that the proposed change mainly benefits open source users of Apache Spark: 
it lets them upgrade directly from Apache Spark 3.5.4 to 4.0.0, rather than 
forcing them to upgrade to 3.5.5 first.

Jungtaek’s proposal is essentially a convenience to open source users. These 
users may or may not be using a vendor distribution of Spark. It does not 
benefit or harm Databricks or any other vendor. And it adds a very small 
maintenance burden on contributors.

Isn’t this a tradeoff we should generally make? Help users upgrade at a minor 
maintenance cost.

+1


> On Mar 10, 2025, at 11:16 PM, Sean Owen <[email protected]> wrote:
> 
> Doesn't the migration code 'clear' the debt?
> The proposal is not to continue to support the config.
> I feel like people are not quite understanding the change, and objecting to 
> something that doesn't exist.
> It's a shame, as this seems like something not even worth discussing. I don't 
> know why this triggered this much discussion. We have kept deprecated methods 
> without blinking, which is in comparison much bigger.
> Can we maybe ask you to review the actual change in question?
> 
> On Mon, Mar 10, 2025, 10:02 PM Yang Jie <[email protected]> wrote:
>> -1
>> Remove migration logic of incorrect `spark.databricks.*` configuration in 
>> Spark 4.0.0 because I think this configuration was initially introduced 
>> accidentally in Spark 3.5.4, lacking a clear design intent. Although the 
>> immediate maintenance cost of retaining this configuration currently seems 
>> limited, as subsequent versions iterate and user habits form, it may lead to 
>> the continuous accumulation of technical debt. When users come to view this 
>> configuration as one that can be relied on long-term, future removal may 
>> face greater resistance from users, and the configuration could become 
>> entrenched and redundant in the codebase. Therefore, promptly 
>> correcting this historically accidental configuration not only maintains the 
>> normativity of the Spark configuration system but also prevents unintended 
>> configurations from becoming de facto standards, thereby reducing long-term 
>> maintenance risks.
>> 
>> Jie Yang
>> 
>> On 2025/03/10 14:52:52 Dongjoon Hyun wrote:
>> > -1 because there exists a feasible migration path for Apache Spark 3.5.4 
>> > via Apache Spark 3.5.5. 
>> > 
>> > It's obvious that this Databricks mistake has already caused a huge 
>> > communication cost in the Apache Spark community and is imposing a 
>> > burden that forces us to handle at least two more PRs at 4.0.0 and 4.1.0.
>> > 
>> > Given that, I don't think that
>> > - this is inevitable, or
>> > - this has zero cost.
>> > 
>> > Dongjoon.
>> > 
>> > On 2025/03/10 12:46:16 Jungtaek Lim wrote:
>> > > Starting from my +1 (non-binding).
>> > > 
>> > > In addition, I propose to retain migration logic till Spark 4.1.x and
>> > > remove it in Spark 4.2.0.
>> > > 
>> > > On Mon, Mar 10, 2025 at 9:44 PM Jungtaek Lim <[email protected]>
>> > > wrote:
>> > > 
>> > > > Hi dev,
>> > > >
>> > > > Please vote to retain migration logic of incorrect `spark.databricks.*`
>> > > > configuration in Spark 4.0.x.
>> > > >
>> > > > - DISCUSSION:
>> > > > https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr
>> > > > ([DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in
>> > > > Spark 4.0.0+)
>> > > >
>> > > > Specifically, please review this post
>> > > > https://lists.apache.org/thread/xtq1kjhsl4ohfon78z3wld2hmfm78t9k which
>> > > > explains the pros and cons of the proposal. The proposal is "Option 1".
>> > > >
>> > > > Simply speaking, this vote is to allow streaming queries which have ever
>> > > > been run in Spark 3.5.4 to be upgraded to Spark 4.0.x "without first
>> > > > having to be upgraded to Spark 3.5.5+". If the vote passes, we will help
>> > > > users have a smooth upgrade from Spark 3.5.4 to Spark 4.0.x, which
>> > > > would be almost 1 year.
>> > > >
>> > > > The (only) con of this option is having to retain the incorrect
>> > > > configuration name as a "string" in the codebase a bit longer. The code
>> > > > complexity of migration logic is arguably trivial. (link
>> > > > <https://github.com/apache/spark/blob/4231d58245251a34ae80a38ea4bbf7d720caa439/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L174-L183>
>> > > > )
>> > > >
>> > > > This VOTE is for Spark 4.0.x, but if you support keeping the migration
>> > > > logic beyond Spark 4.0.x, please cast +1 here and name the last minor
>> > > > version of Spark that should retain this migration logic.
>> > > >
>> > > > The vote is open for the next 72 hours and passes if a majority +1 PMC
>> > > > votes are cast, with a minimum of 3 +1 votes.
>> > > >
>> > > > [ ] +1 Retain migration logic of incorrect `spark.databricks.*`
>> > > > configuration in Spark 4.0.x
>> > > > [ ] -1 Remove migration logic of incorrect `spark.databricks.*`
>> > > > configuration in Spark 4.0.0 because...
>> > > >
>> > > > Thanks!
>> > > > Jungtaek Lim (HeartSaVioR)
>> > > >
>> > > 
>> > 
>> > ---------------------------------------------------------------------
>> > To unsubscribe e-mail: [email protected]
>> > 
>> > 
>> 
