Re: [VOTE] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

Jungtaek Lim Sat, 15 Mar 2025 09:51:27 -0700

A small correction: the link was missing from item 4:

4. I claimed I wanted to proceed with migration logic for branch-4.0 PR,
and hadn't got any feedback except being told to wait for Spark 3.5.5 (link
<https://github.com/apache/spark/pull/49983#pullrequestreview-2621947671>).
If you weren't open to my proposal, you should have just said "we have
already decided" and given the evidence. I haven't heard any, so
I had to initiate DISCUSS.




On Sat, Mar 15, 2025 at 11:18 PM Jungtaek Lim <[email protected]>
wrote:

> > according to the ASF process, the Apache Spark community made the
> conclusion to unblock the Apache Spark 4.0.0 release with the AS-IS code
> with the improved Spark 4.0 migration guide because I provided a technical
> justification for my vote via the concrete alternative based on the
> existing Spark 3.5.5, AS-IS code base, and the suggested better migration
> guide way in order to eliminate the affected streaming queries.
>
> I can always be corrected if you give the evidence. Let's stop "just"
> talking. I believe we are seeing quite different things and our memory is
> quite opposite. "History will tell us."
>
> I am trying to understand where the miscommunication came from. Some
> clarification:
>
> 1. I believe I have said that I do not agree with just removing the config in
> master/4.0 and that I expected a follow-up, which is the migration logic. (link
> <https://github.com/apache/spark/pull/49897#issuecomment-2652486115>) I
> admit this was a bit unclear, but I made my position clear multiple
> times; otherwise I would never have opened the migration logic PR for
> master/4.0.
> 2. I believe I have said my intention is to land the migration logic to
> 4.0.x and arguably longer (link
> <https://lists.apache.org/thread/q24vonqhvqh11ghd488rctsm89zvmpqd>).
> I think there were people who wanted to remove the vendor name in
> any way, but arguably it just ended with an open question, with no
> consensus reached. People expressed concerns, but nothing was concluded
> except that we agreed to proceed with Spark 3.5.5. We never reached consensus
> on how to deal with it in Spark 4.0.0+ in that discussion thread,
> especially about migration logic.
> 3. The VOTE for removal of the config clearly stated that it applies only to 3.5. (link
> <https://lists.apache.org/thread/6nn76olr65b8zfgzdcbtr9f6o98451o5>)
> 4. I claimed I wanted to proceed with migration logic for branch-4.0 PR,
> and hadn't got any feedback except being told to wait for Spark 3.5.5
> (link). If you weren't open to my proposal, you should have just said "we
> have already decided" and given the evidence. I haven't heard
> any, so I had to initiate DISCUSS.
> 5. We all know about DISCUSS and VOTE, so I won't repeat them.
>
> I have strong evidence that you were aware of the fact we never agreed
> with the behavior for Spark 4.0.0, and you said my proposal is "technically
> correct", so we had never debated about "technical objection", but debated
> about "behavior".
> https://github.com/apache/spark/pull/49983#issuecomment-2676531485
>
> Can you please explain why you said my proposal is "technically correct"
> and here you did a vote which required "technical objection"? Have you
> changed your mind?
>
> Overall, when you say "the Apache Spark community made the conclusion to
> unblock the Apache Spark 4.0.0 release with the AS-IS code", I don't get
> who is "the Apache Spark community". Where can I see the DISCUSS and VOTE
> thread? Is it really that I am excluded on the list of the Apache Spark
> community, while arguably I am the only active maintainer of the module?
> Could you please enumerate who the Apache Spark community was at that time?
>
> Let's not talk based on memory. If we agree about that, we should have a
> history. I am open to apologize if I missed a critical discussion and vote.
> Your (and my) memory should never be used as evidence. Please, give the
> evidence.
>
> I'm also happy to hear about the other thread I have made. Thanks.
>
> On Sat, Mar 15, 2025 at 9:23 AM Dongjoon Hyun <[email protected]>
> wrote:
>
>> Apache Spark PMC always strongly recommends all 3.5 users to upgrade to
>> the latest stable release via the official website. The main question seems
>> quite different from the Apache Spark website. May I ask what is not safe
>> to guide Spark 3.5.4 users to 3.5.5, Jungtaek?
>>
>> > The main question was, "where is the evidence it's safe to force users
>> to upgrade to Spark 3.5.5...
>>
>> For the following part, when the Apache Spark community made a mistake at
>> Spark 2.4.2 release, we guided the users to upgrade to 2.4.3 immediately
>> after recovering the default Scala version to 2.11.
>>
>> > to upgrade to Spark 3.5.5 before upgrading to Spark 4.0.0".
>>
>> 2019-04-23 https://spark.apache.org/releases/spark-release-2-4-2.html
>> 2019-05-08 https://spark.apache.org/releases/spark-release-2-4-3.html
>>
>> In the same way, Apache Spark 3.5.5 was released and is ready to handle a
>> mistake at Spark 3.5.4.
>>
>> 2025-02-27 https://spark.apache.org/releases/spark-release-3-5-5.html
>>
>> For the vote, the vote is a time-limited procedure to make a swift
>> decision. That's the reason why you proposed the vote procedure and we
>> agreed. There is no way to `block` the votes. The vote itself is already
>> completed (including my -1).
>>
>> > you weren’t intended to “block” the vote
>>
>> I've been considering this as a part of the whole `spark.databricks.*`
>> incident handling. In my interpretation, according to the ASF process, the
>> Apache Spark community made the conclusion to unblock the Apache Spark
>> 4.0.0 release with the AS-IS code with the improved Spark 4.0 migration
>> guide because I provided a technical justification for my vote via the
>> concrete alternative based on the existing Spark 3.5.5, AS-IS code base,
>> and the suggested better migration guide way in order to eliminate the
>> affected streaming queries.
>>
>> Thanks,
>> Dongjoon.
>>
>>
>>
>> On Fri, Mar 14, 2025 at 3:23 AM Jungtaek Lim <
>> [email protected]> wrote:
>>
>>> That said, if I understand correctly, you weren’t intended to “block”
>>> the vote, right? You say you expected the vote to be finished.
>>>
>>> Could you please change your vote to -0.x, since some people view this as a
>>> code change vote, or clarify explicitly that you think this is not a code
>>> change vote? This will help resolve the concerns from some PMC members
>>> about how we should interpret the vote result clearly.
>>>
>>> Thanks!
>>>
>>> On Fri, Mar 14, 2025 at 5:33 PM, Dongjoon Hyun <[email protected]> wrote:
>>>
>>>> Thank you all.
>>>>
>>>> The vote is finished in an intended way with the expected result. We
>>>> have enough time to discuss and I have been sticking to my original
>>>> technical justification from the beginning (including this).
>>>>
>>>> 1. Helping renaming the conf via SPARK-51172 (by approving it)
>>>> 2. Banning `spark.databricks.*` via SPARK-51173 (by adding `configName`
>>>> Scalastyle rule)
>>>> 3. Led the discussion thread and reached the agreement to release Spark
>>>> 3.5.5 early.
>>>> 4. Releasing 3.5.5 as a release manager to provide a candidate
>>>> migration path
>>>> 5. Proposing to use the migration path
>>>>
>>>> This vote was Step 5. My technical point has always been aiming to
>>>> recover the Apache Spark 4 codebase to the status before our mistake by
>>>> containing the issue only in `branch-3.5` and providing the proposed narrow
>>>> migration path. And, as mentioned already, that's the situation where we
>>>> were during the vote at Apache Spark AS-IS branches. What all of us agree
>>>> on is that the previous code base is okay. I didn't reply to
>>>> Jungtaek's Apple comment intentionally because it's not a public
>>>> Spark-vendor like Databricks. And, it's a product name of the popular
>>>> consumer electronic devices like Intel/AMD/Graviton. In addition, I don't
>>>> think we are going to add back `spark.databricks.*` because of the reason
>>>> the customers ask for it. In the same way, this vote is one of the
>>>> political decision making processes of Apache Spark PMC. We started this
>>>> vote because we couldn't make a consensus.
>>>>
>>>> I believe I've been giving my best to the Apache Spark community
>>>> by actions and with valid technical clarification (without any modification
>>>> during the process).
>>>>
>>>> Sincerely,
>>>> Dongjoon
>>>>
>>>>
>>>> On Thu, Mar 13, 2025 at 11:41 PM Mridul Muralidharan <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>> FWIW, I am +1 on the proposal (though I missed the vote on this !)
>>>>>
>>>>> Regards,
>>>>> Mridul
>>>>>
>>>>> On Fri, Mar 14, 2025 at 1:31 AM Mridul Muralidharan <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>   I agree with Mark, imo this is a qualified veto.
>>>>>> We should give Dongjoon the opportunity to give his clarification, if
>>>>>> any.
>>>>>>
>>>>>> I do realize this delays the RC process, but this deserves to be
>>>>>> looked into carefully.
>>>>>>
>>>>>> Thanks,
>>>>>> Mridul
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 13, 2025 at 9:35 PM Mark Hamstra <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Absolutely not!
>>>>>>>
>>>>>>> This is clearly a vote on a code change, not on a procedural issue or
>>>>>>> a package release. The code change has been vetoed by a -1 vote by a
>>>>>>> qualified voter.
>>>>>>>
>>>>>>> On Thu, Mar 13, 2025 at 6:58 PM Jungtaek Lim
>>>>>>> <[email protected]> wrote:
>>>>>>> >
>>>>>>> > As I said, I'm concluding the VOTE since we meet the
>>>>>>> criteria (3 +1 binding, 1 -1 binding, and also +1s from non-binding).
>>>>>>> >
>>>>>>> > I don't consider -1 as a veto as I explained, as we should have
>>>>>>> multiple -1s if we go for VOTE with the current codebase. (+1 in this
>>>>>>> proposal is effectively -1 in another proposal.)
>>>>>>> >
>>>>>>> > The vote followed the Apache Voting Process with the type of
>>>>>>> "package release" (which we tend to use in dev@ for VOTE). I guess
>>>>>>> it could have also been done with "procedural issues", which is less
>>>>>>> strict, but then this fulfills both types of votes, which should be OK.
>>>>>>> >
>>>>>>> > The current codebase is "accidentally" representing another
>>>>>>> proposal and it is never intended. I don't find the way I can -1 to the
>>>>>>> current codebase, and make a different change neither bound to any
>>>>>>> proposal to be fair.
>>>>>>> >
>>>>>>> > I don't want to block the release because of the above. So, let's
>>>>>>> change the current codebase the way we discussed and voted here.
>>>>>>> Reverting this decision should require another VOTE.
>>>>>>> >
>>>>>>> > Thanks to everyone who voted!
>>>>>>> >
>>>>>>> > On Thu, Mar 13, 2025 at 4:54 PM Jungtaek Lim <
>>>>>>> [email protected]> wrote:
>>>>>>> >>
>>>>>>> >> Thanks to everyone who participated and voted!
>>>>>>> >>
>>>>>>> >> Now I can technically conclude the VOTE, but I'm willing to wait
>>>>>>> till US daytime tomorrow, to give some time for Dongjoon to revisit
>>>>>>> this.
>>>>>>> >>
>>>>>>> >> I'll conclude the vote around 6PM PST tomorrow regardless of his
>>>>>>> vote. It's ideal to see us have no -1, but having one -1 doesn't block
>>>>>>> this vote and we can move forward.
>>>>>>> >>
>>>>>>> >> On Thu, Mar 13, 2025 at 4:42 PM Yang Jie <[email protected]>
>>>>>>> wrote:
>>>>>>> >>>
>>>>>>> >>> forgot to mention in my last reply, my stance is +1
>>>>>>> >>>
>>>>>>> >>> Jie Yang
>>>>>>> >>>
>>>>>>> >>> On 2025/03/13 07:08:12 Russell Jurney wrote:
>>>>>>> >>> > Sure, +1 non-binding.
>>>>>>> >>> >
>>>>>>> >>> > On Wed, Mar 12, 2025 at 11:18 PM Jungtaek Lim <
>>>>>>> [email protected]>
>>>>>>> >>> > wrote:
>>>>>>> >>> >
>>>>>>> >>> > > Russell,
>>>>>>> >>> > >
>>>>>>> >>> > > Of course, we also hear the voices of people who don't have binding
>>>>>>> votes.
>>>>>>> >>> > > Personally I think that's more important than committers'/PMC
>>>>>>> members' VOTE
>>>>>>> >>> > > this time since we can be biased and be far from user
>>>>>>> experience.
>>>>>>> >>> > >
>>>>>>> >>> > > Could you please explicitly cast your vote, like +1
>>>>>>> (non-binding)? You
>>>>>>> >>> > > seem to agree with the proposal. Thanks!
>>>>>>> >>> > >
>>>>>>> >>> > > On Thu, Mar 13, 2025 at 3:15 PM Russell Jurney <
>>>>>>> [email protected]>
>>>>>>> >>> > > wrote:
>>>>>>> >>> > >
>>>>>>> >>> > >> I'm just a lurker and aspiring contributor, but as a Spark
>>>>>>> user upgrading
>>>>>>> >>> > >> twice is very confusing and would cause many or most users
>>>>>>> to fail to
>>>>>>> >>> > >> upgrade successfully to Spark 4 on a first go. That seems
>>>>>>> like a very bad
>>>>>>> >>> > >> user experience. I thought it was worthwhile stating this
>>>>>>> out loud.
>>>>>>> >>> > >>
>>>>>>> >>> > >> Russell
>>>>>>> >>> > >>
>>>>>>> >>> > >> On Wed, Mar 12, 2025 at 11:05 PM Xiao Li <
>>>>>>> [email protected]> wrote:
>>>>>>> >>> > >>
>>>>>>> >>> > >>> this vote is to allow streaming queries which had been
>>>>>>> ever run in Spark
>>>>>>> >>> > >>>> 3.5.4 to be upgraded with Spark 4.0.x, "without having to
>>>>>>> be upgraded with
>>>>>>> >>> > >>>> Spark 3.5.5+ in prior".
>>>>>>> >>> > >>>
>>>>>>> >>> > >>>
>>>>>>> >>> > >>> In the history of Apache Spark, have we ever required
>>>>>>> users to upgrade
>>>>>>> >>> > >>> to the next maintenance release before moving to a new
>>>>>>> feature or major
>>>>>>> >>> > >>> release?
>>>>>>> >>> > >>>
>>>>>>> >>> > >>> Xiao
>>>>>>> >>> > >>>
>>>>>>> >>> > >>> On Tue, Mar 11, 2025 at 09:08, Adam Binford <[email protected]> wrote:
>>>>>>> >>> > >>>
>>>>>>> >>> > >>>> +1 (non-binding)
>>>>>>> >>> > >>>>
>>>>>>> >>> > >>>> It's a pretty in the weeds issue with how Structured
>>>>>>> Streaming works
>>>>>>> >>> > >>>> under the hood that's kinda hard to understand if you're
>>>>>>> not familiar with
>>>>>>> >>> > >>>> it. The migration logic doesn't mean users can still use
>>>>>>> the old config,
>>>>>>> >>> > >>>> it's purely behind the scenes to fix checkpoint metadata
>>>>>>> in streams created
>>>>>>> >>> > >>>> in 3.5.4. The 5 lines of code it takes to address a weird
>>>>>>> edge case for
>>>>>>> >>> > >>>> certain users that's already gone from master shouldn't
>>>>>>> be a huge deal.
>>>>>>> >>> > >>>>
>>>>>>> >>> > >>>> On Tue, Mar 11, 2025 at 1:43 AM Yang Jie <
>>>>>>> [email protected]> wrote:
>>>>>>> >>> > >>>>
>>>>>>> >>> > >>>>>
>>>>>>> >>> > >>>>> To Sean, you're right, I'm very sorry.
>>>>>>> >>> > >>>>>
>>>>>>> >>> > >>>>> From the perspective of compatibility and migratability,
>>>>>>> I think we
>>>>>>> >>> > >>>>> should migrate this logic to 4.0.0 and keep it in the
>>>>>>> codebase for a longer
>>>>>>> >>> > >>>>> time (or permanently), because we can't predict which
>>>>>>> version users of
>>>>>>> >>> > >>>>> 3.5.4 will choose next.
>>>>>>> >>> > >>>>>
>>>>>>> >>> > >>>>>
>>>>>>> >>> > >>>>> I don't want to discuss the so-called vendor issue.
>>>>>>> >>> > >>>>>
>>>>>>> >>> > >>>>> I withdraw my previous -1.
>>>>>>> >>> > >>>>>
>>>>>>> >>> > >>>>> Jie Yang.
>>>>>>> >>> > >>>>>
>>>>>>> >>> > >>>>> On 2025/03/11 04:42:25 Wenchen Fan wrote:
>>>>>>> >>> > >>>>> > Guys, let’s be honest about what we’re discussing here.
>>>>>>> >>> > >>>>> >
>>>>>>> >>> > >>>>> > If this is a migration issue, why would we even need a
>>>>>>> vote? We’ve
>>>>>>> >>> > >>>>> been
>>>>>>> >>> > >>>>> > consistently adding configurations to restore legacy
>>>>>>> behavior
>>>>>>> >>> > >>>>> instead of
>>>>>>> >>> > >>>>> > removing them because we understand the challenges of
>>>>>>> upgrading Spark
>>>>>>> >>> > >>>>> > versions. Our goal has always been to make upgrades
>>>>>>> easier, even if
>>>>>>> >>> > >>>>> it
>>>>>>> >>> > >>>>> > means carrying some technical debt. I don’t think we
>>>>>>> want to change
>>>>>>> >>> > >>>>> that
>>>>>>> >>> > >>>>> > culture now.
>>>>>>> >>> > >>>>> >
>>>>>>> >>> > >>>>> > If the concern is about vendor names appearing in the
>>>>>>> codebase, then
>>>>>>> >>> > >>>>> why is
>>>>>>> >>> > >>>>> > it a big deal this time when vendor names are already
>>>>>>> present
>>>>>>> >>> > >>>>> elsewhere? If
>>>>>>> >>> > >>>>> > we’ve failed to follow a policy, let’s correct it, but
>>>>>>> can someone
>>>>>>> >>> > >>>>> point to
>>>>>>> >>> > >>>>> > the specific policy we’re violating?
>>>>>>> >>> > >>>>> >
>>>>>>> >>> > >>>>> > If the vote is about adding migration logic to ease
>>>>>>> the upgrade from
>>>>>>> >>> > >>>>> 3.5.4
>>>>>>> >>> > >>>>> > to 4.0.0, then +1, why not?
>>>>>>> >>> > >>>>> >
>>>>>>> >>> > >>>>> > Thanks,
>>>>>>> >>> > >>>>> > Wenchen
>>>>>>> >>> > >>>>> >
>>>>>>> >>> > >>>>> >
>>>>>>> >>> > >>>>> >
>>>>>>> >>> > >>>>> > On Mon, Mar 10, 2025 at 8:49 PM Jungtaek Lim <
>>>>>>> >>> > >>>>> [email protected]>
>>>>>>> >>> > >>>>> > wrote:
>>>>>>> >>> > >>>>> >
>>>>>>> >>> > >>>>> > > Well said, Sean. Sorry I made you keep around here
>>>>>>> since it might
>>>>>>> >>> > >>>>> not be
>>>>>>> >>> > >>>>> > > clearly stated. My bad.
>>>>>>> >>> > >>>>> > >
>>>>>>> >>> > >>>>> > > Yang, how could we ever tolerate the fact there are
>>>>>>> "other"
>>>>>>> >>> > >>>>> occurrences of
>>>>>>> >>> > >>>>> > > vendor names in the codebase? Please go and search
>>>>>>> "databricks" in
>>>>>>> >>> > >>>>> the
>>>>>>> >>> > >>>>> > > codebase and be surprised.
>>>>>>> >>> > >>>>> > >
>>>>>>> >>> > >>>>> > > If we believe that having vendor names in the
>>>>>>> codebase will
>>>>>>> >>> > >>>>> increase
>>>>>>> >>> > >>>>> > > the occurrence of making mistakes, why didn't we
>>>>>>> have a discussion
>>>>>>> >>> > >>>>> thread
>>>>>>> >>> > >>>>> > > earlier to remove all occurrences altogether? This
>>>>>>> is super tricky
>>>>>>> >>> > >>>>> because
>>>>>>> >>> > >>>>> > > I can even start to argue we have "Apple" as a
>>>>>>> vendor name in
>>>>>>> >>> > >>>>> Apache Spark
>>>>>>> >>> > >>>>> > > codebase. I'm not saying we use "apple" in the test
>>>>>>> data. See
>>>>>>> >>> > >>>>> > > `isMacOnAppleSilicon` in Utils. Is it unavoidable?
>>>>>>> No,
>>>>>>> >>> > >>>>> `isMacOnMSeries` or
>>>>>>> >>> > >>>>> > > `isMacOnSilicon` is enough.
>>>>>>> >>> > >>>>> > >
>>>>>>> >>> > >>>>> > > We really need to draw a line where we disallow
>>>>>>> vendor names on it
>>>>>>> >>> > >>>>> - if
>>>>>>> >>> > >>>>> > > it's the entire codebase, I don't really think it is
>>>>>>> realistic.
>>>>>>> >>> > >>>>> > >
>>>>>>> >>> > >>>>> > > This was really a mistake, and it was definitely not
>>>>>>> from
>>>>>>> >>> > >>>>> referring to the
>>>>>>> >>> > >>>>> > > existing codebase. Not having a vendor name does not
>>>>>>> change
>>>>>>> >>> > >>>>> anything on the
>>>>>>> >>> > >>>>> > > chance of encountering this issue again. If we
>>>>>>> really care, we
>>>>>>> >>> > >>>>> should think
>>>>>>> >>> > >>>>> > > about style checking, which is the only viable way
>>>>>>> to catch the
>>>>>>> >>> > >>>>> mistake.
>>>>>>> >>> > >>>>> > > Again, I'd argue we have to have a bunch of vendor
>>>>>>> names in that
>>>>>>> >>> > >>>>> style
>>>>>>> >>> > >>>>> > > check, not just the problematic vendor name.
>>>>>>> >>> > >>>>> > >
>>>>>>> >>> > >>>>> > >
>>>>>>> >>> > >>>>> > > On Tue, Mar 11, 2025 at 12:17 PM Sean Owen <
>>>>>>> [email protected]>
>>>>>>> >>> > >>>>> wrote:
>>>>>>> >>> > >>>>> > >
>>>>>>> >>> > >>>>> > >> Doesn't the migration code 'clear' the debt?
>>>>>>> >>> > >>>>> > >> The proposal is not to continue to support the
>>>>>>> config.
>>>>>>> >>> > >>>>> > >> I feel like people are not quite understanding the
>>>>>>> change, and
>>>>>>> >>> > >>>>> objecting
>>>>>>> >>> > >>>>> > >> to something that doesn't exist.
>>>>>>> >>> > >>>>> > >> It's a shame, as this seems like something not even
>>>>>>> worth
>>>>>>> >>> > >>>>> discussing. I
>>>>>>> >>> > >>>>> > >> don't know why this triggered this much discussion.
>>>>>>> We have kept
>>>>>>> >>> > >>>>> deprecated
>>>>>>> >>> > >>>>> > >> methods without blinking, which is in comparison
>>>>>>> much bigger.
>>>>>>> >>> > >>>>> > >> Can we maybe ask you to review the actual change in
>>>>>>> question?
>>>>>>> >>> > >>>>> > >>
>>>>>>> >>> > >>>>> > >> On Mon, Mar 10, 2025, 10:02 PM Yang Jie <
>>>>>>> [email protected]>
>>>>>>> >>> > >>>>> wrote:
>>>>>>> >>> > >>>>> > >>
>>>>>>> >>> > >>>>> > >>> -1
>>>>>>> >>> > >>>>> > >>> Remove migration logic of incorrect
>>>>>>> `spark.databricks.*`
>>>>>>> >>> > >>>>> configuration
>>>>>>> >>> > >>>>> > >>> in Spark 4.0.0 because I think this configuration
>>>>>>> was initially
>>>>>>> >>> > >>>>> introduced
>>>>>>> >>> > >>>>> > >>> accidentally in Spark 3.5.4, lacking a clear
>>>>>>> design intent.
>>>>>>> >>> > >>>>> Although the
>>>>>>> >>> > >>>>> > >>> immediate maintenance cost of retaining this
>>>>>>> configuration
>>>>>>> >>> > >>>>> currently seems
>>>>>>> >>> > >>>>> > >>> limited, as subsequent versions iterate and user
>>>>>>> habits form, it
>>>>>>> >>> > >>>>> may lead
>>>>>>> >>> > >>>>> > >>> to the continuous accumulation of technical debt.
>>>>>>> When users
>>>>>>> >>> > >>>>> come to view
>>>>>>> >>> > >>>>> > >>> this configuration as one that can be relied on
>>>>>>> long-term,
>>>>>>> >>> > >>>>> future removal
>>>>>>> >>> > >>>>> > >>> may face greater resistance from users and could
>>>>>>> potentially
>>>>>>> >>> > >>>>> become an
>>>>>>> >>> > >>>>> > >>> entrenched and redundant configuration in the
>>>>>>> codebase.
>>>>>>> >>> > >>>>> Therefore, promptly
>>>>>>> >>> > >>>>> > >>> correcting this historically accidental
>>>>>>> configuration not only
>>>>>>> >>> > >>>>> maintains
>>>>>>> >>> > >>>>> > >>> the normativity of the Spark configuration system
>>>>>>> but also
>>>>>>> >>> > >>>>> prevents
>>>>>>> >>> > >>>>> > >>> unintended configurations from becoming de facto
>>>>>>> standards,
>>>>>>> >>> > >>>>> thereby
>>>>>>> >>> > >>>>> > >>> reducing long-term maintenance risks.
>>>>>>> >>> > >>>>> > >>>
>>>>>>> >>> > >>>>> > >>> Jie Yang
>>>>>>> >>> > >>>>> > >>>
>>>>>>> >>> > >>>>> > >>> On 2025/03/10 14:52:52 Dongjoon Hyun wrote:
>>>>>>> >>> > >>>>> > >>> > -1 because there exists a feasible migration
>>>>>>> path for Apache
>>>>>>> >>> > >>>>> Spark
>>>>>>> >>> > >>>>> > >>> 3.5.4 via Apache Spark 3.5.5.
>>>>>>> >>> > >>>>> > >>> >
>>>>>>> >>> > >>>>> > >>> > It's obvious that this Databricks' mistake
>>>>>>> already causes a
>>>>>>> >>> > >>>>> huge
>>>>>>> >>> > >>>>> > >>> communication cost in the Apache Spark community
>>>>>>> and is
>>>>>>> >>> > >>>>> suggesting a burden
>>>>>>> >>> > >>>>> > >>> to enforce us to handle at least two more PRs at
>>>>>>> 4.0.0 and 4.1.0.
>>>>>>> >>> > >>>>> > >>> >
>>>>>>> >>> > >>>>> > >>> > Given that, I don't think
>>>>>>> >>> > >>>>> > >>> > - This is an inevitable or
>>>>>>> >>> > >>>>> > >>> > - This is 0 cost
>>>>>>> >>> > >>>>> > >>> >
>>>>>>> >>> > >>>>> > >>> > Dongjoon.
>>>>>>> >>> > >>>>> > >>> >
>>>>>>> >>> > >>>>> > >>> > On 2025/03/10 12:46:16 Jungtaek Lim wrote:
>>>>>>> >>> > >>>>> > >>> > > Starting from my +1 (non-binding).
>>>>>>> >>> > >>>>> > >>> > >
>>>>>>> >>> > >>>>> > >>> > > In addition, I propose to retain migration
>>>>>>> logic till Spark
>>>>>>> >>> > >>>>> 4.1.x and
>>>>>>> >>> > >>>>> > >>> > > remove it in Spark 4.2.0.
>>>>>>> >>> > >>>>> > >>> > >
>>>>>>> >>> > >>>>> > >>> > > On Mon, Mar 10, 2025 at 9:44 PM Jungtaek Lim <
>>>>>>> >>> > >>>>> > >>> [email protected]>
>>>>>>> >>> > >>>>> > >>> > > wrote:
>>>>>>> >>> > >>>>> > >>> > >
>>>>>>> >>> > >>>>> > >>> > > > Hi dev,
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>> > >>> > > > Please vote to retain migration logic of
>>>>>>> incorrect
>>>>>>> >>> > >>>>> > >>> `spark.databricks.*`
>>>>>>> >>> > >>>>> > >>> > > > configuration in Spark 4.0.x.
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>> > >>> > > > - DISCUSSION:
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>>
>>>>>>> https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr
>>>>>>> >>> > >>>>> > >>> > > > ([DISCUSS] Handling spark.databricks.*
>>>>>>> config being
>>>>>>> >>> > >>>>> exposed in
>>>>>>> >>> > >>>>> > >>> 3.5.4 in
>>>>>>> >>> > >>>>> > >>> > > > Spark 4.0.0+)
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>> > >>> > > > Specifically, please review this post
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>>
>>>>>>> https://lists.apache.org/thread/xtq1kjhsl4ohfon78z3wld2hmfm78t9k
>>>>>>> >>> > >>>>> > >>> which
>>>>>>> >>> > >>>>> > >>> > > > explains pros and cons about the proposal -
>>>>>>> proposal is
>>>>>>> >>> > >>>>> about
>>>>>>> >>> > >>>>> > >>> "Option 1".
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>> > >>> > > > Simply speaking, this vote is to allow
>>>>>>> streaming queries
>>>>>>> >>> > >>>>> which had
>>>>>>> >>> > >>>>> > >>> been
>>>>>>> >>> > >>>>> > >>> > > > ever run in Spark 3.5.4 to be upgraded with
>>>>>>> Spark 4.0.x,
>>>>>>> >>> > >>>>> "without
>>>>>>> >>> > >>>>> > >>> having to
>>>>>>> >>> > >>>>> > >>> > > > be upgraded with Spark 3.5.5+ in prior". If
>>>>>>> the vote
>>>>>>> >>> > >>>>> passes, we
>>>>>>> >>> > >>>>> > >>> will help
>>>>>>> >>> > >>>>> > >>> > > > users to have a smooth upgrade from Spark
>>>>>>> 3.5.4 to Spark
>>>>>>> >>> > >>>>> 4.0.x,
>>>>>>> >>> > >>>>> > >>> which would
>>>>>>> >>> > >>>>> > >>> > > > be almost 1 year.
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>> > >>> > > > The (only) con of this option is having to
>>>>>>> retain the
>>>>>>> >>> > >>>>> incorrect
>>>>>>> >>> > >>>>> > >>> > > > configuration name as "string" in the
>>>>>>> codebase a bit
>>>>>>> >>> > >>>>> longer. The
>>>>>>> >>> > >>>>> > >>> code
>>>>>>> >>> > >>>>> > >>> > > > complexity of migration logic is arguably
>>>>>>> trivial. (link
>>>>>>> >>> > >>>>> > >>> > > > <
>>>>>>> >>> > >>>>> > >>>
>>>>>>> >>> > >>>>>
>>>>>>> https://github.com/apache/spark/blob/4231d58245251a34ae80a38ea4bbf7d720caa439/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L174-L183
>>>>>>> >>> > >>>>> > >>> >
>>>>>>> >>> > >>>>> > >>> > > > )
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>> > >>> > > > This VOTE is for Spark 4.0.x, but if someone
>>>>>>> supports
>>>>>>> >>> > >>>>> including
>>>>>>> >>> > >>>>> > >>> migration
>>>>>>> >>> > >>>>> > >>> > > > logic to be longer than Spark 4.0.x, please
>>>>>>> cast +1 here
>>>>>>> >>> > >>>>> and leave
>>>>>>> >>> > >>>>> > >>> the
>>>>>>> >>> > >>>>> > >>> > > > desired last minor version of Spark to
>>>>>>> retain this
>>>>>>> >>> > >>>>> migration logic.
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>> > >>> > > > The vote is open for the next 72 hours and
>>>>>>> passes if a
>>>>>>> >>> > >>>>> majority +1
>>>>>>> >>> > >>>>> > >>> PMC
>>>>>>> >>> > >>>>> > >>> > > > votes are cast, with a minimum of 3 +1 votes.
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>> > >>> > > > [ ] +1 Retain migration logic of incorrect
>>>>>>> >>> > >>>>> `spark.databricks.*`
>>>>>>> >>> > >>>>> > >>> > > > configuration in Spark 4.0.x
>>>>>>> >>> > >>>>> > >>> > > > [ ] -1 Remove migration logic of incorrect
>>>>>>> >>> > >>>>> `spark.databricks.*`
>>>>>>> >>> > >>>>> > >>> > > > configuration in Spark 4.0.0 because...
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>> > >>> > > > Thanks!
>>>>>>> >>> > >>>>> > >>> > > > Jungtaek Lim (HeartSaVioR)
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>> > >>> > >
>>>>>>> >>> > >>>>> > >>> >
>>>>>>> >>> > >>>>> > >>> >
>>>>>>> >>> > >>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> >>> > >>>>> > >>> > To unsubscribe e-mail:
>>>>>>> [email protected]
>>>>>>> >>> > >>>>> > >>> >
>>>>>>> >>> > >>>>> > >>> >
>>>>>>> >>> > >>>>> > >>>
>>>>>>> >>> > >>>>> > >>>
>>>>>>> >>> > >>>>> > >>>
>>>>>>> >>> > >>>>> > >>>
>>>>>>> >>> > >>>>> >
>>>>>>> >>> > >>>>>
>>>>>>> >>> > >>>>>
>>>>>>> >>> > >>>>>
>>>>>>> >>> > >>>>>
>>>>>>> >>> > >>>>
>>>>>>> >>> > >>>> --
>>>>>>> >>> > >>>> Adam Binford
>>>>>>> >>> > >>>>
>>>>>>> >>> > >>>
>>>>>>> >>> >
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>>
>>>>>>>
>>>>>>>
