Re: [VOTE] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

Jungtaek Lim Sat, 15 Mar 2025 07:28:12 -0700

Dongjoon, it is your responsibility to clarify your vote position, since
the vote is stalled because some people still consider your vote a veto. If
you really agree that I reached consensus in the proper way, and your vote
is really just for the historical record, let's not waste more time: please
explicitly cast -0.99.


If you say I misunderstood your reply and you are still casting a veto, I'm
happy to hear the evidence based on the record. We only talked via GitHub
PRs and the mailing list, so no discussion happened outside that
infrastructure. I do not count any discussion in private@, as private@ is
not meant to be used for discussions that could have been held in public.

On Sat, Mar 15, 2025 at 11:21 PM Jungtaek Lim <[email protected]>
wrote:

> A small fix, the link was missing:
>
> 4. I stated I wanted to proceed with the migration logic PR for
> branch-4.0, and didn't get any feedback except being told to wait for
> Spark 3.5.5 (link
> <https://github.com/apache/spark/pull/49983#pullrequestreview-2621947671>).
> If you weren't open to my proposal, you should have just said "we had
> already decided" and given the evidence. I haven't heard any, so I had to
> initiate a DISCUSS.
>
>
>
> On Sat, Mar 15, 2025 at 11:18 PM Jungtaek Lim <
> [email protected]> wrote:
>
>> > according to the ASF process, the Apache Spark community made the
>> conclusion to unblock the Apache Spark 4.0.0 release with the AS-IS code
>> with the improved Spark 4.0 migration guide because I provided a technical
>> justification for my vote via the concrete alternative based on the
>> existing Spark 3.5.5, AS-IS code base, and the suggested better migration
>> guide way in order to eliminate the affected streaming queries.
>>
>> I can always be corrected if you give the evidence. Let's stop "just"
>> talking. I believe we are seeing quite different things, and our memories
>> are the exact opposite of each other. "History will tell us."
>>
>> I am trying to understand where the miscommunication came from. Some
>> clarification:
>>
>> 1. I believe I said I do not agree with just removing the config in
>> master/4.0 and that I expected a follow-up, which is the migration logic
>> (link <https://github.com/apache/spark/pull/49897#issuecomment-2652486115>).
>> I admit this is a bit hard to understand, but I had multiple occasions to
>> make my voice clear; otherwise I would never have opened a migration logic
>> PR for master/4.0.
>> 2. I believe I said my intention was to land the migration logic in
>> 4.0.x and arguably longer (link
>> <https://lists.apache.org/thread/q24vonqhvqh11ghd488rctsm89zvmpqd>).
>> I think some people wanted to remove the vendor name either way, but
>> arguably the thread just ended with an open question and never reached
>> consensus. People expressed concerns, but nothing was concluded except
>> that we agreed to proceed with Spark 3.5.5. In that discussion thread we
>> never reached consensus on how to deal with it in Spark 4.0.0+,
>> especially regarding the migration logic.
>> 3. The VOTE for removal of the config clearly stated that it applied only
>> to 3.5 (link <https://lists.apache.org/thread/6nn76olr65b8zfgzdcbtr9f6o98451o5>).
>> 4. I stated I wanted to proceed with the migration logic PR for
>> branch-4.0, and didn't get any feedback except being told to wait for
>> Spark 3.5.5 (link). If you weren't open to my proposal, you should have
>> just said "we had already decided" and given the evidence. I haven't
>> heard any, so I had to initiate a DISCUSS.
>> 5. We all know about the DISCUSS and the VOTE, so I won't repeat them.
>>
>> I have strong evidence that you were aware we never agreed on the
>> behavior for Spark 4.0.0, and you said my proposal was "technically
>> correct", so we never debated a "technical objection"; we debated the
>> "behavior".
>> https://github.com/apache/spark/pull/49983#issuecomment-2676531485
>>
>> Can you please explain why you said my proposal was "technically correct"
>> and yet cast a vote here that requires a "technical objection"? Have you
>> changed your mind?
>>
>> Overall, when you say "the Apache Spark community made the conclusion to
>> unblock the Apache Spark 4.0.0 release with the AS-IS code", I don't
>> understand who "the Apache Spark community" is. Where can I see the
>> DISCUSS and VOTE threads? Am I really excluded from the Apache Spark
>> community, even though I am arguably the only active maintainer of the
>> module? Could you please enumerate who the Apache Spark community was at
>> that time?
>>
>> Let's not talk based on memory. If we agreed on that, there should be a
>> record. I am willing to apologize if I missed a critical discussion and
>> vote. Your (and my) memory should never be used as evidence. Please give
>> the evidence.
>>
>> I'm also happy to hear your thoughts on the other thread I started. Thanks.
>>
>> On Sat, Mar 15, 2025 at 9:23 AM Dongjoon Hyun <[email protected]>
>> wrote:
>>
>>> The Apache Spark PMC always strongly recommends that all 3.5 users
>>> upgrade to the latest stable release via the official website. The main
>>> question seems quite different from what the Apache Spark website says.
>>> May I ask what is unsafe about guiding Spark 3.5.4 users to 3.5.5,
>>> Jungtaek?
>>>
>>> > The main question was, "where is the evidence it's safe to force users
>>> > to upgrade to Spark 3.5.5...
>>>
>>> For the following part: when the Apache Spark community made a mistake
>>> in the Spark 2.4.2 release, we guided users to upgrade to 2.4.3
>>> immediately after restoring the default Scala version to 2.11.
>>>
>>> > to upgrade to Spark 3.5.5 before upgrading to Spark 4.0.0".
>>>
>>> 2019-04-23 https://spark.apache.org/releases/spark-release-2-4-2.html
>>> 2019-05-08 https://spark.apache.org/releases/spark-release-2-4-3.html
>>>
>>> In the same way, Apache Spark 3.5.5 was released and is ready to handle
>>> the mistake in Spark 3.5.4.
>>>
>>> 2025-02-27 https://spark.apache.org/releases/spark-release-3-5-5.html
>>>
>>> As for the vote, a vote is a time-limited procedure for making a swift
>>> decision. That's why you proposed the vote procedure and we agreed.
>>> There is no way to `block` a vote. The vote itself has already completed
>>> (including my -1).
>>>
>>> > you weren’t intended to “block” the vote
>>>
>>> I've been considering this as a part of the whole `spark.databricks.*`
>>> incident handling. In my interpretation, according to the ASF process, the
>>> Apache Spark community made the conclusion to unblock the Apache Spark
>>> 4.0.0 release with the AS-IS code with the improved Spark 4.0 migration
>>> guide because I provided a technical justification for my vote via the
>>> concrete alternative based on the existing Spark 3.5.5, AS-IS code base,
>>> and the suggested better migration guide way in order to eliminate the
>>> affected streaming queries.
>>>
>>> Thanks,
>>> Dongjoon.
>>>
>>>
>>>
>>> On Fri, Mar 14, 2025 at 3:23 AM Jungtaek Lim <
>>> [email protected]> wrote:
>>>
>>>> That said, if I understand correctly, you weren’t intended to “block”
>>>> the vote, right? You say you expected the vote to be finished.
>>>>
>>>> Could you please change your vote to -0.x, since some people view this as
>>>> a code change vote, or explicitly clarify that you don't think this is a
>>>> code change vote? This will help resolve some PMC members' concerns
>>>> about how we should interpret the vote result.
>>>>
>>>> Thanks!
>>>>
>>>> On Fri, Mar 14, 2025 at 5:33 PM Dongjoon Hyun <[email protected]> wrote:
>>>>
>>>>> Thank you all.
>>>>>
>>>>> The vote finished in the intended way with the expected result. We
>>>>> had enough time to discuss, and I have stuck to my original technical
>>>>> justification from the beginning (including this).
>>>>>
>>>>> 1. Helped rename the conf via SPARK-51172 (by approving it)
>>>>> 2. Banned `spark.databricks.*` via SPARK-51173 (by adding the
>>>>> `configName` Scalastyle rule)
>>>>> 3. Led the discussion thread and reached the agreement to release
>>>>> Spark 3.5.5 early
>>>>> 4. Released 3.5.5 as the release manager to provide a candidate
>>>>> migration path
>>>>> 5. Proposed using that migration path
>>>>>
>>>>> This vote was Step 5. My technical point has always been to restore
>>>>> the Apache Spark 4 codebase to its state before our mistake, by
>>>>> containing the issue to `branch-3.5` only and providing the proposed
>>>>> narrow migration path. And, as mentioned already, that is the state the
>>>>> AS-IS Apache Spark branches were in during the vote. What all of us
>>>>> agree on is that the previous code base is okay. I intentionally didn't
>>>>> reply to Jungtaek's Apple comment because Apple is not a public Spark
>>>>> vendor like Databricks, and it is a product name of popular consumer
>>>>> electronics, like Intel/AMD/Graviton. In addition, I don't think we are
>>>>> going to add back `spark.databricks.*` just because customers ask for
>>>>> it. In the same way, this vote is one of the Apache Spark PMC's
>>>>> political decision-making processes. We started this vote because we
>>>>> couldn't reach consensus.
>>>>>
>>>>> I believe I have given my best to the Apache Spark community through
>>>>> actions and valid technical clarification (without any modification
>>>>> during the process).
>>>>>
>>>>> Sincerely,
>>>>> Dongjoon
>>>>>
>>>>>
>>>>> On Thu, Mar 13, 2025 at 11:41 PM Mridul Muralidharan <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> FWIW, I am +1 on the proposal (though I missed the vote on this!)
>>>>>>
>>>>>> Regards,
>>>>>> Mridul
>>>>>>
>>>>>> On Fri, Mar 14, 2025 at 1:31 AM Mridul Muralidharan <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> I agree with Mark; IMO this is a qualified veto.
>>>>>>> We should give Dongjoon the opportunity to provide his clarification,
>>>>>>> if any.
>>>>>>>
>>>>>>> I do realize this delays the RC process, but this deserves to be
>>>>>>> looked into carefully.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Mridul
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Mar 13, 2025 at 9:35 PM Mark Hamstra <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Absolutely not!
>>>>>>>>
>>>>>>>> This is clearly a vote on a code change, not on a procedural issue or
>>>>>>>> a package release. The code change has been vetoed by a -1 vote by a
>>>>>>>> qualified voter.
>>>>>>>>
>>>>>>>> On Thu, Mar 13, 2025 at 6:58 PM Jungtaek Lim
>>>>>>>> <[email protected]> wrote:
>>>>>>>> >
>>>>>>>> > As I said, I'm concluding the VOTE since it meets the criteria
>>>>>>>> > (3 +1 binding, 1 -1 binding, and also +1s from non-binding).
>>>>>>>> >
>>>>>>>> > As I explained, I don't consider the -1 a veto, since the current
>>>>>>>> > codebase would get multiple -1s if we put it to a VOTE. (A +1 on
>>>>>>>> > this proposal is effectively a -1 on the other proposal.)
>>>>>>>> >
>>>>>>>> > The vote followed the Apache Voting Process for the "package
>>>>>>>> > release" type (which we tend to use in dev@ for VOTEs). I guess it
>>>>>>>> > could also have been run as a "procedural issues" vote, which is
>>>>>>>> > less strict, but this vote satisfies both types, which should be OK.
>>>>>>>> >
>>>>>>>> > The current codebase "accidentally" represents the other proposal,
>>>>>>>> > which was never intended. I don't see a way to -1 the current
>>>>>>>> > codebase, nor, to be fair, a way to make a different change that
>>>>>>>> > isn't bound to either proposal.
>>>>>>>> >
>>>>>>>> > I don't want to block the release because of the above. So, let's
>>>>>>>> > change the current codebase the way we discussed and voted on here.
>>>>>>>> > Reverting this decision should require another VOTE.
>>>>>>>> >
>>>>>>>> > Thanks to everyone who voted!
>>>>>>>> >
>>>>>>>> > On Thu, Mar 13, 2025 at 4:54 PM Jungtaek Lim <[email protected]> wrote:
>>>>>>>> >>
>>>>>>>> >> Thanks to everyone who participated and voted!
>>>>>>>> >>
>>>>>>>> >> Now I can technically conclude the VOTE, but I'm willing to wait
>>>>>>>> >> until US daytime tomorrow to give Dongjoon some time to revisit this.
>>>>>>>> >>
>>>>>>>> >> I'll conclude the vote around 6PM PST tomorrow regardless of his
>>>>>>>> >> vote. Ideally we would have no -1, but a single -1 doesn't block
>>>>>>>> >> this vote, and we can move forward.
>>>>>>>> >>
>>>>>>>> >> On Thu, Mar 13, 2025 at 4:42 PM Yang Jie <[email protected]> wrote:
>>>>>>>> >>>
>>>>>>>> >>> Forgot to mention in my last reply: my stance is +1.
>>>>>>>> >>>
>>>>>>>> >>> Jie Yang
>>>>>>>> >>>
>>>>>>>> >>> On 2025/03/13 07:08:12 Russell Jurney wrote:
>>>>>>>> >>> > Sure, +1 non-binding.
>>>>>>>> >>> >
>>>>>>>> >>> > On Wed, Mar 12, 2025 at 11:18 PM Jungtaek Lim <[email protected]> wrote:
>>>>>>>> >>> >
>>>>>>>> >>> > > Russell,
>>>>>>>> >>> > >
>>>>>>>> >>> > > Of course, we also hear the voices of people who don't have
>>>>>>>> >>> > > binding votes. Personally I think they matter even more than
>>>>>>>> >>> > > committers'/PMC members' VOTEs this time, since we can be biased
>>>>>>>> >>> > > and far removed from the user experience.
>>>>>>>> >>> > >
>>>>>>>> >>> > > Could you please explicitly cast your vote, like +1
>>>>>>>> >>> > > (non-binding)? You seem to agree with the proposal. Thanks!
>>>>>>>> >>> > >
>>>>>>>> >>> > > On Thu, Mar 13, 2025 at 3:15 PM Russell Jurney <[email protected]> wrote:
>>>>>>>> >>> > >
>>>>>>>> >>> > >> I'm just a lurker and aspiring contributor, but as a Spark user:
>>>>>>>> >>> > >> upgrading twice is very confusing and would cause many or most
>>>>>>>> >>> > >> users to fail to upgrade successfully to Spark 4 on the first
>>>>>>>> >>> > >> go. That seems like a very bad user experience. I thought it was
>>>>>>>> >>> > >> worth stating this out loud.
>>>>>>>> >>> > >>
>>>>>>>> >>> > >> Russell
>>>>>>>> >>> > >>
>>>>>>>> >>> > >> On Wed, Mar 12, 2025 at 11:05 PM Xiao Li <[email protected]> wrote:
>>>>>>>> >>> > >>
>>>>>>>> >>> > >>>> this vote is to allow streaming queries which had been ever run in Spark
>>>>>>>> >>> > >>>> 3.5.4 to be upgraded with Spark 4.0.x, "without having to be upgraded with
>>>>>>>> >>> > >>>> Spark 3.5.5+ in prior".
>>>>>>>> >>> > >>>
>>>>>>>> >>> > >>>
>>>>>>>> >>> > >>> In the history of Apache Spark, have we ever required users to
>>>>>>>> >>> > >>> upgrade to the next maintenance release before moving to a new
>>>>>>>> >>> > >>> feature or major release?
>>>>>>>> >>> > >>>
>>>>>>>> >>> > >>> Xiao
>>>>>>>> >>> > >>>
>>>>>>>> >>> > >>> On Tue, Mar 11, 2025 at 9:08 AM Adam Binford <[email protected]> wrote:
>>>>>>>> >>> > >>>
>>>>>>>> >>> > >>>> +1 (non-binding)
>>>>>>>> >>> > >>>>
>>>>>>>> >>> > >>>> It's a pretty in-the-weeds issue with how Structured Streaming
>>>>>>>> >>> > >>>> works under the hood, and it's kinda hard to understand if
>>>>>>>> >>> > >>>> you're not familiar with it. The migration logic doesn't mean
>>>>>>>> >>> > >>>> users can still use the old config; it works purely behind the
>>>>>>>> >>> > >>>> scenes to fix checkpoint metadata in streams created in 3.5.4.
>>>>>>>> >>> > >>>> The 5 lines of code it takes to address a weird edge case for
>>>>>>>> >>> > >>>> certain users, one that's already gone from master, shouldn't
>>>>>>>> >>> > >>>> be a huge deal.
>>>>>>>> >>> > >>>>
>>>>>>>> >>> > >>>> On Tue, Mar 11, 2025 at 1:43 AM Yang Jie <[email protected]> wrote:
>>>>>>>> >>> > >>>>
>>>>>>>> >>> > >>>>>
>>>>>>>> >>> > >>>>> To Sean: you're right, I'm very sorry.
>>>>>>>> >>> > >>>>>
>>>>>>>> >>> > >>>>> From the perspective of compatibility and migratability, I think
>>>>>>>> >>> > >>>>> we should carry this logic into 4.0.0 and keep it in the codebase
>>>>>>>> >>> > >>>>> for a longer time (or permanently), because we can't predict
>>>>>>>> >>> > >>>>> which version users of 3.5.4 will choose next.
>>>>>>>> >>> > >>>>>
>>>>>>>> >>> > >>>>>
>>>>>>>> >>> > >>>>> I don't want to discuss the so-called vendor issue.
>>>>>>>> >>> > >>>>>
>>>>>>>> >>> > >>>>> I withdraw my previous -1.
>>>>>>>> >>> > >>>>>
>>>>>>>> >>> > >>>>> Jie Yang.
>>>>>>>> >>> > >>>>>
>>>>>>>> >>> > >>>>> On 2025/03/11 04:42:25 Wenchen Fan wrote:
>>>>>>>> >>> > >>>>> > Guys, let’s be honest about what we’re discussing here.
>>>>>>>> >>> > >>>>> >
>>>>>>>> >>> > >>>>> > If this is a migration issue, why would we even need a vote?
>>>>>>>> >>> > >>>>> > We’ve been consistently adding configurations to restore legacy
>>>>>>>> >>> > >>>>> > behavior instead of removing them because we understand the
>>>>>>>> >>> > >>>>> > challenges of upgrading Spark versions. Our goal has always
>>>>>>>> >>> > >>>>> > been to make upgrades easier, even if it means carrying some
>>>>>>>> >>> > >>>>> > technical debt. I don’t think we want to change that culture
>>>>>>>> >>> > >>>>> > now.
>>>>>>>> >>> > >>>>> >
>>>>>>>> >>> > >>>>> > If the concern is about vendor names appearing in the
>>>>>>>> >>> > >>>>> > codebase, then why is it a big deal this time when vendor
>>>>>>>> >>> > >>>>> > names are already present elsewhere? If we’ve failed to follow
>>>>>>>> >>> > >>>>> > a policy, let’s correct it, but can someone point to the
>>>>>>>> >>> > >>>>> > specific policy we’re violating?
>>>>>>>> >>> > >>>>> >
>>>>>>>> >>> > >>>>> > If the vote is about adding migration logic to ease the
>>>>>>>> >>> > >>>>> > upgrade from 3.5.4 to 4.0.0, then +1, why not?
>>>>>>>> >>> > >>>>> >
>>>>>>>> >>> > >>>>> > Thanks,
>>>>>>>> >>> > >>>>> > Wenchen
>>>>>>>> >>> > >>>>> >
>>>>>>>> >>> > >>>>> >
>>>>>>>> >>> > >>>>> >
>>>>>>>> >>> > >>>>> > On Mon, Mar 10, 2025 at 8:49 PM Jungtaek Lim <[email protected]> wrote:
>>>>>>>> >>> > >>>>> >
>>>>>>>> >>> > >>>>> > > Well said, Sean. Sorry for keeping you around here; it might
>>>>>>>> >>> > >>>>> > > not have been clearly stated. My bad.
>>>>>>>> >>> > >>>>> > >
>>>>>>>> >>> > >>>>> > > Yang, how could we ever tolerate the fact that there are
>>>>>>>> >>> > >>>>> > > "other" occurrences of vendor names in the codebase? Please go
>>>>>>>> >>> > >>>>> > > and search for "databricks" in the codebase and be surprised.
>>>>>>>> >>> > >>>>> > >
>>>>>>>> >>> > >>>>> > > If we believe that having vendor names in the codebase will
>>>>>>>> >>> > >>>>> > > increase the chance of making mistakes, why didn't we start a
>>>>>>>> >>> > >>>>> > > discussion thread earlier to remove all occurrences
>>>>>>>> >>> > >>>>> > > altogether? This is super tricky, because I could even argue
>>>>>>>> >>> > >>>>> > > that we have "Apple" as a vendor name in the Apache Spark
>>>>>>>> >>> > >>>>> > > codebase. I don't mean the use of "apple" in test data.
>>>>>>>> >>> > >>>>> > > See `isMacOnAppleSilicon` in Utils. Is it unavoidable? No,
>>>>>>>> >>> > >>>>> > > `isMacOnMSeries` or `isMacOnSilicon` is enough.
>>>>>>>> >>> > >>>>> > >
>>>>>>>> >>> > >>>>> > > We really need to draw a line on where we disallow vendor
>>>>>>>> >>> > >>>>> > > names; if it's the entire codebase, I don't think that is
>>>>>>>> >>> > >>>>> > > realistic.
>>>>>>>> >>> > >>>>> > >
>>>>>>>> >>> > >>>>> > > This was really a mistake, and it definitely did not come from
>>>>>>>> >>> > >>>>> > > referring to the existing codebase. Removing vendor names does
>>>>>>>> >>> > >>>>> > > not change the chance of encountering this issue again. If we
>>>>>>>> >>> > >>>>> > > really care, we should think about style checking, which is
>>>>>>>> >>> > >>>>> > > the only viable way to catch the mistake. Again, I'd argue we
>>>>>>>> >>> > >>>>> > > would need a bunch of vendor names in that style check, not
>>>>>>>> >>> > >>>>> > > just the problematic vendor name.
>>>>>>>> >>> > >>>>> > >
>>>>>>>> >>> > >>>>> > >
>>>>>>>> >>> > >>>>> > > On Tue, Mar 11, 2025 at 12:17 PM Sean Owen <[email protected]> wrote:
>>>>>>>> >>> > >>>>> > >
>>>>>>>> >>> > >>>>> > >> Doesn't the migration code 'clear' the debt?
>>>>>>>> >>> > >>>>> > >> The proposal is not to continue to support the config.
>>>>>>>> >>> > >>>>> > >> I feel like people are not quite understanding the change, and
>>>>>>>> >>> > >>>>> > >> are objecting to something that doesn't exist.
>>>>>>>> >>> > >>>>> > >> It's a shame, as this seems like something not even worth
>>>>>>>> >>> > >>>>> > >> discussing. I don't know why this triggered so much discussion.
>>>>>>>> >>> > >>>>> > >> We have kept deprecated methods without blinking, which are much
>>>>>>>> >>> > >>>>> > >> bigger in comparison.
>>>>>>>> >>> > >>>>> > >> Can we maybe ask you to review the actual change in question?
>>>>>>>> >>> > >>>>> > >>
>>>>>>>> >>> > >>>>> > >> On Mon, Mar 10, 2025, 10:02 PM Yang Jie <[email protected]> wrote:
>>>>>>>> >>> > >>>>> > >>
>>>>>>>> >>> > >>>>> > >>> -1
>>>>>>>> >>> > >>>>> > >>> Remove migration logic of incorrect `spark.databricks.*`
>>>>>>>> >>> > >>>>> > >>> configuration in Spark 4.0.0 because I think this
>>>>>>>> >>> > >>>>> > >>> configuration was initially introduced accidentally in Spark
>>>>>>>> >>> > >>>>> > >>> 3.5.4, without a clear design intent. Although the immediate
>>>>>>>> >>> > >>>>> > >>> maintenance cost of retaining this configuration currently
>>>>>>>> >>> > >>>>> > >>> seems limited, as subsequent versions iterate and user habits
>>>>>>>> >>> > >>>>> > >>> form, it may lead to a continuous accumulation of technical
>>>>>>>> >>> > >>>>> > >>> debt. Once users come to view this configuration as one that
>>>>>>>> >>> > >>>>> > >>> can be relied on long-term, its future removal may face greater
>>>>>>>> >>> > >>>>> > >>> resistance from users, and it could become an entrenched and
>>>>>>>> >>> > >>>>> > >>> redundant configuration in the codebase. Therefore, promptly
>>>>>>>> >>> > >>>>> > >>> correcting this historically accidental configuration not only
>>>>>>>> >>> > >>>>> > >>> keeps the Spark configuration system consistent but also
>>>>>>>> >>> > >>>>> > >>> prevents unintended configurations from becoming de facto
>>>>>>>> >>> > >>>>> > >>> standards, thereby reducing long-term maintenance risks.
>>>>>>>> >>> > >>>>> > >>>
>>>>>>>> >>> > >>>>> > >>> Jie Yang
>>>>>>>> >>> > >>>>> > >>>
>>>>>>>> >>> > >>>>> > >>> On 2025/03/10 14:52:52 Dongjoon Hyun wrote:
>>>>>>>> >>> > >>>>> > >>> > -1 because there exists a feasible migration path for
>>>>>>>> >>> > >>>>> > >>> > Apache Spark 3.5.4 via Apache Spark 3.5.5.
>>>>>>>> >>> > >>>>> > >>> >
>>>>>>>> >>> > >>>>> > >>> > It's obvious that this mistake by Databricks has already
>>>>>>>> >>> > >>>>> > >>> > caused a huge communication cost in the Apache Spark
>>>>>>>> >>> > >>>>> > >>> > community and imposes a burden that forces us to handle at
>>>>>>>> >>> > >>>>> > >>> > least two more PRs for 4.0.0 and 4.1.0.
>>>>>>>> >>> > >>>>> > >>> >
>>>>>>>> >>> > >>>>> > >>> > Given that, I don't think
>>>>>>>> >>> > >>>>> > >>> > - this is inevitable, or
>>>>>>>> >>> > >>>>> > >>> > - this is zero cost.
>>>>>>>> >>> > >>>>> > >>> >
>>>>>>>> >>> > >>>>> > >>> > Dongjoon.
>>>>>>>> >>> > >>>>> > >>> >
>>>>>>>> >>> > >>>>> > >>> > On 2025/03/10 12:46:16 Jungtaek Lim wrote:
>>>>>>>> >>> > >>>>> > >>> > > Starting with my +1 (non-binding).
>>>>>>>> >>> > >>>>> > >>> > >
>>>>>>>> >>> > >>>>> > >>> > > In addition, I propose retaining the migration logic until
>>>>>>>> >>> > >>>>> > >>> > > Spark 4.1.x and removing it in Spark 4.2.0.
>>>>>>>> >>> > >>>>> > >>> > >
>>>>>>>> >>> > >>>>> > >>> > > On Mon, Mar 10, 2025 at 9:44 PM Jungtaek Lim <[email protected]> wrote:
>>>>>>>> >>> > >>>>> > >>> > > > Hi dev,
>>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>>> >>> > >>>>> > >>> > > > Please vote to retain migration logic of incorrect `spark.databricks.*`
>>>>>>>> >>> > >>>>> > >>> > > > configuration in Spark 4.0.x.
>>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>>> >>> > >>>>> > >>> > > > - DISCUSSION:
>>>>>>>> >>> > >>>>> > >>> > > > https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr
>>>>>>>> >>> > >>>>> > >>> > > > ([DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in
>>>>>>>> >>> > >>>>> > >>> > > > Spark 4.0.0+)
>>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>>> >>> > >>>>> > >>> > > > Specifically, please review this post
>>>>>>>> >>> > >>>>> > >>> > > > https://lists.apache.org/thread/xtq1kjhsl4ohfon78z3wld2hmfm78t9k
>>>>>>>> >>> > >>>>> > >>> > > > which explains the pros and cons of the proposal; the proposal is
>>>>>>>> >>> > >>>>> > >>> > > > "Option 1".
>>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>>> >>> > >>>>> > >>> > > > Simply speaking, this vote is to allow streaming queries which had been
>>>>>>>> >>> > >>>>> > >>> > > > ever run in Spark 3.5.4 to be upgraded with Spark 4.0.x, "without having to
>>>>>>>> >>> > >>>>> > >>> > > > be upgraded with Spark 3.5.5+ in prior". If the vote passes, we will help
>>>>>>>> >>> > >>>>> > >>> > > > users have a smooth upgrade from Spark 3.5.4 to Spark 4.0.x, which would
>>>>>>>> >>> > >>>>> > >>> > > > be almost 1 year.
>>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>>> >>> > >>>>> > >>> > > > The (only) con of this option is having to retain the incorrect
>>>>>>>> >>> > >>>>> > >>> > > > configuration name as a "string" in the codebase a bit longer. The code
>>>>>>>> >>> > >>>>> > >>> > > > complexity of the migration logic is arguably trivial. (link
>>>>>>>> >>> > >>>>> > >>> > > > <https://github.com/apache/spark/blob/4231d58245251a34ae80a38ea4bbf7d720caa439/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L174-L183>
>>>>>>>> >>> > >>>>> > >>> > > > )
>>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>>> >>> > >>>>> > >>> > > > This VOTE is for Spark 4.0.x, but if you support keeping the migration
>>>>>>>> >>> > >>>>> > >>> > > > logic longer than Spark 4.0.x, please cast +1 here and state the last
>>>>>>>> >>> > >>>>> > >>> > > > minor version of Spark that should retain this migration logic.
>>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>>> >>> > >>>>> > >>> > > > The vote is open for the next 72 hours and passes if a majority +1 PMC
>>>>>>>> >>> > >>>>> > >>> > > > votes are cast, with a minimum of 3 +1 votes.
>>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>>> >>> > >>>>> > >>> > > > [ ] +1 Retain migration logic of incorrect `spark.databricks.*`
>>>>>>>> >>> > >>>>> > >>> > > > configuration in Spark 4.0.x
>>>>>>>> >>> > >>>>> > >>> > > > [ ] -1 Remove migration logic of incorrect `spark.databricks.*`
>>>>>>>> >>> > >>>>> > >>> > > > configuration in Spark 4.0.0 because...
>>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>>> >>> > >>>>> > >>> > > > Thanks!
>>>>>>>> >>> > >>>>> > >>> > > > Jungtaek Lim (HeartSaVioR)
>>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>>> >>> > >>>>> > >>> > >
>>>>>>>> >>> > >>>>> > >>> >
>>>>>>>> >>> > >>>>> > >>> >
>>>>>>>> >>> > >>>>> > >>> > ---------------------------------------------------------------------
>>>>>>>> >>> > >>>>> > >>> > To unsubscribe e-mail: [email protected]
>>>>>>>> >>> > >>>>> > >>> >
>>>>>>>> >>> > >>>>> > >>> >
>>>>>>>> >>> > >>>>> > >>>
>>>>>>>> >>> > >>>>> > >>>
>>>>>>>> >>> > >>>>> > >>>
>>>>>>>> >>> > >>>>> > >>>
>>>>>>>> >>> > >>>>> >
>>>>>>>> >>> > >>>>>
>>>>>>>> >>> > >>>>>
>>>>>>>> >>> > >>>>>
>>>>>>>> >>> > >>>>
>>>>>>>> >>> > >>>> --
>>>>>>>> >>> > >>>> Adam Binford
>>>>>>>> >>> > >>>>
>>>>>>>> >>> > >>>
>>>>>>>> >>> >
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
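
Adam Binford's message in this thread describes the migration logic as a behind-the-scenes fix to the checkpoint metadata of streams created in 3.5.4, not a way to keep using the old config. A minimal sketch of that idea follows. It is written in Python for brevity and is not Spark's implementation (that is the Scala code in `OffsetSeq.scala`, linked from the original VOTE email). It assumes the checkpoint records SQL confs as a plain string-to-string map, and both config keys are hypothetical placeholders, since the thread never names the affected config.

```python
# Hypothetical sketch of re-keying a misnamed SQL conf found in streaming
# checkpoint metadata. Both key names are placeholders, NOT the real configs.
from typing import Dict

LEGACY_KEY = "spark.databricks.sql.exampleStreamingConf"  # hypothetical old name
FIXED_KEY = "spark.sql.exampleStreamingConf"               # hypothetical new name


def migrate_checkpoint_confs(confs: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the checkpointed confs with the legacy key renamed.

    - Only the legacy key present: its value moves to the corrected key.
    - Both keys present: the corrected key wins (an assumed policy).
    - The legacy key never appears in the result, so the old name is not
      kept alive as a user-facing setting.
    The input mapping is left unmodified.
    """
    migrated = dict(confs)
    legacy_value = migrated.pop(LEGACY_KEY, None)
    if legacy_value is not None and FIXED_KEY not in migrated:
        migrated[FIXED_KEY] = legacy_value
    return migrated


if __name__ == "__main__":
    # Metadata as a 3.5.4-era checkpoint might have recorded it (illustrative).
    written_by_354 = {LEGACY_KEY: "true", "spark.sql.shuffle.partitions": "200"}
    print(migrate_checkpoint_confs(written_by_354))
```

In this sketch the old name is only ever read from existing metadata and never written back, which mirrors Sean Owen's point that the proposal is not to continue supporting the config. Which key wins when both are present is an assumed policy, not something the thread specifies.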
