Re: [VOTE] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

Jungtaek Lim Fri, 14 Mar 2025 03:11:28 -0700

Sorry, I overlooked the type of the vote - this totally depends on the
type of the vote. If the intent was not to block the VOTE, which could have
been interpreted as a code change, maybe -0 or -0.5 or -0.99 should have been
used rather than -1, which blocks the process.
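
As a rough illustration of why the vote type matters in this thread, the
sketch below tallies binding votes under three vote types. It is a simplified
reading of the ASF voting guidelines, not an official rule set: the names
`VoteType` and `vote_passes` are made up for this example, lazy consensus is
ignored, and a veto is assumed to carry a valid technical justification.

```python
from enum import Enum


class VoteType(Enum):
    CODE_MODIFICATION = "code modification"
    PACKAGE_RELEASE = "package release"
    PROCEDURAL = "procedural"


def vote_passes(vote_type: VoteType, binding_plus: int, binding_minus: int) -> bool:
    """Simplified tally of binding votes (hypothetical helper, see assumptions above)."""
    if vote_type is VoteType.CODE_MODIFICATION:
        # Assumption: any binding -1 is a justified veto and blocks the change.
        return binding_plus >= 3 and binding_minus == 0
    if vote_type is VoteType.PACKAGE_RELEASE:
        # Majority approval: at least three +1 and more +1 than -1; no veto.
        return binding_plus >= 3 and binding_plus > binding_minus
    # Procedural: simple majority of the binding votes cast.
    return binding_plus > binding_minus


# The tally reported in this thread: 3 binding +1 and 1 binding -1.
for vote_type in VoteType:
    print(vote_type.value, vote_passes(vote_type, 3, 1))
```

With the tally reported in this thread, the same votes pass as a package
release or procedural vote but not as a code modification, which is the
disagreement being discussed.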


On Fri, Mar 14, 2025 at 7:01 PM Jungtaek Lim <[email protected]>
wrote:

> If the intent was not to block the VOTE but just to express
> disagreement, please say "-1" instead of presenting it as a "veto". When
> you say veto, you intend to kill the process unless you are persuaded
> otherwise or turn out to lack proper technical justification.
>
> On Fri, Mar 14, 2025 at 6:27 PM Jungtaek Lim <[email protected]>
> wrote:
>
>> Thanks for the update.
>>
>> Though I have to clarify that "What all of us agree on is that the
>> previous code base is okay." is not true.
>>
>> Wenchen summarized what happened in another thread, which I think is more
>> accurate, as follows:
>>
>> 1. A mistake was made, leading to a vendor name being included in the
>> configuration released in Spark 3.5.4.
>> 2. Dongjoon initiated a vote to deprecate the incorrect configuration
>> name in 3.5.5, and the vote passed. Thanks to Dongjoon, 3.5.5 was released
>> shortly after.
>> 3. A PR <https://github.com/apache/spark/pull/49897> that simply renamed
>> (rather than deprecated) the configuration was merged into master/4.0. This
>> is a breaking change and was not backed by a vote.
>> 4. This vote concerns adding migration logic to prevent the breaking
>> change from affecting streaming queries.
>>
>> One thing we have to make clear is, the PR
>> <https://github.com/apache/spark/pull/49897> that simply renamed was
>> submitted "earlier" (Feb 12 in KST) than the PR
>> <https://github.com/apache/spark/pull/49985> for migration (Feb 17 in
>> KST). The former PR was even merged on Feb 12, but we definitely had a
>> conversation to figure out the way to mitigate better at that time, because
>> it is definitely a breaking change like Wenchen said. That said, it was
>> just a quick fix and it warranted demanding followup work.
>>
>> That is why I came up with the migration logic, and I filed the PR for
>> the migration logic against "3.5/4.0/master" (not only 3.5), which clearly
>> implies that I intended, and still intend, to resolve the issue in all
>> branches. The decision to merge "only" to 3.5 was definitely not made among
>> "us". It was decided among us to merge to 3.5, but the DISCUSSION I raised
>> was the first time we talked about master/4.0 in public. "We never had a
>> consensus for master/4.0", and it was me who drove the discussion for that.
>>
>> This is the root reason we had such a long argument, so we need to make
>> this very clear. No, I don't think I said I'm OK with not having migration
>> code in master/4.0.
>>
>> Also, I agree that Apple is not a vendor productionizing Spark, but my
>> overall point is, we "just" feel like it's not good to have the vendor name
>> in the codebase as the ASF project needs to try to be vendor neutral. "ASF
>> project should be vendor neutral" is interpreted by everyone in every
>> different way, but there is no evidence we have consensus that having a
>> vendor name in any arbitrary string is problematic. Having a vendor name in
>> the string doesn't mean anything except migration. No, we put it just
>> because we think it makes users' life better. That's all, no other reason
>> like political one, I can confidently say no, it's not. And people seem to
>> agree based on the outcome of DISCUSS and VOTE.
>>
>> The main question was, "where is the evidence it's safe to force users to
>> upgrade to Spark 3.5.5 before upgrading to Spark 4.0.0". It came up from
>> figuring out the resolution of the issue (I get where you are coming from),
>> but it is missing the big question about who will have a pain point, and I
>> did ask the community and I heard they are supportive of just allowing
>> upgrading to Spark 4.0.0 directly. I really think the proposal was not
>> backed by consensus. It was driven solely by one person - it is backed by
>> willingness to remove the occurrence of the vendor name from the
>> codebase ASAP. Again, everyone interprets the vendor name issue
>> differently, so this should have been discussed beforehand, to
>> weigh the cost of having the vendor name. It shouldn't be something
>> that one person decides alone.
>>
>> I'm not sure how this comes to a valid technical objection, because we
>> never discussed that approach in public, and the opposite approach
>> definitely gained traction in public. We never discussed that approach and
>> the codebase is already reflecting this, which is arguably a disaster. I
>> just wanted to fix that. That's all.
>>
>>
>>
>> On Fri, Mar 14, 2025 at 5:33 PM Dongjoon Hyun <[email protected]>
>> wrote:
>>
>>> Thank you all.
>>>
>>> The vote is finished in an intended way with the expected result. We
>>> have enough time to discuss and I have been sticking to my original
>>> technical justification from the beginning (including this).
>>>
>>> 1. Helping renaming the conf via SPARK-51172 (by approving it)
>>> 2. Banning `spark.databricks.*` via SPARK-51173 (by adding `configName`
>>> Scalastyle rule)
>>> 3. Led the discussion thread and reached the agreement to release Spark
>>> 3.5.5 early.
>>> 4. Releasing 3.5.5 as a release manager to provide a candidate migration
>>> path
>>> 5. Proposing to use the migration path
>>>
>>> This vote was Step 5. My technical point has always been aiming to
>>> recover the Apache Spark 4 codebase to the status before our mistake by
>>> containing the issue only in `branch-3.5` and providing the proposed narrow
>>> migration path. And, as mentioned already, that's the situation where we
>>> were during the vote at Apache Spark AS-IS branches. What all of us agree
>>> on is that the previous code base is okay. I didn't reply to
>>> Jungtaek's Apple comment intentionally because it's not a public
>>> Spark-vendor like Databricks. And, it's a product name of the popular
>>> consumer electronic devices like Intel/AMD/Graviton. In addition, I don't
>>> think we are going to add back `spark.databricks.*` because of the reason
>>> the customers ask for it. In the same way, this vote is one of the
>>> political decision making processes of Apache Spark PMC. We started this
>>> vote because we couldn't make a consensus.
>>>
>>> I believe I've been giving my best to the Apache Spark community
>>> by actions and with valid technical clarification (without any modification
>>> during the process).
>>>
>>> Sincerely,
>>> Dongjoon
>>>
>>>
>>> On Thu, Mar 13, 2025 at 11:41 PM Mridul Muralidharan <[email protected]>
>>> wrote:
>>>
>>>>
>>>> FWIW, I am +1 on the proposal (though I missed the vote on this !)
>>>>
>>>> Regards,
>>>> Mridul
>>>>
>>>> On Fri, Mar 14, 2025 at 1:31 AM Mridul Muralidharan <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>>   I agree with Mark, imo this is a qualified veto.
>>>>> We should give Dongjoon the opportunity to give his clarification, if
>>>>> any.
>>>>>
>>>>> I do realize this delays the RC process, but this deserves to be
>>>>> looked into carefully.
>>>>>
>>>>> Thanks,
>>>>> Mridul
>>>>>
>>>>>
>>>>> On Thu, Mar 13, 2025 at 9:35 PM Mark Hamstra <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Absolutely not!
>>>>>>
>>>>>> This is clearly a vote on a code change, not on a procedural issue or
>>>>>> a package release. The code change has been vetoed by a -1 vote by a
>>>>>> qualified voter.
>>>>>>
>>>>>> On Thu, Mar 13, 2025 at 6:58 PM Jungtaek Lim
>>>>>> <[email protected]> wrote:
>>>>>> >
>>>>>> > As I said, I'm concluding the VOTE since we have met the
>>>>>> criteria (3 +1 binding, 1 -1 binding, and also +1s from non-binding).
>>>>>> >
>>>>>> > I don't consider -1 as a veto as I explained, as we should have
>>>>>> multiple -1s if we go for VOTE with the current codebase. (+1 in this
>>>>>> proposal is effectively -1 in another proposal.)
>>>>>> >
>>>>>> > The vote followed the Apache Voting Process with the type of
>>>>>> "package release" (which we tend to use in dev@ for VOTE). I guess
>>>>>> it could also have been done as "procedural issues", which is less
>>>>>> strict, but then this fulfills both types of votes, which should be OK.
>>>>>> >
>>>>>> > The current codebase is "accidentally" representing another
>>>>>> proposal, which was never intended. I don't see a way to cast -1 on the
>>>>>> current codebase, nor to make a different change that is not bound to
>>>>>> either proposal, to be fair.
>>>>>> >
>>>>>> > I don't want to block the release because of the above. So, let's
>>>>>> change the current codebase the way we discussed and voted here.
>>>>>> Reverting this decision should require another VOTE.
>>>>>> >
>>>>>> > Thanks to everyone who voted!
>>>>>> >
>>>>>> > On Thu, Mar 13, 2025 at 4:54 PM Jungtaek Lim <
>>>>>> [email protected]> wrote:
>>>>>> >>
>>>>>> >> Thanks to everyone who participated and voted!
>>>>>> >>
>>>>>> >> Now I can technically conclude the VOTE, but I'm willing to wait
>>>>>> till US daytime tomorrow, to give some time for Dongjoon to revisit this.
>>>>>> >>
>>>>>> >> I'll conclude the vote around 6PM PST tomorrow regardless of his
>>>>>> vote. It's ideal to see us have no -1, but having one -1 doesn't block
>>>>>> this vote and we can move forward.
>>>>>> >>
>>>>>> >> On Thu, Mar 13, 2025 at 4:42 PM Yang Jie <[email protected]>
>>>>>> wrote:
>>>>>> >>>
>>>>>> >>> forgot to mention in my last reply, my stance is +1
>>>>>> >>>
>>>>>> >>> Jie Yang
>>>>>> >>>
>>>>>> >>> On 2025/03/13 07:08:12 Russell Jurney wrote:
>>>>>> >>> > Sure, +1 non-binding.
>>>>>> >>> >
>>>>>> >>> > On Wed, Mar 12, 2025 at 11:18 PM Jungtaek Lim <
>>>>>> [email protected]>
>>>>>> >>> > wrote:
>>>>>> >>> >
>>>>>> >>> > > Russell,
>>>>>> >>> > >
>>>>>> >>> > > Of course, we also hear the voices of people who don't have
>>>>>> binding votes.
>>>>>> >>> > > Personally I think they're more important than committers/PMC
>>>>>> members' VOTEs
>>>>>> >>> > > this time since we can be biased and be far from user
>>>>>> experience.
>>>>>> >>> > >
>>>>>> >>> > > Could you please explicitly cast your vote, like +1
>>>>>> (non-binding)? You
>>>>>> >>> > > seem to agree with the proposal. Thanks!
>>>>>> >>> > >
>>>>>> >>> > > On Thu, Mar 13, 2025 at 3:15 PM Russell Jurney <
>>>>>> [email protected]>
>>>>>> >>> > > wrote:
>>>>>> >>> > >
>>>>>> >>> > >> I'm just a lurker and aspiring contributor, but as a Spark
>>>>>> user upgrading
>>>>>> >>> > >> twice is very confusing and would cause many or most users
>>>>>> to fail to
>>>>>> >>> > >> upgrade successfully to Spark 4 on a first go. That seems
>>>>>> like a very bad
>>>>>> >>> > >> user experience. I thought it was worthwhile stating this
>>>>>> out loud.
>>>>>> >>> > >>
>>>>>> >>> > >> Russell
>>>>>> >>> > >>
>>>>>> >>> > >> On Wed, Mar 12, 2025 at 11:05 PM Xiao Li <
>>>>>> [email protected]> wrote:
>>>>>> >>> > >>
>>>>>> >>> > >>> this vote is to allow streaming queries which had been ever
>>>>>> run in Spark
>>>>>> >>> > >>>> 3.5.4 to be upgraded with Spark 4.0.x, "without having to
>>>>>> be upgraded with
>>>>>> >>> > >>>> Spark 3.5.5+ in prior".
>>>>>> >>> > >>>
>>>>>> >>> > >>>
>>>>>> >>> > >>> In the history of Apache Spark, have we ever required users
>>>>>> to upgrade
>>>>>> >>> > >>> to the next maintenance release before moving to a new
>>>>>> feature or major
>>>>>> >>> > >>> release?
>>>>>> >>> > >>>
>>>>>> >>> > >>> Xiao
>>>>>> >>> > >>>
>>>>>> >>> > >>> Adam Binford <[email protected]> wrote on Tue, Mar 11, 2025 at 09:08:
>>>>>> >>> > >>>
>>>>>> >>> > >>>> +1 (non-binding)
>>>>>> >>> > >>>>
>>>>>> >>> > >>>> It's a pretty in the weeds issue with how Structured
>>>>>> Streaming works
>>>>>> >>> > >>>> under the hood that's kinda hard to understand if you're
>>>>>> not familiar with
>>>>>> >>> > >>>> it. The migration logic doesn't mean users can still use
>>>>>> the old config,
>>>>>> >>> > >>>> it's purely behind the scenes to fix checkpoint metadata
>>>>>> in streams created
>>>>>> >>> > >>>> in 3.5.4. The 5 lines of code it takes to address a weird
>>>>>> edge case for
>>>>>> >>> > >>>> certain users that's already gone from master shouldn't be
>>>>>> a huge deal.
>>>>>> >>> > >>>>
>>>>>> >>> > >>>> On Tue, Mar 11, 2025 at 1:43 AM Yang Jie <
>>>>>> [email protected]> wrote:
>>>>>> >>> > >>>>
>>>>>> >>> > >>>>>
>>>>>> >>> > >>>>> To Sean, you're right, I'm very sorry.
>>>>>> >>> > >>>>>
>>>>>> >>> > >>>>> From the perspective of compatibility and migratability,
>>>>>> I think we
>>>>>> >>> > >>>>> should migrate this logic to 4.0.0 and keep it in the
>>>>>> codebase for a longer
>>>>>> >>> > >>>>> time (or permanently), because we can't predict which
>>>>>> version users of
>>>>>> >>> > >>>>> 3.5.4 will choose next.
>>>>>> >>> > >>>>>
>>>>>> >>> > >>>>>
>>>>>> >>> > >>>>> I don't want to discuss the so-called vendor issue.
>>>>>> >>> > >>>>>
>>>>>> >>> > >>>>> I withdraw my previous -1.
>>>>>> >>> > >>>>>
>>>>>> >>> > >>>>> Jie Yang.
>>>>>> >>> > >>>>>
>>>>>> >>> > >>>>> On 2025/03/11 04:42:25 Wenchen Fan wrote:
>>>>>> >>> > >>>>> > Guys, let’s be honest about what we’re discussing here.
>>>>>> >>> > >>>>> >
>>>>>> >>> > >>>>> > If this is a migration issue, why would we even need a
>>>>>> vote? We’ve
>>>>>> >>> > >>>>> been
>>>>>> >>> > >>>>> > consistently adding configurations to restore legacy
>>>>>> behavior
>>>>>> >>> > >>>>> instead of
>>>>>> >>> > >>>>> > removing them because we understand the challenges of
>>>>>> upgrading Spark
>>>>>> >>> > >>>>> > versions. Our goal has always been to make upgrades
>>>>>> easier, even if
>>>>>> >>> > >>>>> it
>>>>>> >>> > >>>>> > means carrying some technical debt. I don’t think we
>>>>>> want to change
>>>>>> >>> > >>>>> that
>>>>>> >>> > >>>>> > culture now.
>>>>>> >>> > >>>>> >
>>>>>> >>> > >>>>> > If the concern is about vendor names appearing in the
>>>>>> codebase, then
>>>>>> >>> > >>>>> why is
>>>>>> >>> > >>>>> > it a big deal this time when vendor names are already
>>>>>> present
>>>>>> >>> > >>>>> elsewhere? If
>>>>>> >>> > >>>>> > we’ve failed to follow a policy, let’s correct it, but
>>>>>> can someone
>>>>>> >>> > >>>>> point to
>>>>>> >>> > >>>>> > the specific policy we’re violating?
>>>>>> >>> > >>>>> >
>>>>>> >>> > >>>>> > If the vote is about adding migration logic to ease the
>>>>>> upgrade from
>>>>>> >>> > >>>>> 3.5.4
>>>>>> >>> > >>>>> > to 4.0.0, then +1, why not?
>>>>>> >>> > >>>>> >
>>>>>> >>> > >>>>> > Thanks,
>>>>>> >>> > >>>>> > Wenchen
>>>>>> >>> > >>>>> >
>>>>>> >>> > >>>>> >
>>>>>> >>> > >>>>> >
>>>>>> >>> > >>>>> > On Mon, Mar 10, 2025 at 8:49 PM Jungtaek Lim <
>>>>>> >>> > >>>>> [email protected]>
>>>>>> >>> > >>>>> > wrote:
>>>>>> >>> > >>>>> >
>>>>>> >>> > >>>>> > > Well said, Sean. Sorry I made you keep around here
>>>>>> since it might
>>>>>> >>> > >>>>> not be
>>>>>> >>> > >>>>> > > clearly stated. My bad.
>>>>>> >>> > >>>>> > >
>>>>>> >>> > >>>>> > > Yang, how could we ever tolerate the fact there are
>>>>>> "other"
>>>>>> >>> > >>>>> occurrences of
>>>>>> >>> > >>>>> > > vendor names in the codebase? Please go and search
>>>>>> "databricks" in
>>>>>> >>> > >>>>> the
>>>>>> >>> > >>>>> > > codebase and be surprised.
>>>>>> >>> > >>>>> > >
>>>>>> >>> > >>>>> > > If we believe that having vendor names in the
>>>>>> codebase will
>>>>>> >>> > >>>>> increase
>>>>>> >>> > >>>>> > > the occurrence of making mistakes, why didn't we have
>>>>>> a discussion
>>>>>> >>> > >>>>> thread
>>>>>> >>> > >>>>> > > earlier to remove all occurrences altogether? This is
>>>>>> super tricky
>>>>>> >>> > >>>>> because
>>>>>> >>> > >>>>> > > I can even start to argue we have "Apple" as a vendor
>>>>>> name in
>>>>>> >>> > >>>>> Apache Spark
>>>>>> >>> > >>>>> > > codebase. I'm not saying we use "apple" in the test
>>>>>> data. See
>>>>>> >>> > >>>>> > > `isMacOnAppleSilicon` in Utils. Is it unavoidable? No,
>>>>>> >>> > >>>>> `isMacOnMSeries` or
>>>>>> >>> > >>>>> > > `isMacOnSilicon` is enough.
>>>>>> >>> > >>>>> > >
>>>>>> >>> > >>>>> > > We really need to draw a line where we disallow
>>>>>> vendor names on it
>>>>>> >>> > >>>>> - if
>>>>>> >>> > >>>>> > > it's the entire codebase, I don't really think it is
>>>>>> realistic.
>>>>>> >>> > >>>>> > >
>>>>>> >>> > >>>>> > > This was really a mistake, and it was definitely not
>>>>>> from
>>>>>> >>> > >>>>> referring to the
>>>>>> >>> > >>>>> > > existing codebase. Not having a vendor name does not
>>>>>> change
>>>>>> >>> > >>>>> anything on the
>>>>>> >>> > >>>>> > > chance of encountering this issue again. If we really
>>>>>> care, we
>>>>>> >>> > >>>>> should think
>>>>>> >>> > >>>>> > > about style checking, which is the only viable way to
>>>>>> catch the
>>>>>> >>> > >>>>> mistake.
>>>>>> >>> > >>>>> > > Again, I'd argue we have to have a bunch of vendor
>>>>>> names in that
>>>>>> >>> > >>>>> style
>>>>>> >>> > >>>>> > > check, not just the problematic vendor name.
>>>>>> >>> > >>>>> > >
>>>>>> >>> > >>>>> > >
>>>>>> >>> > >>>>> > > On Tue, Mar 11, 2025 at 12:17 PM Sean Owen <
>>>>>> [email protected]>
>>>>>> >>> > >>>>> wrote:
>>>>>> >>> > >>>>> > >
>>>>>> >>> > >>>>> > >> Doesn't the migration code 'clear' the debt?
>>>>>> >>> > >>>>> > >> The proposal is not to continue to support the
>>>>>> config.
>>>>>> >>> > >>>>> > >> I feel like people are not quite understanding the
>>>>>> change, and
>>>>>> >>> > >>>>> objecting
>>>>>> >>> > >>>>> > >> to something that doesn't exist.
>>>>>> >>> > >>>>> > >> It's a shame, as this seems like something not even
>>>>>> worth
>>>>>> >>> > >>>>> discussing. I
>>>>>> >>> > >>>>> > >> don't know why this triggered this much discussion.
>>>>>> We have kept
>>>>>> >>> > >>>>> deprecated
>>>>>> >>> > >>>>> > >> methods without blinking, which is in comparison
>>>>>> much bigger.
>>>>>> >>> > >>>>> > >> Can we maybe ask you to review the actual change in
>>>>>> question?
>>>>>> >>> > >>>>> > >>
>>>>>> >>> > >>>>> > >> On Mon, Mar 10, 2025, 10:02 PM Yang Jie <
>>>>>> [email protected]>
>>>>>> >>> > >>>>> wrote:
>>>>>> >>> > >>>>> > >>
>>>>>> >>> > >>>>> > >>> -1
>>>>>> >>> > >>>>> > >>> Remove migration logic of incorrect
>>>>>> `spark.databricks.*`
>>>>>> >>> > >>>>> configuration
>>>>>> >>> > >>>>> > >>> in Spark 4.0.0 because I think this configuration
>>>>>> was initially
>>>>>> >>> > >>>>> introduced
>>>>>> >>> > >>>>> > >>> accidentally in Spark 3.5.4, lacking a clear design
>>>>>> intent.
>>>>>> >>> > >>>>> Although the
>>>>>> >>> > >>>>> > >>> immediate maintenance cost of retaining this
>>>>>> configuration
>>>>>> >>> > >>>>> currently seems
>>>>>> >>> > >>>>> > >>> limited, as subsequent versions iterate and user
>>>>>> habits form, it
>>>>>> >>> > >>>>> may lead
>>>>>> >>> > >>>>> > >>> to the continuous accumulation of technical debt.
>>>>>> When users
>>>>>> >>> > >>>>> come to view
>>>>>> >>> > >>>>> > >>> this configuration as one that can be relied on
>>>>>> long-term,
>>>>>> >>> > >>>>> future removal
>>>>>> >>> > >>>>> > >>> may face greater resistance from users and could
>>>>>> potentially
>>>>>> >>> > >>>>> become an
>>>>>> >>> > >>>>> > >>> entrenched and redundant configuration in the
>>>>>> codebase.
>>>>>> >>> > >>>>> Therefore, promptly
>>>>>> >>> > >>>>> > >>> correcting this historically accidental
>>>>>> configuration not only
>>>>>> >>> > >>>>> maintains
>>>>>> >>> > >>>>> > >>> the normativity of the Spark configuration system
>>>>>> but also
>>>>>> >>> > >>>>> prevents
>>>>>> >>> > >>>>> > >>> unintended configurations from becoming de facto
>>>>>> standards,
>>>>>> >>> > >>>>> thereby
>>>>>> >>> > >>>>> > >>> reducing long-term maintenance risks.
>>>>>> >>> > >>>>> > >>>
>>>>>> >>> > >>>>> > >>> Jie Yang
>>>>>> >>> > >>>>> > >>>
>>>>>> >>> > >>>>> > >>> On 2025/03/10 14:52:52 Dongjoon Hyun wrote:
>>>>>> >>> > >>>>> > >>> > -1 because there exists a feasible migration path
>>>>>> for Apache
>>>>>> >>> > >>>>> Spark
>>>>>> >>> > >>>>> > >>> 3.5.4 via Apache Spark 3.5.5.
>>>>>> >>> > >>>>> > >>> >
>>>>>> >>> > >>>>> > >>> > It's obvious that this Databricks' mistake
>>>>>> already causes a
>>>>>> >>> > >>>>> huge
>>>>>> >>> > >>>>> > >>> communication cost in the Apache Spark community
>>>>>> and is
>>>>>> >>> > >>>>> suggesting a burden
>>>>>> >>> > >>>>> > >>> that forces us to handle at least two more PRs at
>>>>>> 4.0.0 and 4.1.0.
>>>>>> >>> > >>>>> > >>> >
>>>>>> >>> > >>>>> > >>> > Given that, I don't think
>>>>>> >>> > >>>>> > >>> > - This is inevitable or
>>>>>> >>> > >>>>> > >>> > - This is 0 cost
>>>>>> >>> > >>>>> > >>> >
>>>>>> >>> > >>>>> > >>> > Dongjoon.
>>>>>> >>> > >>>>> > >>> >
>>>>>> >>> > >>>>> > >>> > On 2025/03/10 12:46:16 Jungtaek Lim wrote:
>>>>>> >>> > >>>>> > >>> > > Starting from my +1 (non-binding).
>>>>>> >>> > >>>>> > >>> > >
>>>>>> >>> > >>>>> > >>> > > In addition, I propose to retain migration
>>>>>> logic till Spark
>>>>>> >>> > >>>>> 4.1.x and
>>>>>> >>> > >>>>> > >>> > > remove it in Spark 4.2.0.
>>>>>> >>> > >>>>> > >>> > >
>>>>>> >>> > >>>>> > >>> > > On Mon, Mar 10, 2025 at 9:44 PM Jungtaek Lim <
>>>>>> >>> > >>>>> > >>> [email protected]>
>>>>>> >>> > >>>>> > >>> > > wrote:
>>>>>> >>> > >>>>> > >>> > >
>>>>>> >>> > >>>>> > >>> > > > Hi dev,
>>>>>> >>> > >>>>> > >>> > > >
>>>>>> >>> > >>>>> > >>> > > > Please vote to retain migration logic of
>>>>>> incorrect
>>>>>> >>> > >>>>> > >>> `spark.databricks.*`
>>>>>> >>> > >>>>> > >>> > > > configuration in Spark 4.0.x.
>>>>>> >>> > >>>>> > >>> > > >
>>>>>> >>> > >>>>> > >>> > > > - DISCUSSION:
>>>>>> >>> > >>>>> > >>> > > >
>>>>>> >>> > >>>>>
>>>>>> https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr
>>>>>> >>> > >>>>> > >>> > > > ([DISCUSS] Handling spark.databricks.* config
>>>>>> being
>>>>>> >>> > >>>>> exposed in
>>>>>> >>> > >>>>> > >>> 3.5.4 in
>>>>>> >>> > >>>>> > >>> > > > Spark 4.0.0+)
>>>>>> >>> > >>>>> > >>> > > >
>>>>>> >>> > >>>>> > >>> > > > Specifically, please review this post
>>>>>> >>> > >>>>> > >>> > > >
>>>>>> >>> > >>>>>
>>>>>> https://lists.apache.org/thread/xtq1kjhsl4ohfon78z3wld2hmfm78t9k
>>>>>> >>> > >>>>> > >>> which
>>>>>> >>> > >>>>> > >>> > > > explains pros and cons about the proposal -
>>>>>> proposal is
>>>>>> >>> > >>>>> about
>>>>>> >>> > >>>>> > >>> "Option 1".
>>>>>> >>> > >>>>> > >>> > > >
>>>>>> >>> > >>>>> > >>> > > > Simply speaking, this vote is to allow
>>>>>> streaming queries
>>>>>> >>> > >>>>> which had
>>>>>> >>> > >>>>> > >>> been
>>>>>> >>> > >>>>> > >>> > > > ever run in Spark 3.5.4 to be upgraded with
>>>>>> Spark 4.0.x,
>>>>>> >>> > >>>>> "without
>>>>>> >>> > >>>>> > >>> having to
>>>>>> >>> > >>>>> > >>> > > > be upgraded with Spark 3.5.5+ in prior". If
>>>>>> the vote
>>>>>> >>> > >>>>> passes, we
>>>>>> >>> > >>>>> > >>> will help
>>>>>> >>> > >>>>> > >>> > > > users to have a smooth upgrade from Spark
>>>>>> 3.5.4 to Spark
>>>>>> >>> > >>>>> 4.0.x,
>>>>>> >>> > >>>>> > >>> which would
>>>>>> >>> > >>>>> > >>> > > > be almost 1 year.
>>>>>> >>> > >>>>> > >>> > > >
>>>>>> >>> > >>>>> > >>> > > > The (only) con of this option is having to
>>>>>> retain the
>>>>>> >>> > >>>>> incorrect
>>>>>> >>> > >>>>> > >>> > > > configuration name as "string" in the
>>>>>> codebase a bit
>>>>>> >>> > >>>>> longer. The
>>>>>> >>> > >>>>> > >>> code
>>>>>> >>> > >>>>> > >>> > > > complexity of migration logic is arguably
>>>>>> trivial. (link
>>>>>> >>> > >>>>> > >>> > > > <
>>>>>> >>> > >>>>> > >>>
>>>>>> >>> > >>>>>
>>>>>> https://github.com/apache/spark/blob/4231d58245251a34ae80a38ea4bbf7d720caa439/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L174-L183
>>>>>> >>> > >>>>> > >>> >
>>>>>> >>> > >>>>> > >>> > > > )
>>>>>> >>> > >>>>> > >>> > > >
>>>>>> >>> > >>>>> > >>> > > > This VOTE is for Spark 4.0.x, but if someone
>>>>>> supports
>>>>>> >>> > >>>>> including
>>>>>> >>> > >>>>> > >>> migration
>>>>>> >>> > >>>>> > >>> > > > logic to be longer than Spark 4.0.x, please
>>>>>> cast +1 here
>>>>>> >>> > >>>>> and leave
>>>>>> >>> > >>>>> > >>> the
>>>>>> >>> > >>>>> > >>> > > > desired last minor version of Spark to retain
>>>>>> this
>>>>>> >>> > >>>>> migration logic.
>>>>>> >>> > >>>>> > >>> > > >
>>>>>> >>> > >>>>> > >>> > > > The vote is open for the next 72 hours and
>>>>>> passes if a
>>>>>> >>> > >>>>> majority +1
>>>>>> >>> > >>>>> > >>> PMC
>>>>>> >>> > >>>>> > >>> > > > votes are cast, with a minimum of 3 +1 votes.
>>>>>> >>> > >>>>> > >>> > > >
>>>>>> >>> > >>>>> > >>> > > > [ ] +1 Retain migration logic of incorrect
>>>>>> >>> > >>>>> `spark.databricks.*`
>>>>>> >>> > >>>>> > >>> > > > configuration in Spark 4.0.x
>>>>>> >>> > >>>>> > >>> > > > [ ] -1 Remove migration logic of incorrect
>>>>>> >>> > >>>>> `spark.databricks.*`
>>>>>> >>> > >>>>> > >>> > > > configuration in Spark 4.0.0 because...
>>>>>> >>> > >>>>> > >>> > > >
>>>>>> >>> > >>>>> > >>> > > > Thanks!
>>>>>> >>> > >>>>> > >>> > > > Jungtaek Lim (HeartSaVioR)
>>>>>> >>> > >>>>> > >>> > > >
>>>>>> >>> > >>>>> > >>> > >
>>>>>> >>> > >>>>> > >>> >
>>>>>> >>> > >>>>> > >>> >
>>>>>> >>> > >>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> >>> > >>>>> > >>> > To unsubscribe e-mail:
>>>>>> [email protected]
>>>>>> >>> > >>>>> > >>> >
>>>>>> >>> > >>>>> > >>> >
>>>>>> >>> > >>>>> > >>>
>>>>>> >>> > >>>>> > >>>
>>>>>> >>> > >>>>> > >>>
>>>>>> >>> > >>>>> > >>>
>>>>>> >>> > >>>>> >
>>>>>> >>> > >>>>>
>>>>>> >>> > >>>>>
>>>>>> >>> > >>>>>
>>>>>> >>> > >>>>>
>>>>>> >>> > >>>>
>>>>>> >>> > >>>> --
>>>>>> >>> > >>>> Adam Binford
>>>>>> >>> > >>>>
>>>>>> >>> > >>>
>>>>>> >>> >
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>>
>>>>>>
>>>>>>
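
The migration logic discussed in this thread is described as purely
behind-the-scenes handling of checkpoint metadata for streams created in
3.5.4, not as continued support for the old configuration. A minimal sketch of
that idea follows. It is not the code in `OffsetSeq.scala`: the key names
`OLD_KEY` and `NEW_KEY` and the function `migrate_checkpoint_conf` are
hypothetical placeholders, and dropping the old key after copying its value is
an assumption made for this illustration.

```python
# Hypothetical placeholder keys; the thread only refers to `spark.databricks.*`.
OLD_KEY = "spark.databricks.example.flag"
NEW_KEY = "spark.example.flag"


def migrate_checkpoint_conf(conf: dict) -> dict:
    """Return a copy of the conf map read from a checkpoint, using the new key.

    If only the old (incorrectly named) key is present, its value is carried
    over to the new key. If both are present, the new key's value is kept.
    """
    migrated = dict(conf)  # never mutate the caller's map
    if OLD_KEY in migrated:
        old_value = migrated.pop(OLD_KEY)
        migrated.setdefault(NEW_KEY, old_value)
    return migrated


# A conf map as it might have been persisted by a stream started on 3.5.4.
print(migrate_checkpoint_conf({OLD_KEY: "false", "spark.sql.shuffle.partitions": "200"}))
```

In this sketch, a user who sets the old key in a new session gets no special
treatment; only values already persisted in a checkpoint are translated.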
