Re: [VOTE][RESULT] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

Mich Talebzadeh Sat, 15 Mar 2025 13:38:07 -0700

This is my gist

Mark, from your passionate language I gather you see this as a "Code Change"
veto. Your reasoning seems to be straightforward, i.e. the vote's purpose
is to decide whether to add code (migration logic) to the Spark 4.0 branch.
In your view, the outcome of the vote directly alters the software's code?


However, if we see it as a *procedural matter only*, as some others
including myself do, it involves a vote and the interpretation of rules.

In summary

   1. If it is a code change vote, a -1 can be seen as a veto, blocking the
   change unless specific conditions are met (like the PMC overriding the
   veto).
   2. If it is just a procedural vote, a -1 might simply be a dissenting
   vote, *not necessarily carrying the power to block the entire action.*
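
The two readings above can be sketched as a small tally. This is only an illustration, not an official ASF or Spark tool. It assumes the rule from the vote call (read here as: at least three binding +1s and more binding +1s than binding -1s) for the procedural reading, and the ASF rule quoted further down this thread (a binding -1 with a valid technical justification is a veto) for the code-change reading. The names `Vote`, `tally`, and `ballots` are hypothetical, and whether the justification is valid is left as an input because that is exactly what is disputed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int               # +1 or -1
    binding: bool            # True for PMC members
    justified: bool = False  # for a -1 only: technical justification accepted as valid


def tally(votes, code_change):
    """Return "vetoed", "passes", or "fails" for a list of Vote objects."""
    binding = [v for v in votes if v.binding]
    plus = sum(1 for v in binding if v.value > 0)
    minus = sum(1 for v in binding if v.value < 0)
    # Code-change reading: one justified binding -1 blocks the change.
    if code_change and any(v.value < 0 and v.justified for v in binding):
        return "vetoed"
    # Otherwise apply the vote-call rule (assumed reading, see lead-in).
    return "passes" if plus >= 3 and plus > minus else "fails"


def ballots(justified):
    """Votes as listed in the [RESULT] mail; `justified` is the contested input."""
    return [
        Vote("Sean R. Owen", +1, True),
        Vote("Jungtaek Lim", +1, False),
        Vote("Nicholas Chammas", +1, False),
        Vote("Wenchen Fan", +1, True),
        Vote("Adam Binford", +1, False),
        Vote("Russell Jurney", +1, False),
        Vote("Yang Jie", +1, True),
        Vote("Dongjoon Hyun", -1, True, justified=justified),
    ]


if __name__ == "__main__":
    for justified in (True, False):
        for code_change in (True, False):
            print(justified, code_change, tally(ballots(justified), code_change))
```

The outcome only differs when the vote is treated as a code change and the -1 is accepted as technically justified, which is why the thread keeps returning to those two questions.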

FYI, I recall I voted -1 (non-binding) on another thread and Dongjoon asked
me to explain, which was within his rights.

I can see the following votes cast (1):

   - Jungtaek Lim: +1 (non-binding)
   - Sean Owen: +1 to retain
   - Yang Jie: -1, later withdraws it and casts +1
   - Adam Binford: +1 (non-binding)
   - Russell Jurney: +1 non-binding
   - Yang Jie: +1
   - Mridul Muralidharan: +1
   - Dongjoon Hyun: -1


*(1) Summary of Voting from 21 emails in the attached file* from
https://lists.apache.org/thread/nm3p1zjcybdl0p0mc56t2rl92hb9837n :

For Retaining Migration Logic (+1): Jungtaek Lim, Sean Owen, Yang Jie
(initially -1, then +1), Adam Binford (non-binding), Russell Jurney
(non-binding), Mridul Muralidharan
Against Retaining Migration Logic (-1): Dongjoon Hyun

Maybe we should put a bar on it and allow Dongjoon to qualify his statement
as 1 or 2 above; then it could be escalated if needed or put to rest.

HTH





On Sat, 15 Mar 2025 at 18:26, Mark Hamstra <[email protected]> wrote:

> That is utter nonsense, Sean! You do not have any authority to declare
> the matter concluded, and I will escalate to the board if you persist
> in this approach.
>
> The proposed code change has been vetoed. As I delineated previously,
> there are two and only two ways forward under the ASF Voting Process.
> That does not include any individual simply declaring that the matter
> has been concluded regardless of the veto and ASF process.
>
> On Sat, Mar 15, 2025 at 11:18 AM Sean Owen <[email protected]> wrote:
> >
> > Mark et al - this thread has gone on way too long. Everyone has
> expressed their opinion. The result stands.
> > Anyone who is really upset about it, please escalate to the board or
> something, but, this thread and decision point has now concluded.
> >
> >
> > On Sat, Mar 15, 2025 at 1:16 PM Mark Hamstra <[email protected]>
> wrote:
> >>
> >> You do not have the authority to declare Dongjoon's technical
> >> justification invalid. That is up to the PMC: "In case of doubt,
> >> deciding whether a technical justification is valid is up to the PMC."
> >>
> >> On Sat, Mar 15, 2025 at 6:20 AM Jungtaek Lim
> >> <[email protected]> wrote:
> >> >
> >> > To summarize, the main arguments of both proposals are "whether we
> can force users to upgrade to Spark 3.5.5 first before upgrading Spark
> 4.0.0" vs "we should include migration logic to Spark 4.0.0 because that is
> not realistic". Where is the "technical objection" here? If you say there
> was politics I can clearly say never, but even if you interpret there was
> politics, politics is not "technical objection". I can quote the relevant
> ASF page for you.
> >> >
> >> > https://www.apache.org/foundation/voting.html#Veto
> >> >
> >> > > A -1 vote by a qualified voter stops a code-modification proposal
> in its tracks. This constitutes a veto, and it cannot be overruled nor
> overridden by anyone. Vetoes stand until and unless the individual
> withdraws their veto.
> >> > > To prevent vetoes from being used capriciously, the voter must
> provide with the veto a technical justification showing why the change is
> bad (opens a security exposure, negatively affects performance, etc. ). A
> veto without a justification is invalid and has no weight.
> >> >
> >> > The justification must be "technical one" for vote. I hope ASF just
> lists the most cases rather than leaving this as etc, but I think ASF
> believes individual's judgement, and I claim there is no "technical
> reason". Having to put 4 more lines is never a technical reason. It is
> never meant to be used for blocking different opinions. It must be used for
> blocking "incidents which impact users", while we are here to do the
> opposite, saving users' life.
> >> >
> >> > On Sat, Mar 15, 2025 at 9:51 PM Jungtaek Lim <
> [email protected]> wrote:
> >> >>
> >> >> > That's the reason why you proposed the vote procedure and we
> agreed.
> >> >>
> >> >> Didn’t you see the part “we agreed”? Who is we in the context?
> >> >>
> >> >> I don’t think he answered my questions - he explained his reasoning
> for his proposal, which the majority does not agree with. You even said you are
> not persuaded, and I want to ask you now whether you were persuaded by his last
> post.
> >> >>
> >> >> Again I haven’t heard my answers. He showed his reasoning but there
> is nothing about the evidence of the validity of “technical” objection. I
> think I have asked people who judged his -1 as veto for their reasoning of
> how this could be “technical” objection and I don’t think I heard anything.
> >> >>
> >> >> I can be corrected if you can point out what is the “technical”
> objection. If you or Dongjoon do not provide this to the end of the week, I
> have to consider I haven’t heard about that and the veto (although Dongjoon
> stated it is not a veto) will be ignored.
> >> >>
> >> >> On Sat, Mar 15, 2025 at 8:19 PM, Mark Hamstra <[email protected]> wrote:
> >> >>>
> >> >>> Once again, I have to object. Dongjoon said that the vote is a time
> >> >>> limited procedure, not that the vote itself is a procedural vote as
> >> >>> distinct from a code change vote or a package release vote.
> >> >>>
> >> >>> Frankly, this feels like you are trying to manipulate the vote
> >> >>> procedure by misrepresenting Dongjoon, and you are quickly losing my
> >> >>> confidence in your ability to administer a fair voting procedure.
> >> >>>
> >> >>> I still consider the proposal to be vetoed.
> >> >>>
> >> >>>
> >> >>> On Fri, Mar 14, 2025 at 6:11 PM Jungtaek Lim
> >> >>> <[email protected]> wrote:
> >> >>> >
> >> >>> > UPDATE:
> >> >>> >
> >> >>> > We were having a discussion about the type of VOTE, since
> Dongjoon's -1 should be considered as a veto if we see this as a code
> change VOTE.
> >> >>> > Dongjoon clarified that he does not see this VOTE as a code
> change, hence he gave -1 but not intended to block the VOTE.
> >> >>> >
> >> >>> > That said, we have confirmed that Dongjoon's -1 is not a veto. I
> think the VOTE result is correct as it is. I'll proceed with the next steps.
> >> >>> >
> >> >>> > On Fri, Mar 14, 2025 at 11:19 AM Jungtaek Lim <
> [email protected]> wrote:
> >> >>> >>
> >> >>> >> The vote passes with 7 +1s (3 binding +1s) and 1 -1s (1 binding
> -1s).
> >> >>> >> Thanks to all who helped with the vote!
> >> >>> >>
> >> >>> >> I'm going to make a code change in branch-4.0 quickly so that we
> don't have to trigger another RC for Spark 4.0.0 just because of this.
> >> >>> >>
> >> >>> >> (* = binding)
> >> >>> >> +1:
> >> >>> >> - Sean R. Owen *
> >> >>> >> - Jungtaek Lim
> >> >>> >> - Nicholas Chammas
> >> >>> >> - Wenchen Fan *
> >> >>> >> - Adam Binford
> >> >>> >> - Russell Jurney
> >> >>> >> - Yang Jie *
> >> >>> >>
> >> >>> >> -1:
> >> >>> >> - Dongjoon Hyun *
> >> >>> >>
> >> >>> >> Thanks,
> >> >>> >> Jungtaek Lim (HeartSaVioR)
> >> >>>
> >> >>>
> ---------------------------------------------------------------------
> >> >>> To unsubscribe e-mail: [email protected]
> >> >>>


Jungtaek Lim - Monday 10 March 2025 12:44:42 GMT
[VOTE] Retain migration logic of incorrect `spark.databricks.*` configuration
in Spark 4.0.x
Hi dev,

Please vote to retain migration logic of incorrect `spark.databricks.*`
configuration in Spark 4.0.x.

- DISCUSSION:
https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr
([DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in
Spark 4.0.0+)

Specifically, please review this post
https://lists.apache.org/thread/xtq1kjhsl4ohfon78z3wld2hmfm78t9k which
explains pros and cons about the proposal - proposal is about "Option 1".

Simply put, this vote is to allow streaming queries that have ever been
run in Spark 3.5.4 to be upgraded to Spark 4.0.x "without having to
be upgraded to Spark 3.5.5+ first". If the vote passes, we will help
users to have a smooth upgrade from Spark 3.5.4 to Spark 4.0.x, which would
be almost 1 year.

The (only) con of this option is having to retain the incorrect
configuration name as a "string" in the codebase a bit longer. The code
complexity of the migration logic is arguably trivial. (link
<https://github.com/apache/spark/blob/4231d58245251a34ae80a38ea4bbf7d720caa439/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L174-L183>
)

This VOTE is for Spark 4.0.x, but if you support retaining the migration
logic beyond Spark 4.0.x, please cast +1 here and state the
last minor version of Spark that should retain this migration logic.

The vote is open for the next 72 hours and passes if a majority +1 PMC
votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Retain migration logic of incorrect `spark.databricks.*`
configuration in Spark 4.0.x
[ ] -1 Remove migration logic of incorrect `spark.databricks.*`
configuration in Spark 4.0.0 because...

Thanks!
Jungtaek Lim (HeartSaVioR)

Sean Owen - Monday 10 March 2025 12:46:05 GMT
+1 to retain, to avoid problems for users at ~0 cost.

Jungtaek Lim - Monday 10 March 2025 12:46:16 GMT
Starting from my +1 (non-binding).

In addition, I propose to retain migration logic till Spark 4.1.x and
remove it in Spark 4.2.0.

Dongjoon Hyun - Monday 10 March 2025 14:52:52 GMT
-1 because there exists a feasible migration path for Apache Spark 3.5.4 via
Apache Spark 3.5.5.

It's obvious that this Databricks mistake has already caused a huge communication
cost in the Apache Spark community, and this proposal adds a burden by forcing us to
handle at least two more PRs at 4.0.0 and 4.1.0.

Given that, I don't think
- This is inevitable or
- This is 0 cost

Dongjoon.


Yang Jie - Tuesday 11 March 2025 03:01:56 GMT
-1
Remove migration logic of incorrect `spark.databricks.*` configuration in Spark
4.0.0 because I think this configuration was initially introduced accidentally
in Spark 3.5.4, lacking a clear design intent. Although the immediate
maintenance cost of retaining this configuration currently seems limited, as
subsequent versions iterate and user habits form, it may lead to the continuous
accumulation of technical debt. When users come to view this configuration as
one that can be relied on long-term, future removal may face greater resistance
from users and could potentially become an entrenched and redundant
configuration in the codebase. Therefore, promptly correcting this historically
accidental configuration not only maintains the normativity of the Spark
configuration system but also prevents unintended configurations from becoming
de facto standards, thereby reducing long-term maintenance risks.

Jie Yang


Sean Owen - Tuesday 11 March 2025 03:16:34 GMT
Doesn't the migration code 'clear' the debt?
The proposal is not to continue to support the config.
I feel like people are not quite understanding the change, and objecting to
something that doesn't exist.
It's a shame, as this seems like something not even worth discussing. I
don't know why this triggered this much discussion. We have kept deprecated
methods without blinking, which is in comparison much bigger.
Can we maybe ask you to review the actual change in question?

Jungtaek Lim - Tuesday 11 March 2025 03:48:03 GMT
Well said, Sean. Sorry I kept you around here, since it might not have been
clearly stated. My bad.

Yang, how could we ever tolerate the fact there are "other" occurrences of
vendor names in the codebase? Please go and search "databricks" in the
codebase and be surprised.

If we believe that having vendor names in the codebase will increase
the occurrence of making mistakes, why didn't we have a discussion thread
earlier to remove all occurrences altogether? This is super tricky because
I can even start to argue we have "Apple" as a vendor name in Apache Spark
codebase. I'm not saying we use "apple" in the test data. See
`isMacOnAppleSilicon` in Utils. Is it unavoidable? No, `isMacOnMSeries` or
`isMacOnSilicon` is enough.

We really need to draw a line on where we disallow vendor names - if
it's the entire codebase, I don't really think it is realistic.

This was really a mistake, and it was definitely not from referring to the
existing codebase. Not having a vendor name does not change anything on the
chance of encountering this issue again. If we really care, we should think
about style checking, which is the only viable way to catch the mistake.
Again, I'd argue we have to have a bunch of vendor names in that style
check, not just the problematic vendor name.

Nicholas Chammas - Tuesday 11 March 2025 03:57:15 GMT
I agree with Sean that this proposal does not seem to me as controversial as it
has turned out so far.

Jungtaek’s detailed breakdown on the other thread
<https://lists.apache.org/thread/zlhgr1mx0q520odvpnmnzwd8mp9x6bpl> explains
that this proposed change is mainly to benefit open source users of Apache
Spark and give them a way to directly upgrade from Apache Spark 3.5.4 to 4.0.0,
as opposed to forcing them to upgrade first to 3.5.5 before then being able to
upgrade to 4.0.0.

Jungtaek’s proposal is essentially a convenience to open source users. These
users may or may not be using a vendor distribution of Spark. It does not
benefit or harm Databricks or any other vendor. And it adds a very small
maintenance burden on contributors.

Isn’t this a tradeoff we should generally make? Help users upgrade at a minor
maintenance cost.

Wenchen Fan - Tuesday 11 March 2025 04:42:25 GMT
Guys, let’s be honest about what we’re discussing here.

If this is a migration issue, why would we even need a vote? We’ve been
consistently adding configurations to restore legacy behavior instead of
removing them because we understand the challenges of upgrading Spark
versions. Our goal has always been to make upgrades easier, even if it
means carrying some technical debt. I don’t think we want to change that
culture now.

If the concern is about vendor names appearing in the codebase, then why is
it a big deal this time when vendor names are already present elsewhere? If
we’ve failed to follow a policy, let’s correct it, but can someone point to
the specific policy we’re violating?

If the vote is about adding migration logic to ease the upgrade from 3.5.4
to 4.0.0, then +1, why not?

Thanks,
Wenchen

Yang Jie - Tuesday 11 March 2025 05:41:04 GMT

To Sean, you're right, I'm very sorry.

From the perspective of compatibility and migratability, I think we should
migrate this logic to 4.0.0 and keep it in the codebase for a longer time (or
permanently), because we can't predict which version users of 3.5.4 will choose
next.

I don't want to discuss the so-called vendor issue.

I withdraw my previous -1.

Jie Yang.


Adam Binford - Tuesday 11 March 2025 16:07:14 GMT
+1 (non-binding)

It's a pretty in-the-weeds issue with how Structured Streaming works under
the hood that's kinda hard to understand if you're not familiar with it.
The migration logic doesn't mean users can still use the old config, it's
purely behind the scenes to fix checkpoint metadata in streams created in
3.5.4. The 5 lines of code it takes to address a weird edge case for
certain users that's already gone from master shouldn't be a huge deal.

--
Adam Binford
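
The kind of behind-the-scenes fix described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Spark code (that is the Scala snippet in `OffsetSeq.scala` linked from the vote call). The key names `OLD_KEY` and `NEW_KEY` and the function `migrate_checkpoint_conf` are hypothetical placeholders, and the checkpoint metadata is modelled as a plain dict of strings.

```python
# Hypothetical key names, for illustration only; these are not real Spark configs.
OLD_KEY = "spark.databricks.example.someFlag"
NEW_KEY = "spark.example.someFlag"


def migrate_checkpoint_conf(conf):
    """Return a copy of the stored conf with the incorrect key moved to the correct one.

    The input dict is left untouched. If the correct key is already present,
    its value wins and the value stored under the incorrect key is dropped.
    """
    migrated = dict(conf)
    if OLD_KEY in migrated:
        old_value = migrated.pop(OLD_KEY)
        migrated.setdefault(NEW_KEY, old_value)
    return migrated


if __name__ == "__main__":
    # Metadata as a query started on the affected release might have stored it.
    stored = {OLD_KEY: "false", "spark.sql.shuffle.partitions": "200"}
    print(migrate_checkpoint_conf(stored))
```

The point of the sketch is that users never set the old name themselves; the rename is applied when the stored metadata is read back, so a stream created on the affected release keeps the value it was started with.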

Xiao Li - Thursday 13 March 2025 06:04:45 GMT

In the history of Apache Spark, have we ever required users to upgrade to
the next maintenance release before moving to a new feature or major
release?

Xiao

On Tue, Mar 11, 2025 at 09:08, Adam Binford <[email protected]> wrote:

Russell Jurney - Thursday 13 March 2025 06:13:22 GMT
I'm just a lurker and aspiring contributor, but as a Spark user upgrading
twice is very confusing and would cause many or most users to fail to
upgrade successfully to Spark 4 on a first go. That seems like a very bad
user experience. I thought it was worthwhile stating this out loud.

Russell

Jungtaek Lim - Thursday 13 March 2025 06:17:55 GMT
Russell,

Of course, we also hear the voices of people who don't have binding votes.
Personally I think it's more important than committers/PMC members' VOTE
this time since we can be biased and be far from user experience.

Could you please explicitly cast your vote, like +1 (non-binding)? You seem
to agree with the proposal. Thanks!

Jungtaek Lim - Thursday 13 March 2025 06:39:40 GMT
Dongjoon,

I wonder whether I can influence you to revisit what has happened. We are
here because we have two different approaches and we have agreed to
disagree on them. We posted the discussion in dev@ because we wanted
to hear from third parties. In the DISCUSSION and VOTE threads, I'm seeing support
for my proposal from people (including non-Bricksters). They are users of
Apache Spark, and we have a responsibility to hear from them.

I'd like to see whether you can revisit this, confirming people's
preference on this topic. Personally I don't think this warrants a VOTE, so I
can take whatever is the simplest, easiest, and fastest way to solve the
issue, including just cancelling the VOTE if we have consensus.

I hope we can avoid spending more time on this.

Thanks,
Jungtaek Lim (HeartSaVioR)

Russell Jurney - Thursday 13 March 2025 07:08:12 GMT
Sure, +1 non-binding.

Yang Jie - Thursday 13 March 2025 07:40:19 GMT
forgot to mention in my last reply, my stance is +1

Jie Yang


Jungtaek Lim - Thursday 13 March 2025 07:54:39 GMT
Thanks to everyone who participated and voted!

Now I can technically conclude the VOTE, but I'm willing to wait till US
daytime tomorrow, to give some time for Dongjoon to revisit this.

I'll conclude the vote around 6PM PST tomorrow regardless of his vote. It's
ideal to see us have no -1, but having one -1 doesn't block this vote and
we can move forward.

Dongjoon Hyun - Thursday 13 March 2025 14:15:32 GMT
We are having this vote to give clarity by keeping all records of the
community decisions and stances during building a community consensus. All
votes are important and counted.

To Jungtaek, I already cast my veto properly and have been tracking the
thread. You don't need to tell me to revisit because I've been here.

To Xiao, in the history of Apache Spark, have we ever made the mistake of
shipping a vendor-owned name like `spark.databricks.*`? I believe you are
swapping the real root cause and the bad consequence here.

> ... the next maintenance release before moving to a new feature or major
> release?

Thanks,
Dongjoon.

Sean Owen - Thursday 13 March 2025 14:28:11 GMT
I'm not sure if a VOTE is appropriate here, but I also do not see any valid
technical objection here. I don't think this can be considered a valid
'veto' even if we were thinking of it that way.
I think there are other non-technical factors influencing this position. I
believe we should proceed with Jungtaek's proposal.

Mark Hamstra - Thursday 13 March 2025 15:20:21 GMT
Valid -1 votes are not restricted to technical objections.


Jungtaek Lim - Thursday 13 March 2025 20:57:01 GMT
I do believe there are two ways of considering a -1 vote. Valid -1 votes are
not restricted to technical objections, but in that case a -1 must not be
considered a veto; otherwise we will end up obstructing ourselves. Only in
an ideal world could we reach consensus on every topic; in reality, we
can't.

Please give me the evidence if you think -1 should be considered as veto,
otherwise I'll conclude the vote sooner.

Jungtaek Lim - Thursday 13 March 2025 21:22:18 GMT
I understand the situation of "agree to disagree", but I don't believe veto
can be used like that. It's just -1 from the minority.

Here are questions I want to get answered.

Where is the evidence that having a vendor name in the codebase is
violating ASF policy? Again, I see "Apple" to be used as a vendor name in
the field name. It is definitely not used as a common noun. What's your
call on this? Why do we keep saying where there is evidence and we don't
see any?

Why didn't you just say we must remove the migration logic the first time
we talked about this (instead, you said there are "two" approaches, link
<https://github.com/apache/spark/pull/49983#issuecomment-2676531485>)? This
is a major issue for me, as you gave me false hope by seeming to think
option 1 is also a valid one, and I thought I could persuade you as long as I
showed you people's opinions.

Why is it OK to ship the migration logic in Spark 3.5.5+ in Spark 3.5.x
line if you think this is really bad? I don't think it's really a long time
to make the effort of upgradability to take effect. Will we ever release
Spark 3.5.20 or so?

Why do you think your approach doesn't need to pass with VOTE, while in
this VOTE you are the only one disagreeing with the other approach? Is it
just that the current code is automatically achieving your goal? I believe
this makes no sense.

You should answer at least the last one, otherwise I'll have to cast -1 on
Spark 4.0.0 release till this is resolved, because we have the logic in the
current codebase where we do not have a consensus and I disagree with
shipping this.

Jungtaek Lim - Thursday 13 March 2025 21:30:57 GMT
Also, I don't believe considering -1 as veto makes sense here, because his
proposal is "somehow" (I'd rather say "accidentally") in the current
codebase and we hadn't had any discussion with that proposal. So if we kill
the VOTE and do nothing, it's effectively saying +1 to his proposal, which
makes zero sense to me.

Jungtaek Lim - Friday 14 March 2025 01:58:27 GMT
As I said, I'm concluding the VOTE since we meet the criteria (3 binding
+1s, 1 binding -1, and also +1s from non-binding voters).

I don't consider -1 as a veto as I explained, as we should have multiple
-1s if we go for VOTE with the current codebase. (+1 in this proposal is
effectively -1 in another proposal.)

The vote followed the Apache Voting Process
<https://www.apache.org/foundation/voting.html> with the type of
"package release" (which we tend to use in dev@ for VOTE). I guess it could
also have been done as "procedural issues", which is less strict, but this
fulfills both types of vote, so it should be OK.

The current codebase is "accidentally" representing another proposal and it
is never intended. I don't find the way I can -1 to the current codebase,
and make a different change neither bound to any proposal to be fair.

I don't want to block the release because of the above. So, let's change
the current codebase the way we discussed and voted here. Reverting this
decision should require another VOTE.

Thanks to everyone who voted!

Mark Hamstra - Friday 14 March 2025 02:34:27 GMT
Absolutely not!

This is clearly a vote on a code change, not on a procedural issue or
a package release. The code change has been vetoed by a -1 vote by a
qualified voter.


Mridul Muralidharan - Friday 14 March 2025 06:31:33 GMT
I agree with Mark, imo this is a qualified veto.
We should give Dongjoon the opportunity to give his clarification, if any.

I do realize this delays the RC process, but this deserves to be looked
into carefully.

Thanks,
Mridul

Mridul Muralidharan - Friday 14 March 2025 06:39:46 GMT
FWIW, I am +1 on the proposal (though I missed the vote on this !)

Regards,
Mridul

Dongjoon Hyun - Friday 14 March 2025 08:32:18 GMT
Thank you all.

The vote is finished in an intended way with the expected result. We have
enough time to discuss and I have been sticking to my original
technical justification from the beginning (including this).

1. Helping rename the conf via SPARK-51172 (by approving it)
2. Banning `spark.databricks.*` via SPARK-51173 (by adding `configName`
Scalastyle rule)
3. Leading the discussion thread and reaching the agreement to release Spark
3.5.5 early
4. Releasing 3.5.5 as a release manager to provide a candidate migration
path
5. Proposing to use the migration path

This vote was Step 5. My technical point has always been aiming to recover
the Apache Spark 4 codebase to the status before our mistake by containing
the issue only in `branch-3.5` and providing the proposed narrow migration
path. And, as mentioned already, that's the situation where we were during
the vote at Apache Spark AS-IS branches. What all of us agree on is that
the previous code base is okay. I intentionally didn't reply to Jungtaek's
Apple comment because Apple is not a public Spark vendor like Databricks, and
it's a product name of popular consumer electronic devices, like
Intel/AMD/Graviton. In addition, I don't think we are going to add back
`spark.databricks.*` just because customers ask for it. In the
same way, this vote is one of the political decision making processes of
Apache Spark PMC. We started this vote because we couldn't reach a consensus.

I believe I've been giving my best to the Apache Spark community through
actions and with a valid technical clarification (without any modification
during the process).

Sincerely,
Dongjoon

Jungtaek Lim - Friday 14 March 2025 09:27:28 GMT
Thanks for the update.

Though I have to clarify that "What all of us agree on is that the previous
code base is okay." is not true.

Wenchen summarized what happened in another thread, which I think is more
accurate, as follows:

1. A mistake was made, leading to a vendor name being included in the
configuration released in Spark 3.5.4.
2. Dongjoon initiated a vote to deprecate the incorrect configuration name
in 3.5.5, and the vote passed. Thanks to Dongjoon, 3.5.5 was released
shortly after.
3. A PR <https://github.com/apache/spark/pull/49897> that simply renamed
(rather than deprecated) the configuration was merged into master/4.0. This
is a breaking change and was not backed by a vote.
4. This vote concerns adding migration logic to prevent the breaking change
from affecting streaming queries.

One thing we have to make clear is, the PR
<https://github.com/apache/spark/pull/49897> that simply renamed was
submitted "earlier" (Feb 12 in KST) than the PR
<https://github.com/apache/spark/pull/49985> for migration (Feb 17 in KST).
The former PR was even merged on Feb 12, but we definitely had a
conversation to figure out the way to mitigate better at that time, because
it is definitely a breaking change like Wenchen said. That said, it was
just a quick fix and it warranted demanding followup work.

That is why I came up with the migration logic, and I filed a PR for the
migration logic to "3.5/4.0/master" (not only 3.5), which definitely
implies I intended (and still intend) to resolve the issue in all branches. The
decision to merge "only" to 3.5 was definitely not made among "us". It was
decided to merge to 3.5 among us, but no, the DISCUSSION I raised was the
first time we talked about master/4.0 in public. "We never had a consensus
for master/4.0", and it was me who drove the discussion for that.

This is a root reason we had such a long argument, so we need to make this
very clear. No, I don't think I said I'm OK with not having migration code
in master/4.0.

Also, I agree that Apple is not a vendor productionizing Spark, but my
overall point is, we "just" feel like it's not good to have the vendor name
in the codebase as the ASF project needs to try to be vendor neutral. "ASF
project should be vendor neutral" is interpreted by everyone in every
different way, but there is no evidence we have consensus that having a
vendor name in any arbitrary string is problematic. Having a vendor name in
the string doesn't mean anything except migration. No, we put it just
because we think it makes users' life better. That's all, no other reason
like political one, I can confidently say no, it's not. And people seem to
agree based on the outcome of DISCUSS and VOTE.

The main question was, "where is the evidence it's safe to force users to
upgrade to Spark 3.5.5 before upgrading to Spark 4.0.0". It came up from
figuring out the resolution of the issue (I get where you are coming from),
but it is missing the big question about who will have a pain point, and I
did ask the community and I heard they are supportive of just allowing
upgrades to Spark 4.0.0 directly. I really think the proposal was not
backed by consensus. It was driven solely by one person - it is backed by
willingness to remove the occurrence of the vendor name in the
codebase ASAP. Again, everyone interprets the vendor name issue
differently, so this should have been discussed beforehand, to
weigh the cost of having the vendor name. It shouldn't be something
one person decides alone.

I'm not sure how this comes to a valid technical objection, because we
never discussed that approach in public, and the opposite approach
definitely gained traction in public. We never discussed that approach and
the codebase is already reflecting this, which is arguably a disaster. I
just wanted to fix that. That's all.

Jungtaek Lim - Friday 14 March 2025 10:01:34 GMT
If you did not intend to block the VOTE but just to express
disagreement, please say "-1" instead of calling it a "veto". Saying
veto means you intend to kill the process unless you are persuaded or
you lack a proper technical justification.

Jungtaek Lim - Friday 14 March 2025 10:03:13 GMT
Sorry, I overlooked the type of the vote - this totally depends on the
type of the vote. If you didn't intend to block the VOTE, which could be
interpreted as a code change vote, maybe -0 or -0.5 or -0.99 should have been
used rather than -1, which blocks the process.

Jungtaek Lim - Friday 14 March 2025 10:23:11 GMT
That said, if I understand correctly, you didn't intend to "block" the
vote, right? You say you expected the vote to be finished.

Could you please change your vote to -0.x, since some people view this as a
code change vote, or clarify explicitly that you think this is not a code
change vote? This will help resolve the concerns from some PMC members about
how we should interpret the vote result.

Thanks!

On Fri, 14 Mar 2025 at 17:33, Dongjoon Hyun <[email protected]> wrote:

Dongjoon Hyun - Saturday 15 March 2025 00:23:21 GMT
Apache Spark PMC always strongly recommends that all 3.5 users upgrade to the
latest stable release via the official website. The main question seems
quite different from what the Apache Spark website says. May I ask what is
unsafe about guiding Spark 3.5.4 users to 3.5.5, Jungtaek?

upgrade to Spark 3.5.5...

For the following part, when the Apache Spark community made a mistake in
the Spark 2.4.2 release, we guided the users to upgrade to 2.4.3 immediately
after restoring the default Scala version to 2.11.

2019-04-23 https://spark.apache.org/releases/spark-release-2-4-2.html
2019-05-08 https://spark.apache.org/releases/spark-release-2-4-3.html

In the same way, Apache Spark 3.5.5 was released and is ready to handle the
mistake in Spark 3.5.4.

2025-02-27 https://spark.apache.org/releases/spark-release-3-5-5.html

For the vote, the vote is a time-limited procedure to make a swift
decision. That's the reason why you proposed the vote procedure and we
agreed. There is no way to `block` the votes. The vote itself is already
completed (including my -1).

I've been considering this as a part of the whole `spark.databricks.*`
incident handling. In my interpretation, according to the ASF process, the
Apache Spark community made the conclusion to unblock the Apache Spark
4.0.0 release with the AS-IS code with the improved Spark 4.0 migration
guide because I provided a technical justification for my vote via the
concrete alternative based on the existing Spark 3.5.5, AS-IS code base,
and the suggested better migration guide way in order to eliminate the
affected streaming queries.

Thanks,
Dongjoon.

Jungtaek Lim - Saturday 15 March 2025 01:06:22 GMT
Dongjoon,

Please look at what I got from the community.

There is a big difference between "we strongly recommend" and "we force". We
never had an agreement that the latter would make sense. The DISCUSSION
and VOTE were to gain consensus that the latter does not make sense, and I
think we have consensus, except for you. Again, I'm not saying upgrading
Spark 3.5.4 to Spark 3.5.5 is unsafe. I'm saying we can't force users to
upgrade to one specific version first in order to upgrade to another
version. This is not the same as the argument about the safety of upgrading
Spark 3.5.4 to Spark 3.5.5, because forcing implies we control users to
achieve our goal.

I believe I have said multiple times that we don't have control of users. I
think that is a huge misunderstanding if we ever say we have control of
users, so I had to prove that with DISCUSSION. The only time we are forcing
users to upgrade is when we have day-0 security issues "which is really bad
for users" (again, not to achieve our goal), but even for that case, we
provide upgrade paths for every active branch whenever it is feasible. This
is a huge difference from what you were proposing.

I understand that the disagreement on this led to a long-standing
discussion/debate, and I hope I showed what the community thinks about it,
so that we don't have to have the same discussion again. I really want to
remind everyone that any attempt to control users won't lead to a better
result, so we should have weighed the tradeoff in public before doing that.
It isn't a success even if users do not complain explicitly. They just
quietly accumulate reasons to stop using Apache Spark.

Mark / Mridul,

Dongjoon agreed that this is a procedural vote, so -1 is not a "veto". I'll
update the result thread. I hope this resolves the concern about the vote.

Jungtaek Lim - Saturday 15 March 2025 14:18:41 GMT

conclusion to unblock the Apache Spark 4.0.0 release with the AS-IS code
with the improved Spark 4.0 migration guide because I provided a technical
justification for my vote via the concrete alternative based on the
existing Spark 3.5.5, AS-IS code base, and the suggested better migration
guide way in order to eliminate the affected streaming queries.

I can always be corrected if you give the evidence. Let's stop "just"
talking. I believe we are seeing quite different things and our memory is
quite opposite. "History will tell us."

I am trying to understand where the miscommunication came from. Some
clarification:

1. I believe I have said I do not agree with just removing the config in
master/4.0 and that I expected a follow-up, which is the migration logic.
(link <https://github.com/apache/spark/pull/49897#issuecomment-2652486115>) I
admit this was a bit unclear, but I made my voice clear multiple times;
otherwise I would never have opened a migration logic PR for
master/4.0.
2. I believe I have said my intention is to land the migration logic in
4.0.x and arguably longer (link
<https://lists.apache.org/thread/q24vonqhvqh11ghd488rctsm89zvmpqd>).
I think there were people who wanted to remove the vendor name in
any way possible, but arguably it just ended as an open question, never
reaching consensus. People expressed concerns, but nothing was concluded
except that we agreed to proceed with Spark 3.5.5. We never reached a
consensus on how to deal with it in Spark 4.0.0+ in that discussion thread,
especially about migration logic.
3. The VOTE for removal of the config clearly stated that it was only for
3.5. (link
<https://lists.apache.org/thread/6nn76olr65b8zfgzdcbtr9f6o98451o5>)
4. I said I wanted to proceed with the migration logic PR for branch-4.0,
and didn't get any feedback except being told to wait for Spark 3.5.5
(link). If you weren't open to my proposal, you should have just said "we
have already decided" and given the evidence. I haven't heard
any, so I had to initiate DISCUSS.
5. We all know about DISCUSS and VOTE, so I won't repeat them.

I have strong evidence that you were aware that we never agreed on
the behavior for Spark 4.0.0, and you said my proposal is "technically
correct", so we never debated a "technical objection"; we debated
the "behavior".
https://github.com/apache/spark/pull/49983#issuecomment-2676531485

Can you please explain why you said my proposal is "technically correct"
and yet here cast a vote which requires a "technical objection"? Have you
changed your mind?

Overall, when you say "the Apache Spark community made the conclusion to
unblock the Apache Spark 4.0.0 release with the AS-IS code", I don't get
who "the Apache Spark community" is. Where can I see the DISCUSS and VOTE
thread? Am I really excluded from the Apache Spark
community, while arguably I am the only active maintainer of the module?
Could you please enumerate who the Apache Spark community was at that time?

Let's not talk based on memory. If we agreed on that, there should be a
record. I am open to apologizing if I missed a critical discussion and vote.
Your (and my) memory should never be used as evidence. Please, give the
evidence.

I'm also happy to hear about the other thread I have made. Thanks.

Jungtaek Lim - Saturday 15 March 2025 14:21:23 GMT
A small fix for the missing link:

4. I said I wanted to proceed with the migration logic PR for branch-4.0,
and didn't get any feedback except being told to wait for Spark 3.5.5 (link
<https://github.com/apache/spark/pull/49983#pullrequestreview-2621947671>).
If you weren't open to my proposal, you should have just said "we have
already decided" and given the evidence. I haven't heard any, so
I had to initiate DISCUSS.

Jungtaek Lim - Saturday 15 March 2025 14:26:48 GMT
Dongjoon, it is your responsibility to clarify your vote position,
since the vote is stalled as some people still claim your vote is a veto. If
you really agree that I gained consensus in the proper way, and
your vote is really just for the historical record, let's not waste more
time: please explicitly cast -0.99.

If you say I misunderstood your reply and you are still casting a veto, I'm
happy to hear the evidence based on the history. We only talked via GitHub
PRs and the mailing list, so no discussion happened outside that
infrastructure. I do not count any discussion happening in private@, as
private@ is not meant to be used for discussion which could have been done
in public.

Jungtaek Lim
[VOTE][RESULT] Retain migration logic of incorrect `spark.databricks.*`
configuration in Spark 4.0.x
The vote passes with 7 +1s (3 binding +1s) and 1 -1s (1 binding -1s). Thanks to
all who helped with the vote! I'm going to make a code change in branch-4.0
quickly so that we don't have to trigger an...
