I agree with Mark, imo this is a qualified veto. We should give Dongjoon the opportunity to give his clarification, if any.
I do realize this delays the RC process, but this deserves to be looked into carefully.

Thanks,
Mridul

On Thu, Mar 13, 2025 at 9:35 PM Mark Hamstra <markhams...@gmail.com> wrote:

> Absolutely not!
>
> This is clearly a vote on a code change, not on a procedural issue or
> a package release. The code change has been vetoed by a -1 vote by a
> qualified voter.
>
> On Thu, Mar 13, 2025 at 6:58 PM Jungtaek Lim
> <kabhwan.opensou...@gmail.com> wrote:
> >
> > Likewise I said, I'm concluding the VOTE since we ensure the criteria (3 +1 binding, 1 -1 binding, and also +1s from non-binding).
> >
> > I don't consider -1 as a veto as I explained, as we should have multiple -1s if we go for VOTE with the current codebase. (+1 in this proposal is effectively -1 in another proposal.)
> >
> > The vote followed the Apache Voting Process with the type of "package release" (which we tend to use in dev@ for VOTE). I guess it could have also done with "procedural issues" which is less strict, but then this fulfills both types of votes which should be OK.
> >
> > The current codebase is "accidentally" representing another proposal and it is never intended. I don't find the way I can -1 to the current codebase, and make a different change neither bound to any proposal to be fair.
> >
> > I don't want to block the release because of the above. So, let's change the current codebase the way we discussed and voted here. Reverting this decision should require another VOTE.
> >
> > Thanks to everyone who voted!
> >
> > On Thu, Mar 13, 2025 at 4:54 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
> >>
> >> Thanks to everyone who participated and voted!
> >>
> >> Now I can technically conclude the VOTE, but I'm willing to wait till US daytime tomorrow, to give some time for Dongjoon to revisit this.
> >>
> >> I'll conclude the vote around 6PM PST tomorrow regardless of his vote.
> >> It's ideal to see us have no -1, but having one -1 doesn't block this vote and we can move forward.
> >>
> >> On Thu, Mar 13, 2025 at 4:42 PM Yang Jie <yangji...@apache.org> wrote:
> >>>
> >>> forgot to mention in my last reply, my stance is +1
> >>>
> >>> Jie Yang
> >>>
> >>> On 2025/03/13 07:08:12 Russell Jurney wrote:
> >>> > Sure, +1 non-binding.
> >>> >
> >>> > On Wed, Mar 12, 2025 at 11:18 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> >>> > wrote:
> >>> >
> >>> > > Russell,
> >>> > >
> >>> > > Of course, we hear people' voices who aren't having binding votes as well.
> >>> > > Personally I think it's more important than committers/PMC members' VOTE
> >>> > > this time since we can be biased and be far from user experience.
> >>> > >
> >>> > > Could you please explicitly cast your vote, like +1 (non-binding)? You
> >>> > > seem to agree with the proposal. Thanks!
> >>> > >
> >>> > > On Thu, Mar 13, 2025 at 3:15 PM Russell Jurney <russell.jur...@gmail.com>
> >>> > > wrote:
> >>> > >
> >>> > >> I'm just a lurker and aspiring contributor, but as a Spark user upgrading
> >>> > >> twice is very confusing and would cause many or most users to fail to
> >>> > >> upgrade successfully to Spark 4 on a first go. That seems like a very bad
> >>> > >> user experience. I thought it was worthwhile stating this out loud.
> >>> > >>
> >>> > >> Russell
> >>> > >>
> >>> > >> On Wed, Mar 12, 2025 at 11:05 PM Xiao Li <gatorsm...@gmail.com> wrote:
> >>> > >>
> >>> > >>> this vote is to allow streaming queries which had been ever run in Spark
> >>> > >>>> 3.5.4 to be upgraded with Spark 4.0.x, "without having to be upgraded with
> >>> > >>>> Spark 3.5.5+ in prior".
> >>> > >>>
> >>> > >>>
> >>> > >>> In the history of Apache Spark, have we ever required users to upgrade
> >>> > >>> to the next maintenance release before moving to a new feature or major
> >>> > >>> release?
> >>> > >>>
> >>> > >>> Xiao
> >>> > >>>
> >>> > >>> Adam Binford <adam...@gmail.com> wrote on Tue, Mar 11, 2025 at 09:08:
> >>> > >>>
> >>> > >>>> +1 (non-binding)
> >>> > >>>>
> >>> > >>>> It's a pretty in the weeds issue with how Structured Streaming works
> >>> > >>>> under the hood that's kinda hard to understand if you're not familiar with
> >>> > >>>> it. The migration logic doesn't mean users can still use the old config,
> >>> > >>>> it's purely behind the scenes to fix checkpoint metadata in streams created
> >>> > >>>> in 3.5.4. The 5 lines of code it takes to address a weird edge case for
> >>> > >>>> certain users that's already gone from master shouldn't be a huge deal.
> >>> > >>>>
> >>> > >>>> On Tue, Mar 11, 2025 at 1:43 AM Yang Jie <yangji...@apache.org> wrote:
> >>> > >>>>
> >>> > >>>>>
> >>> > >>>>> To Sean, you're right, I'm very sorry.
> >>> > >>>>>
> >>> > >>>>> From the perspective of compatibility and migratability, I think we
> >>> > >>>>> should migrate this logic to 4.0.0 and keep it in the codebase for a longer
> >>> > >>>>> time (or permanently), because we can't predict which version users of
> >>> > >>>>> 3.5.4 will choose next.
> >>> > >>>>>
> >>> > >>>>>
> >>> > >>>>> I don't want to discuss the so-called vendor issue.
> >>> > >>>>>
> >>> > >>>>> I withdraw my previous -1.
> >>> > >>>>>
> >>> > >>>>> Jie Yang.
> >>> > >>>>>
> >>> > >>>>> On 2025/03/11 04:42:25 Wenchen Fan wrote:
> >>> > >>>>> > Guys, let’s be honest about what we’re discussing here.
> >>> > >>>>> >
> >>> > >>>>> > If this is a migration issue, why would we even need a vote? We’ve been
> >>> > >>>>> > consistently adding configurations to restore legacy behavior instead of
> >>> > >>>>> > removing them because we understand the challenges of upgrading Spark
> >>> > >>>>> > versions. Our goal has always been to make upgrades easier, even if it
> >>> > >>>>> > means carrying some technical debt.
> >>> > >>>>> > I don’t think we want to change that
> >>> > >>>>> > culture now.
> >>> > >>>>> >
> >>> > >>>>> > If the concern is about vendor names appearing in the codebase, then why is
> >>> > >>>>> > it a big deal this time when vendor names are already present elsewhere? If
> >>> > >>>>> > we’ve failed to follow a policy, let’s correct it, but can someone point to
> >>> > >>>>> > the specific policy we’re violating?
> >>> > >>>>> >
> >>> > >>>>> > If the vote is about adding migration logic to ease the upgrade from 3.5.4
> >>> > >>>>> > to 4.0.0, then +1, why not?
> >>> > >>>>> >
> >>> > >>>>> > Thanks,
> >>> > >>>>> > Wenchen
> >>> > >>>>> >
> >>> > >>>>> > On Mon, Mar 10, 2025 at 8:49 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> >>> > >>>>> > wrote:
> >>> > >>>>> >
> >>> > >>>>> > > Well said, Sean. Sorry I made you keep around here since it might not be
> >>> > >>>>> > > clearly stated. My bad.
> >>> > >>>>> > >
> >>> > >>>>> > > Yang, how could we ever tolerate the fact there are "other" occurrences of
> >>> > >>>>> > > vendor names in the codebase? Please go and search "databricks" in the
> >>> > >>>>> > > codebase and be surprised.
> >>> > >>>>> > >
> >>> > >>>>> > > If we believe that having vendor names in the codebase will increase
> >>> > >>>>> > > the occurrence of making mistakes, why didn't we have a discussion thread
> >>> > >>>>> > > earlier to remove all occurrences altogether? This is super tricky because
> >>> > >>>>> > > I can even start to argue we have "Apple" as a vendor name in Apache Spark
> >>> > >>>>> > > codebase. I'm not saying we use "apple" in the test data. See
> >>> > >>>>> > > `isMacOnAppleSilicon` in Utils. Is it unavoidable?
> >>> > >>>>> > > No, `isMacOnMSeries` or
> >>> > >>>>> > > `isMacOnSilicon` is enough.
> >>> > >>>>> > >
> >>> > >>>>> > > We really need to draw a line where we disallow vendor names on it - if
> >>> > >>>>> > > it's the entire codebase, I don't really think it is realistic.
> >>> > >>>>> > >
> >>> > >>>>> > > This was really a mistake, and it was definitely not from referring to the
> >>> > >>>>> > > existing codebase. Not having a vendor name does not change anything on the
> >>> > >>>>> > > chance of encountering this issue again. If we really care, we should think
> >>> > >>>>> > > about style checking, which is the only viable way to catch the mistake.
> >>> > >>>>> > > Again, I'd argue we have to have a bunch of vendor names in that style
> >>> > >>>>> > > check, not just the problematic vendor name.
> >>> > >>>>> > >
> >>> > >>>>> > >
> >>> > >>>>> > > On Tue, Mar 11, 2025 at 12:17 PM Sean Owen <sro...@gmail.com> wrote:
> >>> > >>>>> > >
> >>> > >>>>> > >> Doesn't the migration code 'clear' the debt?
> >>> > >>>>> > >> The proposal is not to continue to support the config.
> >>> > >>>>> > >> I feel like people are not quite understanding the change, and objecting
> >>> > >>>>> > >> to something that doesn't exist.
> >>> > >>>>> > >> It's a shame, as this seems like something not even worth discussing. I
> >>> > >>>>> > >> don't know why this triggered this much discussion. We have kept deprecated
> >>> > >>>>> > >> methods without blinking, which is in comparison much bigger.
> >>> > >>>>> > >> Can we maybe ask you review the actual change in question?
> >>> > >>>>> > >>
> >>> > >>>>> > >> On Mon, Mar 10, 2025, 10:02 PM Yang Jie <yangji...@apache.org> wrote:
> >>> > >>>>> > >>
> >>> > >>>>> > >>> -1
> >>> > >>>>> > >>> Remove migration logic of incorrect `spark.databricks.*` configuration
> >>> > >>>>> > >>> in Spark 4.0.0 because I think this configuration was initially introduced
> >>> > >>>>> > >>> accidentally in Spark 3.5.4, lacking a clear design intent. Although the
> >>> > >>>>> > >>> immediate maintenance cost of retaining this configuration currently seems
> >>> > >>>>> > >>> limited, as subsequent versions iterate and user habits form, it may lead
> >>> > >>>>> > >>> to the continuous accumulation of technical debt. When users come to view
> >>> > >>>>> > >>> this configuration as one that can be relied on long-term, future removal
> >>> > >>>>> > >>> may face greater resistance from users and could potentially become an
> >>> > >>>>> > >>> entrenched and redundant configuration in the codebase. Therefore, promptly
> >>> > >>>>> > >>> correcting this historically accidental configuration not only maintains
> >>> > >>>>> > >>> the normativity of the Spark configuration system but also prevents
> >>> > >>>>> > >>> unintended configurations from becoming de facto standards, thereby
> >>> > >>>>> > >>> reducing long-term maintenance risks.
> >>> > >>>>> > >>>
> >>> > >>>>> > >>> Jie Yang
> >>> > >>>>> > >>>
> >>> > >>>>> > >>> On 2025/03/10 14:52:52 Dongjoon Hyun wrote:
> >>> > >>>>> > >>> > -1 because there exists a feasible migration path for Apache Spark 3.5.4 via Apache Spark 3.5.5.
> >>> > >>>>> > >>> >
> >>> > >>>>> > >>> > It's obvious that this Databricks' mistake already causes a huge
> >>> > >>>>> > >>> > communication cost in the Apache Spark community and is suggesting a burden
> >>> > >>>>> > >>> > to enforce us to handle at least two more PRs at 4.0.0 and 4.1.0.
> >>> > >>>>> > >>> >
> >>> > >>>>> > >>> > Given that, I don't think
> >>> > >>>>> > >>> > - This is an inevitable or
> >>> > >>>>> > >>> > - This is 0 cost
> >>> > >>>>> > >>> >
> >>> > >>>>> > >>> > Dongjoon.
> >>> > >>>>> > >>> >
> >>> > >>>>> > >>> > On 2025/03/10 12:46:16 Jungtaek Lim wrote:
> >>> > >>>>> > >>> > > Starting from my +1 (non-binding).
> >>> > >>>>> > >>> > >
> >>> > >>>>> > >>> > > In addition, I propose to retain migration logic till Spark 4.1.x and
> >>> > >>>>> > >>> > > remove it in Spark 4.2.0.
> >>> > >>>>> > >>> > >
> >>> > >>>>> > >>> > > On Mon, Mar 10, 2025 at 9:44 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> >>> > >>>>> > >>> > > wrote:
> >>> > >>>>> > >>> > >
> >>> > >>>>> > >>> > > > Hi dev,
> >>> > >>>>> > >>> > > >
> >>> > >>>>> > >>> > > > Please vote to retain migration logic of incorrect `spark.databricks.*`
> >>> > >>>>> > >>> > > > configuration in Spark 4.0.x.
> >>> > >>>>> > >>> > > >
> >>> > >>>>> > >>> > > > - DISCUSSION:
> >>> > >>>>> > >>> > > > https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr
> >>> > >>>>> > >>> > > > ([DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in
> >>> > >>>>> > >>> > > > Spark 4.0.0+)
> >>> > >>>>> > >>> > > >
> >>> > >>>>> > >>> > > > Specifically, please review this post
> >>> > >>>>> > >>> > > > https://lists.apache.org/thread/xtq1kjhsl4ohfon78z3wld2hmfm78t9k which
> >>> > >>>>> > >>> > > > explains pros and cons about the proposal - proposal is about "Option 1".
> >>> > >>>>> > >>> > > >
> >>> > >>>>> > >>> > > > Simply speaking, this vote is to allow streaming queries which had been
> >>> > >>>>> > >>> > > > ever run in Spark 3.5.4 to be upgraded with Spark 4.0.x, "without having to
> >>> > >>>>> > >>> > > > be upgraded with Spark 3.5.5+ in prior". If the vote passes, we will help
> >>> > >>>>> > >>> > > > users to have a smooth upgrade from Spark 3.5.4 to Spark 4.0.x, which would
> >>> > >>>>> > >>> > > > be almost 1 year.
> >>> > >>>>> > >>> > > >
> >>> > >>>>> > >>> > > > The (only) cons in this option is having to retain the incorrect
> >>> > >>>>> > >>> > > > configuration name as "string" in the codebase a bit longer. The code
> >>> > >>>>> > >>> > > > complexity of migration logic is arguably trivial.
> >>> > >>>>> > >>> > > > (link <https://github.com/apache/spark/blob/4231d58245251a34ae80a38ea4bbf7d720caa439/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L174-L183>)
> >>> > >>>>> > >>> > > >
> >>> > >>>>> > >>> > > > This VOTE is for Spark 4.0.x, but if someone supports including migration
> >>> > >>>>> > >>> > > > logic to be longer than Spark 4.0.x, please cast +1 here and leave the
> >>> > >>>>> > >>> > > > desired last minor version of Spark to retain this migration logic.
> >>> > >>>>> > >>> > > >
> >>> > >>>>> > >>> > > > The vote is open for the next 72 hours and passes if a majority +1 PMC
> >>> > >>>>> > >>> > > > votes are cast, with a minimum of 3 +1 votes.
> >>> > >>>>> > >>> > > >
> >>> > >>>>> > >>> > > > [ ] +1 Retain migration logic of incorrect `spark.databricks.*`
> >>> > >>>>> > >>> > > > configuration in Spark 4.0.x
> >>> > >>>>> > >>> > > > [ ] -1 Remove migration logic of incorrect `spark.databricks.*`
> >>> > >>>>> > >>> > > > configuration in Spark 4.0.0 because...
> >>> > >>>>> > >>> > > >
> >>> > >>>>> > >>> > > > Thanks!
> >>> > >>>>> > >>> > > > Jungtaek Lim (HeartSaVioR)
> >>> > >>>>> > >>> >
> >>> > >>>>> > >>> > ---------------------------------------------------------------------
> >>> > >>>>> > >>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>> > >>>>
> >>> > >>>> --
> >>> > >>>> Adam Binford