Valid -1 votes are not restricted to technical objections.

On Thu, Mar 13, 2025 at 7:28 AM Sean Owen <sro...@gmail.com> wrote:
>
> I'm not sure if a VOTE is appropriate here, but I also do not see any valid
> technical objection here. I don't think this can be considered a valid 'veto'
> even if we were thinking of it that way.
> I think there are other non-technical factors influencing this position. I
> believe we should proceed with Jungtaek's proposal.
>
> On Thu, Mar 13, 2025 at 9:17 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>
>> We are having this vote to give clarity by keeping all records of the
>> community decisions and stances during building a community consensus. All
>> votes are important and counted.
>>
>> To Jungtaek, I already cast my veto properly and have been tracking the
>> thread. You don't need to tell me to revisit because I've been here.
>>
>> To Xiao, in the history of Apache Spark, have we ever made a mistake to ship
>> a vendor-ownership like `spark.databricks.*`? I believe you are switching
>> the real root cause and the bad consequence here.
>> > In the history of Apache Spark, have we ever required users to upgrade to
>> > the next maintenance release before moving to a new feature or major
>> > release?
>>
>> Thanks,
>> Dongjoon.
>>
>>
>> On Thu, Mar 13, 2025 at 12:58 AM Jungtaek Lim <kabhwan.opensou...@gmail.com>
>> wrote:
>>>
>>> Thanks to everyone who participated and voted!
>>>
>>> Now I can technically conclude the VOTE, but I'm willing to wait till US
>>> daytime tomorrow, to give some time for Dongjoon to revisit this.
>>>
>>> I'll conclude the vote around 6PM PST tomorrow regardless of his vote. It's
>>> ideal to see us have no -1, but having one -1 doesn't block this vote and
>>> we can move forward.
>>>
>>> On Thu, Mar 13, 2025 at 4:42 PM Yang Jie <yangji...@apache.org> wrote:
>>>>
>>>> forgot to mention in my last reply, my stance is +1
>>>>
>>>> Jie Yang
>>>>
>>>> On 2025/03/13 07:08:12 Russell Jurney wrote:
>>>> > Sure, +1 non-binding.
>>>> >
>>>> > On Wed, Mar 12, 2025 at 11:18 PM Jungtaek Lim
>>>> > <kabhwan.opensou...@gmail.com> wrote:
>>>> >
>>>> > > Russell,
>>>> > >
>>>> > > Of course, we hear people's voices who aren't having binding votes as
>>>> > > well. Personally I think it's more important than committers/PMC
>>>> > > members' VOTE this time since we can be biased and be far from user
>>>> > > experience.
>>>> > >
>>>> > > Could you please explicitly cast your vote, like +1 (non-binding)? You
>>>> > > seem to agree with the proposal. Thanks!
>>>> > >
>>>> > > On Thu, Mar 13, 2025 at 3:15 PM Russell Jurney
>>>> > > <russell.jur...@gmail.com> wrote:
>>>> > >
>>>> > >> I'm just a lurker and aspiring contributor, but as a Spark user
>>>> > >> upgrading twice is very confusing and would cause many or most users
>>>> > >> to fail to upgrade successfully to Spark 4 on a first go. That seems
>>>> > >> like a very bad user experience. I thought it was worthwhile stating
>>>> > >> this out loud.
>>>> > >>
>>>> > >> Russell
>>>> > >>
>>>> > >> On Wed, Mar 12, 2025 at 11:05 PM Xiao Li <gatorsm...@gmail.com> wrote:
>>>> > >>
>>>> > >>>> this vote is to allow streaming queries which had been ever run in
>>>> > >>>> Spark 3.5.4 to be upgraded with Spark 4.0.x, "without having to be
>>>> > >>>> upgraded with Spark 3.5.5+ in prior".
>>>> > >>>
>>>> > >>>
>>>> > >>> In the history of Apache Spark, have we ever required users to
>>>> > >>> upgrade to the next maintenance release before moving to a new
>>>> > >>> feature or major release?
>>>> > >>>
>>>> > >>> Xiao
>>>> > >>>
>>>> > >>> On Tue, Mar 11, 2025 at 09:08, Adam Binford <adam...@gmail.com> wrote:
>>>> > >>>
>>>> > >>>> +1 (non-binding)
>>>> > >>>>
>>>> > >>>> It's a pretty in-the-weeds issue with how Structured Streaming works
>>>> > >>>> under the hood that's kinda hard to understand if you're not
>>>> > >>>> familiar with it. The migration logic doesn't mean users can still
>>>> > >>>> use the old config, it's purely behind the scenes to fix checkpoint
>>>> > >>>> metadata in streams created in 3.5.4. The 5 lines of code it takes
>>>> > >>>> to address a weird edge case for certain users that's already gone
>>>> > >>>> from master shouldn't be a huge deal.
>>>> > >>>>
>>>> > >>>> On Tue, Mar 11, 2025 at 1:43 AM Yang Jie <yangji...@apache.org>
>>>> > >>>> wrote:
>>>> > >>>>
>>>> > >>>>>
>>>> > >>>>> To Sean, you're right, I'm very sorry.
>>>> > >>>>>
>>>> > >>>>> From the perspective of compatibility and migratability, I think we
>>>> > >>>>> should migrate this logic to 4.0.0 and keep it in the codebase for
>>>> > >>>>> a longer time (or permanently), because we can't predict which
>>>> > >>>>> version users of 3.5.4 will choose next.
>>>> > >>>>>
>>>> > >>>>>
>>>> > >>>>> I don't want to discuss the so-called vendor issue.
>>>> > >>>>>
>>>> > >>>>> I withdraw my previous -1.
>>>> > >>>>>
>>>> > >>>>> Jie Yang.
>>>> > >>>>>
>>>> > >>>>> On 2025/03/11 04:42:25 Wenchen Fan wrote:
>>>> > >>>>> > Guys, let’s be honest about what we’re discussing here.
>>>> > >>>>> >
>>>> > >>>>> > If this is a migration issue, why would we even need a vote? We’ve
>>>> > >>>>> > been consistently adding configurations to restore legacy behavior
>>>> > >>>>> > instead of removing them because we understand the challenges of
>>>> > >>>>> > upgrading Spark versions.
>>>> > >>>>> > Our goal has always been to make upgrades easier, even if it
>>>> > >>>>> > means carrying some technical debt. I don’t think we want to
>>>> > >>>>> > change that culture now.
>>>> > >>>>> >
>>>> > >>>>> > If the concern is about vendor names appearing in the codebase,
>>>> > >>>>> > then why is it a big deal this time when vendor names are already
>>>> > >>>>> > present elsewhere? If we’ve failed to follow a policy, let’s
>>>> > >>>>> > correct it, but can someone point to the specific policy we’re
>>>> > >>>>> > violating?
>>>> > >>>>> >
>>>> > >>>>> > If the vote is about adding migration logic to ease the upgrade
>>>> > >>>>> > from 3.5.4 to 4.0.0, then +1, why not?
>>>> > >>>>> >
>>>> > >>>>> > Thanks,
>>>> > >>>>> > Wenchen
>>>> > >>>>> >
>>>> > >>>>> >
>>>> > >>>>> >
>>>> > >>>>> > On Mon, Mar 10, 2025 at 8:49 PM Jungtaek Lim
>>>> > >>>>> > <kabhwan.opensou...@gmail.com> wrote:
>>>> > >>>>> >
>>>> > >>>>> > > Well said, Sean. Sorry I made you keep around here since it
>>>> > >>>>> > > might not be clearly stated. My bad.
>>>> > >>>>> > >
>>>> > >>>>> > > Yang, how could we ever tolerate the fact there are "other"
>>>> > >>>>> > > occurrences of vendor names in the codebase? Please go and
>>>> > >>>>> > > search "databricks" in the codebase and be surprised.
>>>> > >>>>> > >
>>>> > >>>>> > > If we believe that having vendor names in the codebase will
>>>> > >>>>> > > increase the occurrence of making mistakes, why didn't we have
>>>> > >>>>> > > a discussion thread earlier to remove all occurrences
>>>> > >>>>> > > altogether?
>>>> > >>>>> > > This is super tricky because I can even start to argue we have
>>>> > >>>>> > > "Apple" as a vendor name in Apache Spark codebase. I'm not
>>>> > >>>>> > > saying we use "apple" in the test data. See
>>>> > >>>>> > > `isMacOnAppleSilicon` in Utils. Is it unavoidable? No,
>>>> > >>>>> > > `isMacOnMSeries` or `isMacOnSilicon` is enough.
>>>> > >>>>> > >
>>>> > >>>>> > > We really need to draw a line where we disallow vendor names on
>>>> > >>>>> > > it - if it's the entire codebase, I don't really think it is
>>>> > >>>>> > > realistic.
>>>> > >>>>> > >
>>>> > >>>>> > > This was really a mistake, and it was definitely not from
>>>> > >>>>> > > referring to the existing codebase. Not having a vendor name
>>>> > >>>>> > > does not change anything on the chance of encountering this
>>>> > >>>>> > > issue again. If we really care, we should think about style
>>>> > >>>>> > > checking, which is the only viable way to catch the mistake.
>>>> > >>>>> > > Again, I'd argue we have to have a bunch of vendor names in
>>>> > >>>>> > > that style check, not just the problematic vendor name.
>>>> > >>>>> > >
>>>> > >>>>> > >
>>>> > >>>>> > > On Tue, Mar 11, 2025 at 12:17 PM Sean Owen <sro...@gmail.com>
>>>> > >>>>> > > wrote:
>>>> > >>>>> > >
>>>> > >>>>> > >> Doesn't the migration code 'clear' the debt?
>>>> > >>>>> > >> The proposal is not to continue to support the config.
>>>> > >>>>> > >> I feel like people are not quite understanding the change, and
>>>> > >>>>> > >> objecting to something that doesn't exist.
>>>> > >>>>> > >> It's a shame, as this seems like something not even worth
>>>> > >>>>> > >> discussing. I don't know why this triggered this much
>>>> > >>>>> > >> discussion. We have kept deprecated methods without blinking,
>>>> > >>>>> > >> which is in comparison much bigger.
>>>> > >>>>> > >> Can we maybe ask you to review the actual change in question?
>>>> > >>>>> > >>
>>>> > >>>>> > >> On Mon, Mar 10, 2025, 10:02 PM Yang Jie <yangji...@apache.org>
>>>> > >>>>> > >> wrote:
>>>> > >>>>> > >>
>>>> > >>>>> > >>> -1
>>>> > >>>>> > >>> Remove migration logic of incorrect `spark.databricks.*`
>>>> > >>>>> > >>> configuration in Spark 4.0.0 because I think this
>>>> > >>>>> > >>> configuration was initially introduced accidentally in Spark
>>>> > >>>>> > >>> 3.5.4, lacking a clear design intent. Although the immediate
>>>> > >>>>> > >>> maintenance cost of retaining this configuration currently
>>>> > >>>>> > >>> seems limited, as subsequent versions iterate and user habits
>>>> > >>>>> > >>> form, it may lead to the continuous accumulation of technical
>>>> > >>>>> > >>> debt. When users come to view this configuration as one that
>>>> > >>>>> > >>> can be relied on long-term, future removal may face greater
>>>> > >>>>> > >>> resistance from users and could potentially become an
>>>> > >>>>> > >>> entrenched and redundant configuration in the codebase.
>>>> > >>>>> > >>> Therefore, promptly correcting this historically accidental
>>>> > >>>>> > >>> configuration not only maintains the normativity of the Spark
>>>> > >>>>> > >>> configuration system but also prevents unintended
>>>> > >>>>> > >>> configurations from becoming de facto standards, thereby
>>>> > >>>>> > >>> reducing long-term maintenance risks.
>>>> > >>>>> > >>>
>>>> > >>>>> > >>> Jie Yang
>>>> > >>>>> > >>>
>>>> > >>>>> > >>> On 2025/03/10 14:52:52 Dongjoon Hyun wrote:
>>>> > >>>>> > >>> > -1 because there exists a feasible migration path for
>>>> > >>>>> > >>> > Apache Spark 3.5.4 via Apache Spark 3.5.5.
>>>> > >>>>> > >>> >
>>>> > >>>>> > >>> > It's obvious that this Databricks' mistake already causes a
>>>> > >>>>> > >>> > huge communication cost in the Apache Spark community and
>>>> > >>>>> > >>> > is suggesting a burden to enforce us to handle at least two
>>>> > >>>>> > >>> > more PRs at 4.0.0 and 4.1.0.
>>>> > >>>>> > >>> >
>>>> > >>>>> > >>> > Given that, I don't think
>>>> > >>>>> > >>> > - This is an inevitable or
>>>> > >>>>> > >>> > - This is 0 cost
>>>> > >>>>> > >>> >
>>>> > >>>>> > >>> > Dongjoon.
>>>> > >>>>> > >>> >
>>>> > >>>>> > >>> > On 2025/03/10 12:46:16 Jungtaek Lim wrote:
>>>> > >>>>> > >>> > > Starting from my +1 (non-binding).
>>>> > >>>>> > >>> > >
>>>> > >>>>> > >>> > > In addition, I propose to retain migration logic till
>>>> > >>>>> > >>> > > Spark 4.1.x and remove it in Spark 4.2.0.
>>>> > >>>>> > >>> > >
>>>> > >>>>> > >>> > > On Mon, Mar 10, 2025 at 9:44 PM Jungtaek Lim
>>>> > >>>>> > >>> > > <kabhwan.opensou...@gmail.com> wrote:
>>>> > >>>>> > >>> > >
>>>> > >>>>> > >>> > > > Hi dev,
>>>> > >>>>> > >>> > > >
>>>> > >>>>> > >>> > > > Please vote to retain migration logic of incorrect
>>>> > >>>>> > >>> > > > `spark.databricks.*` configuration in Spark 4.0.x.
>>>> > >>>>> > >>> > > >
>>>> > >>>>> > >>> > > > - DISCUSSION:
>>>> > >>>>> > >>> > > > https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr
>>>> > >>>>> > >>> > > > ([DISCUSS] Handling spark.databricks.* config being
>>>> > >>>>> > >>> > > > exposed in 3.5.4 in Spark 4.0.0+)
>>>> > >>>>> > >>> > > >
>>>> > >>>>> > >>> > > > Specifically, please review this post
>>>> > >>>>> > >>> > > > https://lists.apache.org/thread/xtq1kjhsl4ohfon78z3wld2hmfm78t9k
>>>> > >>>>> > >>> > > > which explains pros and cons about the proposal -
>>>> > >>>>> > >>> > > > proposal is about "Option 1".
>>>> > >>>>> > >>> > > >
>>>> > >>>>> > >>> > > > Simply speaking, this vote is to allow streaming
>>>> > >>>>> > >>> > > > queries which had been ever run in Spark 3.5.4 to be
>>>> > >>>>> > >>> > > > upgraded with Spark 4.0.x, "without having to be
>>>> > >>>>> > >>> > > > upgraded with Spark 3.5.5+ in prior". If the vote
>>>> > >>>>> > >>> > > > passes, we will help users to have a smooth upgrade
>>>> > >>>>> > >>> > > > from Spark 3.5.4 to Spark 4.0.x, which would be almost
>>>> > >>>>> > >>> > > > 1 year.
>>>> > >>>>> > >>> > > >
>>>> > >>>>> > >>> > > > The (only) con of this option is having to retain the
>>>> > >>>>> > >>> > > > incorrect configuration name as "string" in the
>>>> > >>>>> > >>> > > > codebase a bit longer. The code complexity of migration
>>>> > >>>>> > >>> > > > logic is arguably trivial.
>>>> > >>>>> > >>> > > > (link
>>>> > >>>>> > >>> > > > <https://github.com/apache/spark/blob/4231d58245251a34ae80a38ea4bbf7d720caa439/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L174-L183>)
>>>> > >>>>> > >>> > > >
>>>> > >>>>> > >>> > > > This VOTE is for Spark 4.0.x, but if someone supports
>>>> > >>>>> > >>> > > > including migration logic to be longer than Spark
>>>> > >>>>> > >>> > > > 4.0.x, please cast +1 here and leave the desired last
>>>> > >>>>> > >>> > > > minor version of Spark to retain this migration logic.
>>>> > >>>>> > >>> > > >
>>>> > >>>>> > >>> > > > The vote is open for the next 72 hours and passes if a
>>>> > >>>>> > >>> > > > majority +1 PMC votes are cast, with a minimum of 3 +1
>>>> > >>>>> > >>> > > > votes.
>>>> > >>>>> > >>> > > >
>>>> > >>>>> > >>> > > > [ ] +1 Retain migration logic of incorrect
>>>> > >>>>> > >>> > > > `spark.databricks.*` configuration in Spark 4.0.x
>>>> > >>>>> > >>> > > > [ ] -1 Remove migration logic of incorrect
>>>> > >>>>> > >>> > > > `spark.databricks.*` configuration in Spark 4.0.0
>>>> > >>>>> > >>> > > > because...
>>>> > >>>>> > >>> > > >
>>>> > >>>>> > >>> > > > Thanks!
>>>> > >>>>> > >>> > > > Jungtaek Lim (HeartSaVioR)
>>>> > >>>>> > >>> > > >
>>>> > >>>>> > >>> > >
>>>> > >>>>> > >>> >
>>>> > >>>>> > >>> > ---------------------------------------------------------------------
>>>> > >>>>> > >>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>> > >>>>> > >>> >
>>>> > >>>>
>>>> > >>>> --
>>>> > >>>> Adam Binford
>>>> > >>>>