Russell, Of course, we hear people' voices who aren't having binding votes as well. Personally I think it's more important than committers/PMC members' VOTE this time since we can be biased and be far from user experience.
Could you please explicitly cast your vote, like +1 (non-binding)? You seem to agree with the proposal. Thanks! On Thu, Mar 13, 2025 at 3:15 PM Russell Jurney <russell.jur...@gmail.com> wrote: > I'm just a lurker and aspiring contributor, but as a Spark user upgrading > twice is very confusing and would cause many or most users to fail to > upgrade successfully to Spark 4 on a first go. That seems like a very bad > user experience. I thought it was worthwhile stating this out loud. > > Russell > > On Wed, Mar 12, 2025 at 11:05 PM Xiao Li <gatorsm...@gmail.com> wrote: > >> this vote is to allow streaming queries which had been ever run in Spark >>> 3.5.4 to be upgraded with Spark 4.0.x, "without having to be upgraded with >>> Spark 3.5.5+ in prior". >> >> >> In the history of Apache Spark, have we ever required users to upgrade to >> the next maintenance release before moving to a new feature or major >> release? >> >> Xiao >> >> Adam Binford <adam...@gmail.com> 于2025年3月11日周二 09:08写道: >> >>> +1 (non-binding) >>> >>> It's a pretty in the weeds issue with how Structured Streaming works >>> under the hood that's kinda hard to understand if you're not familiar with >>> it. The migration logic doesn't mean users can still use the old config, >>> it's purely behind the scenes to fix checkpoint metadata in streams created >>> in 3.5.4. The 5 lines of code it takes to address a weird edge case for >>> certain users that's already gone from master shouldn't be a huge deal. >>> >>> On Tue, Mar 11, 2025 at 1:43 AM Yang Jie <yangji...@apache.org> wrote: >>> >>>> >>>> To Sean, you're right, I'm very sorry. >>>> >>>> From the perspective of compatibility and migratability, I think we >>>> should migrate this logic to 4.0.0 and keep it in the codebase for a longer >>>> time (or permanently), because we can't predict which version users of >>>> 3.5.4 will choose next. >>>> >>>> >>>> I don't want to discuss the so-called vendor issue. >>>> >>>> I withdraw my previous -1. >>>> >>>> Jie Yang. >>>> >>>> On 2025/03/11 04:42:25 Wenchen Fan wrote: >>>> > Guys, let’s be honest about what we’re discussing here. >>>> > >>>> > If this is a migration issue, why would we even need a vote? We’ve >>>> been >>>> > consistently adding configurations to restore legacy behavior instead >>>> of >>>> > removing them because we understand the challenges of upgrading Spark >>>> > versions. Our goal has always been to make upgrades easier, even if it >>>> > means carrying some technical debt. I don’t think we want to change >>>> that >>>> > culture now. >>>> > >>>> > If the concern is about vendor names appearing in the codebase, then >>>> why is >>>> > it a big deal this time when vendor names are already present >>>> elsewhere? If >>>> > we’ve failed to follow a policy, let’s correct it, but can someone >>>> point to >>>> > the specific policy we’re violating? >>>> > >>>> > If the vote is about adding migration logic to ease the upgrade from >>>> 3.5.4 >>>> > to 4.0.0, then +1, why not? >>>> > >>>> > Thanks, >>>> > Wenchen >>>> > >>>> > >>>> > >>>> > On Mon, Mar 10, 2025 at 8:49 PM Jungtaek Lim < >>>> kabhwan.opensou...@gmail.com> >>>> > wrote: >>>> > >>>> > > Well said, Sean. Sorry I made you keep around here since it might >>>> not be >>>> > > clearly stated. My bad. >>>> > > >>>> > > Yang, how could we ever tolerate the fact there are "other" >>>> occurrences of >>>> > > vendor names in the codebase? Please go and search "databricks" in >>>> the >>>> > > codebase and be surprised. >>>> > > >>>> > > If we believe that having vendor names in the codebase will increase >>>> > > the occurrence of making mistakes, why didn't we have a discussion >>>> thread >>>> > > earlier to remove all occurrences altogether? This is super tricky >>>> because >>>> > > I can even start to argue we have "Apple" as a vendor name in >>>> Apache Spark >>>> > > codebase. I'm not saying we use "apple" in the test data. See >>>> > > `isMacOnAppleSilicon` in Utils. Is it unavoidable? No, >>>> `isMacOnMSeries` or >>>> > > `isMacOnSilicon` is enough. >>>> > > >>>> > > We really need to draw a line where we disallow vendor names on it >>>> - if >>>> > > it's the entire codebase, I don't really think it is realistic. >>>> > > >>>> > > This was really a mistake, and it was definitely not from referring >>>> to the >>>> > > existing codebase. Not having a vendor name does not change >>>> anything on the >>>> > > chance of encountering this issue again. If we really care, we >>>> should think >>>> > > about style checking, which is the only viable way to catch the >>>> mistake. >>>> > > Again, I'd argue we have to have a bunch of vendor names in that >>>> style >>>> > > check, not just the problematic vendor name. >>>> > > >>>> > > >>>> > > On Tue, Mar 11, 2025 at 12:17 PM Sean Owen <sro...@gmail.com> >>>> wrote: >>>> > > >>>> > >> Doesn't the migration code 'clear' the debt? >>>> > >> The proposal is not to continue to support the config. >>>> > >> I feel like people are not quite understanding the change, and >>>> objecting >>>> > >> to something that doesn't exist. >>>> > >> It's a shame, as this seems like something not even worth >>>> discussing. I >>>> > >> don't know why this triggered this much discussion. We have kept >>>> deprecated >>>> > >> methods without blinking, which is in comparison much bigger. >>>> > >> Can we maybe ask you review the actual change in question? >>>> > >> >>>> > >> On Mon, Mar 10, 2025, 10:02 PM Yang Jie <yangji...@apache.org> >>>> wrote: >>>> > >> >>>> > >>> -1 >>>> > >>> Remove migration logic of incorrect `spark.databricks.*` >>>> configuration >>>> > >>> in Spark 4.0.0 because I think this configuration was initially >>>> introduced >>>> > >>> accidentally in Spark 3.5.4, lacking a clear design intent. >>>> Although the >>>> > >>> immediate maintenance cost of retaining this configuration >>>> currently seems >>>> > >>> limited, as subsequent versions iterate and user habits form, it >>>> may lead >>>> > >>> to the continuous accumulation of technical debt. When users come >>>> to view >>>> > >>> this configuration as one that can be relied on long-term, future >>>> removal >>>> > >>> may face greater resistance from users and could potentially >>>> become an >>>> > >>> entrenched and redundant configuration in the codebase. >>>> Therefore, promptly >>>> > >>> correcting this historically accidental configuration not only >>>> maintains >>>> > >>> the normativity of the Spark configuration system but also >>>> prevents >>>> > >>> unintended configurations from becoming de facto standards, >>>> thereby >>>> > >>> reducing long-term maintenance risks. >>>> > >>> >>>> > >>> Jie Yang >>>> > >>> >>>> > >>> On 2025/03/10 14:52:52 Dongjoon Hyun wrote: >>>> > >>> > -1 because there exists a feasible migration path for Apache >>>> Spark >>>> > >>> 3.5.4 via Apache Spark 3.5.5. >>>> > >>> > >>>> > >>> > It's obvious that this Databricks' mistake already causes a huge >>>> > >>> communication cost in the Apache Spark community and is >>>> suggesting a burden >>>> > >>> to enforce us to handle at least two more PRs at 4.0.0 and 4.1.0. >>>> > >>> > >>>> > >>> > Given that, I don't think >>>> > >>> > - This is an inevitable or >>>> > >>> > - This is 0 cost >>>> > >>> > >>>> > >>> > Dongjoon. >>>> > >>> > >>>> > >>> > On 2025/03/10 12:46:16 Jungtaek Lim wrote: >>>> > >>> > > Starting from my +1 (non-binding). >>>> > >>> > > >>>> > >>> > > In addition, I propose to retain migration logic till Spark >>>> 4.1.x and >>>> > >>> > > remove it in Spark 4.2.0. >>>> > >>> > > >>>> > >>> > > On Mon, Mar 10, 2025 at 9:44 PM Jungtaek Lim < >>>> > >>> kabhwan.opensou...@gmail.com> >>>> > >>> > > wrote: >>>> > >>> > > >>>> > >>> > > > Hi dev, >>>> > >>> > > > >>>> > >>> > > > Please vote to retain migration logic of incorrect >>>> > >>> `spark.databricks.*` >>>> > >>> > > > configuration in Spark 4.0.x. >>>> > >>> > > > >>>> > >>> > > > - DISCUSSION: >>>> > >>> > > > >>>> https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr >>>> > >>> > > > ([DISCUSS] Handling spark.databricks.* config being exposed >>>> in >>>> > >>> 3.5.4 in >>>> > >>> > > > Spark 4.0.0+) >>>> > >>> > > > >>>> > >>> > > > Specifically, please review this post >>>> > >>> > > > >>>> https://lists.apache.org/thread/xtq1kjhsl4ohfon78z3wld2hmfm78t9k >>>> > >>> which >>>> > >>> > > > explains pros and cons about the proposal - proposal is >>>> about >>>> > >>> "Option 1". >>>> > >>> > > > >>>> > >>> > > > Simply speaking, this vote is to allow streaming queries >>>> which had >>>> > >>> been >>>> > >>> > > > ever run in Spark 3.5.4 to be upgraded with Spark 4.0.x, >>>> "without >>>> > >>> having to >>>> > >>> > > > be upgraded with Spark 3.5.5+ in prior". If the vote >>>> passes, we >>>> > >>> will help >>>> > >>> > > > users to have a smooth upgrade from Spark 3.5.4 to Spark >>>> 4.0.x, >>>> > >>> which would >>>> > >>> > > > be almost 1 year. >>>> > >>> > > > >>>> > >>> > > > The (only) cons in this option is having to retain the >>>> incorrect >>>> > >>> > > > configuration name as "string" in the codebase a bit >>>> longer. The >>>> > >>> code >>>> > >>> > > > complexity of migration logic is arguably trivial. (link >>>> > >>> > > > < >>>> > >>> >>>> https://github.com/apache/spark/blob/4231d58245251a34ae80a38ea4bbf7d720caa439/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L174-L183 >>>> > >>> > >>>> > >>> > > > ) >>>> > >>> > > > >>>> > >>> > > > This VOTE is for Spark 4.0.x, but if someone supports >>>> including >>>> > >>> migration >>>> > >>> > > > logic to be longer than Spark 4.0.x, please cast +1 here >>>> and leave >>>> > >>> the >>>> > >>> > > > desired last minor version of Spark to retain this >>>> migration logic. >>>> > >>> > > > >>>> > >>> > > > The vote is open for the next 72 hours and passes if a >>>> majority +1 >>>> > >>> PMC >>>> > >>> > > > votes are cast, with a minimum of 3 +1 votes. >>>> > >>> > > > >>>> > >>> > > > [ ] +1 Retain migration logic of incorrect >>>> `spark.databricks.*` >>>> > >>> > > > configuration in Spark 4.0.x >>>> > >>> > > > [ ] -1 Remove migration logic of incorrect >>>> `spark.databricks.*` >>>> > >>> > > > configuration in Spark 4.0.0 because... >>>> > >>> > > > >>>> > >>> > > > Thanks! >>>> > >>> > > > Jungtaek Lim (HeartSaVioR) >>>> > >>> > > > >>>> > >>> > > >>>> > >>> > >>>> > >>> > >>>> --------------------------------------------------------------------- >>>> > >>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >>>> > >>> > >>>> > >>> > >>>> > >>> >>>> > >>> >>>> --------------------------------------------------------------------- >>>> > >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >>>> > >>> >>>> > >>> >>>> > >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >>>> >>>> >>> >>> -- >>> Adam Binford >>> >>