Forgot to mention in my last reply: my stance is +1.

Jie Yang

On 2025/03/13 07:08:12 Russell Jurney wrote:
> Sure, +1 non-binding.
>
> On Wed, Mar 12, 2025 at 11:18 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
> > Russell,
> >
> > Of course, we hear people' voices who aren't having binding votes as well.
> > Personally I think it's more important than committers/PMC members' VOTE
> > this time since we can be biased and be far from user experience.
> >
> > Could you please explicitly cast your vote, like +1 (non-binding)? You
> > seem to agree with the proposal. Thanks!
> >
> > On Thu, Mar 13, 2025 at 3:15 PM Russell Jurney <russell.jur...@gmail.com>
> > wrote:
> >
> >> I'm just a lurker and aspiring contributor, but as a Spark user upgrading
> >> twice is very confusing and would cause many or most users to fail to
> >> upgrade successfully to Spark 4 on a first go. That seems like a very bad
> >> user experience. I thought it was worthwhile stating this out loud.
> >>
> >> Russell
> >>
> >> On Wed, Mar 12, 2025 at 11:05 PM Xiao Li <gatorsm...@gmail.com> wrote:
> >>
> >>>> this vote is to allow streaming queries which had been ever run in Spark
> >>>> 3.5.4 to be upgraded with Spark 4.0.x, "without having to be upgraded with
> >>>> Spark 3.5.5+ in prior".
> >>>
> >>>
> >>> In the history of Apache Spark, have we ever required users to upgrade
> >>> to the next maintenance release before moving to a new feature or major
> >>> release?
> >>>
> >>> Xiao
> >>>
> >>> Adam Binford <adam...@gmail.com> wrote on Tue, Mar 11, 2025 at 09:08:
> >>>
> >>>> +1 (non-binding)
> >>>>
> >>>> It's a pretty in the weeds issue with how Structured Streaming works
> >>>> under the hood that's kinda hard to understand if you're not familiar with
> >>>> it. The migration logic doesn't mean users can still use the old config,
> >>>> it's purely behind the scenes to fix checkpoint metadata in streams created
> >>>> in 3.5.4.
> >>>> The 5 lines of code it takes to address a weird edge case for
> >>>> certain users that's already gone from master shouldn't be a huge deal.
> >>>>
> >>>> On Tue, Mar 11, 2025 at 1:43 AM Yang Jie <yangji...@apache.org> wrote:
> >>>>
> >>>>>
> >>>>> To Sean, you're right, I'm very sorry.
> >>>>>
> >>>>> From the perspective of compatibility and migratability, I think we
> >>>>> should migrate this logic to 4.0.0 and keep it in the codebase for a longer
> >>>>> time (or permanently), because we can't predict which version users of
> >>>>> 3.5.4 will choose next.
> >>>>>
> >>>>>
> >>>>> I don't want to discuss the so-called vendor issue.
> >>>>>
> >>>>> I withdraw my previous -1.
> >>>>>
> >>>>> Jie Yang.
> >>>>>
> >>>>> On 2025/03/11 04:42:25 Wenchen Fan wrote:
> >>>>> > Guys, let’s be honest about what we’re discussing here.
> >>>>> >
> >>>>> > If this is a migration issue, why would we even need a vote? We’ve been
> >>>>> > consistently adding configurations to restore legacy behavior instead of
> >>>>> > removing them because we understand the challenges of upgrading Spark
> >>>>> > versions. Our goal has always been to make upgrades easier, even if it
> >>>>> > means carrying some technical debt. I don’t think we want to change that
> >>>>> > culture now.
> >>>>> >
> >>>>> > If the concern is about vendor names appearing in the codebase, then why is
> >>>>> > it a big deal this time when vendor names are already present elsewhere? If
> >>>>> > we’ve failed to follow a policy, let’s correct it, but can someone point to
> >>>>> > the specific policy we’re violating?
> >>>>> >
> >>>>> > If the vote is about adding migration logic to ease the upgrade from 3.5.4
> >>>>> > to 4.0.0, then +1, why not?
> >>>>> >
> >>>>> > Thanks,
> >>>>> > Wenchen
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> > On Mon, Mar 10, 2025 at 8:49 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> >>>>> > wrote:
> >>>>> >
> >>>>> > > Well said, Sean. Sorry I made you keep around here since it might not be
> >>>>> > > clearly stated. My bad.
> >>>>> > >
> >>>>> > > Yang, how could we ever tolerate the fact there are "other" occurrences of
> >>>>> > > vendor names in the codebase? Please go and search "databricks" in the
> >>>>> > > codebase and be surprised.
> >>>>> > >
> >>>>> > > If we believe that having vendor names in the codebase will increase
> >>>>> > > the occurrence of making mistakes, why didn't we have a discussion thread
> >>>>> > > earlier to remove all occurrences altogether? This is super tricky because
> >>>>> > > I can even start to argue we have "Apple" as a vendor name in Apache Spark
> >>>>> > > codebase. I'm not saying we use "apple" in the test data. See
> >>>>> > > `isMacOnAppleSilicon` in Utils. Is it unavoidable? No, `isMacOnMSeries` or
> >>>>> > > `isMacOnSilicon` is enough.
> >>>>> > >
> >>>>> > > We really need to draw a line where we disallow vendor names on it - if
> >>>>> > > it's the entire codebase, I don't really think it is realistic.
> >>>>> > >
> >>>>> > > This was really a mistake, and it was definitely not from referring to the
> >>>>> > > existing codebase. Not having a vendor name does not change anything on the
> >>>>> > > chance of encountering this issue again. If we really care, we should think
> >>>>> > > about style checking, which is the only viable way to catch the mistake.
> >>>>> > > Again, I'd argue we have to have a bunch of vendor names in that style
> >>>>> > > check, not just the problematic vendor name.
> >>>>> > >
> >>>>> > >
> >>>>> > > On Tue, Mar 11, 2025 at 12:17 PM Sean Owen <sro...@gmail.com> wrote:
> >>>>> > >
> >>>>> > >> Doesn't the migration code 'clear' the debt?
> >>>>> > >> The proposal is not to continue to support the config.
> >>>>> > >> I feel like people are not quite understanding the change, and objecting
> >>>>> > >> to something that doesn't exist.
> >>>>> > >> It's a shame, as this seems like something not even worth discussing. I
> >>>>> > >> don't know why this triggered this much discussion. We have kept deprecated
> >>>>> > >> methods without blinking, which is in comparison much bigger.
> >>>>> > >> Can we maybe ask you review the actual change in question?
> >>>>> > >>
> >>>>> > >> On Mon, Mar 10, 2025, 10:02 PM Yang Jie <yangji...@apache.org> wrote:
> >>>>> > >>
> >>>>> > >>> -1
> >>>>> > >>> Remove migration logic of incorrect `spark.databricks.*` configuration
> >>>>> > >>> in Spark 4.0.0 because I think this configuration was initially introduced
> >>>>> > >>> accidentally in Spark 3.5.4, lacking a clear design intent. Although the
> >>>>> > >>> immediate maintenance cost of retaining this configuration currently seems
> >>>>> > >>> limited, as subsequent versions iterate and user habits form, it may lead
> >>>>> > >>> to the continuous accumulation of technical debt. When users come to view
> >>>>> > >>> this configuration as one that can be relied on long-term, future removal
> >>>>> > >>> may face greater resistance from users and could potentially become an
> >>>>> > >>> entrenched and redundant configuration in the codebase.
> >>>>> > >>> Therefore, promptly
> >>>>> > >>> correcting this historically accidental configuration not only maintains
> >>>>> > >>> the normativity of the Spark configuration system but also prevents
> >>>>> > >>> unintended configurations from becoming de facto standards, thereby
> >>>>> > >>> reducing long-term maintenance risks.
> >>>>> > >>>
> >>>>> > >>> Jie Yang
> >>>>> > >>>
> >>>>> > >>> On 2025/03/10 14:52:52 Dongjoon Hyun wrote:
> >>>>> > >>> > -1 because there exists a feasible migration path for Apache Spark
> >>>>> > >>> > 3.5.4 via Apache Spark 3.5.5.
> >>>>> > >>> >
> >>>>> > >>> > It's obvious that this Databricks' mistake already causes a huge
> >>>>> > >>> > communication cost in the Apache Spark community and is suggesting a burden
> >>>>> > >>> > to enforce us to handle at least two more PRs at 4.0.0 and 4.1.0.
> >>>>> > >>> >
> >>>>> > >>> > Given that, I don't think
> >>>>> > >>> > - This is an inevitable or
> >>>>> > >>> > - This is 0 cost
> >>>>> > >>> >
> >>>>> > >>> > Dongjoon.
> >>>>> > >>> >
> >>>>> > >>> > On 2025/03/10 12:46:16 Jungtaek Lim wrote:
> >>>>> > >>> > > Starting from my +1 (non-binding).
> >>>>> > >>> > >
> >>>>> > >>> > > In addition, I propose to retain migration logic till Spark 4.1.x and
> >>>>> > >>> > > remove it in Spark 4.2.0.
> >>>>> > >>> > >
> >>>>> > >>> > > On Mon, Mar 10, 2025 at 9:44 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> >>>>> > >>> > > wrote:
> >>>>> > >>> > >
> >>>>> > >>> > > > Hi dev,
> >>>>> > >>> > > >
> >>>>> > >>> > > > Please vote to retain migration logic of incorrect `spark.databricks.*`
> >>>>> > >>> > > > configuration in Spark 4.0.x.
> >>>>> > >>> > > >
> >>>>> > >>> > > > - DISCUSSION:
> >>>>> > >>> > > > https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr
> >>>>> > >>> > > > ([DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in
> >>>>> > >>> > > > Spark 4.0.0+)
> >>>>> > >>> > > >
> >>>>> > >>> > > > Specifically, please review this post
> >>>>> > >>> > > > https://lists.apache.org/thread/xtq1kjhsl4ohfon78z3wld2hmfm78t9k which
> >>>>> > >>> > > > explains pros and cons about the proposal - proposal is about "Option 1".
> >>>>> > >>> > > >
> >>>>> > >>> > > > Simply speaking, this vote is to allow streaming queries which had been
> >>>>> > >>> > > > ever run in Spark 3.5.4 to be upgraded with Spark 4.0.x, "without having to
> >>>>> > >>> > > > be upgraded with Spark 3.5.5+ in prior". If the vote passes, we will help
> >>>>> > >>> > > > users to have a smooth upgrade from Spark 3.5.4 to Spark 4.0.x, which would
> >>>>> > >>> > > > be almost 1 year.
> >>>>> > >>> > > >
> >>>>> > >>> > > > The (only) cons in this option is having to retain the incorrect
> >>>>> > >>> > > > configuration name as "string" in the codebase a bit longer. The code
> >>>>> > >>> > > > complexity of migration logic is arguably trivial.
> >>>>> > >>> > > > (link
> >>>>> > >>> > > > <https://github.com/apache/spark/blob/4231d58245251a34ae80a38ea4bbf7d720caa439/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L174-L183>
> >>>>> > >>> > > > )
> >>>>> > >>> > > >
> >>>>> > >>> > > > This VOTE is for Spark 4.0.x, but if someone supports including migration
> >>>>> > >>> > > > logic to be longer than Spark 4.0.x, please cast +1 here and leave the
> >>>>> > >>> > > > desired last minor version of Spark to retain this migration logic.
> >>>>> > >>> > > >
> >>>>> > >>> > > > The vote is open for the next 72 hours and passes if a majority +1 PMC
> >>>>> > >>> > > > votes are cast, with a minimum of 3 +1 votes.
> >>>>> > >>> > > >
> >>>>> > >>> > > > [ ] +1 Retain migration logic of incorrect `spark.databricks.*`
> >>>>> > >>> > > > configuration in Spark 4.0.x
> >>>>> > >>> > > > [ ] -1 Remove migration logic of incorrect `spark.databricks.*`
> >>>>> > >>> > > > configuration in Spark 4.0.0 because...
> >>>>> > >>> > > >
> >>>>> > >>> > > > Thanks!
> >>>>> > >>> > > > Jungtaek Lim (HeartSaVioR)
> >>>>> > >>> > > >
> >>>>> > >>> > >
> >>>>> > >>> >
> >>>>> > >>>
> >>>>> >
> >>>>>
> >>>>
> >>>> --
> >>>> Adam Binford
> >>>>
> >>>
>
---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org