Small correction: the link was missing from item 4.

4. I claimed I wanted to proceed with migration logic for branch-4.0 PR, and hadn't got any feedback except being told to wait for Spark 3.5.5 (link <https://github.com/apache/spark/pull/49983#pullrequestreview-2621947671>). If you weren't open to my proposal, you should have just said "we were already decided" and you had to give the evidence. I haven't heard any, so I had to initiate DISCUSS.
On Sat, Mar 15, 2025 at 11:18 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>
> > according to the ASF process, the Apache Spark community made the conclusion to unblock the Apache Spark 4.0.0 release with the AS-IS code with the improved Spark 4.0 migration guide because I provided a technical justification for my vote via the concrete alternative based on the existing Spark 3.5.5, AS-IS code base, and the suggested better migration guide way in order to eliminate the affected streaming queries.
>
> I can always be corrected if you give the evidence. Let's stop "just" talking. I believe we are seeing quite different things and our memory is quite opposite. "History will tell us."
>
> I am trying to understand where the miscommunication came from. Some clarification:
>
> 1. I believe I have said I do not agree just removing the config in master/4.0 and I expected follow-up, which is the migration logic. (link <https://github.com/apache/spark/pull/49897#issuecomment-2652486115>) I admit this is a bit unclear to understand, but I had multiple times to make my voice clear, otherwise I shouldn't ever have migration logic PR for master/4.0.
> 2. I believe I have said my intention is to land the migration logic to 4.0.x and arguably longer (link <https://lists.apache.org/thread/q24vonqhvqh11ghd488rctsm89zvmpqd>). I think there were people who were wanting to remove the vendor name in any way, but arguably it just ended with an open question, never to be reached consensus. People expressed concerns, but nothing was concluded except we agree with proceeding for Spark 3.5.5. We never made a consensus on how to deal with it in Spark 4.0.0+ in that discussion thread, especially about migration logic.
> 3. VOTE for removal of config is clearly stated that it is only 3.5. (link <https://lists.apache.org/thread/6nn76olr65b8zfgzdcbtr9f6o98451o5>)
> 4. I claimed I wanted to proceed with migration logic for branch-4.0 PR, and hadn't got any feedback except being told to wait for Spark 3.5.5 (link). If you weren't open to my proposal, you should have just said "we were already decided" and you had to give the evidence. I haven't heard any, so I had to initiate DISCUSS.
> 5. We all know about DISCUSS and VOTE so I wouldn't repeat.
>
> I have strong evidence that you were aware of the fact we never agreed with the behavior for Spark 4.0.0, and you said my proposal is "technically correct", so we had never debated about "technical objection", but debated about "behavior".
> https://github.com/apache/spark/pull/49983#issuecomment-2676531485
>
> Can you please explain why you said my proposal is "technically correct" and here you did a vote which required "technical objection"? Have you changed your mind?
>
> Overall, when you say "the Apache Spark community made the conclusion to unblock the Apache Spark 4.0.0 release with the AS-IS code", I don't get who is "the Apache Spark community". Where can I see the DISCUSS and VOTE thread? Is it really that I am excluded on the list of the Apache Spark community, while arguably I am the only active maintainer of the module? Could you please enumerate who the Apache Spark community was at that time?
>
> Let's not talk based on memory. If we agree about that, we should have a history. I am open to apologize if I missed a critical discussion and vote. Your (and my) memory should never be used as evidence. Please, give the evidence.
>
> I'm also happy to hear about the other thread I have made. Thanks.
>
> On Sat, Mar 15, 2025 at 9:23 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
>> Apache Spark PMC always strongly recommends all 3.5 users to upgrade to the latest stable release via the official website. The main question seems quite different from the Apache Spark website.
>> May I ask what is not safe to guide Spark 3.5.4 users to 3.5.5, Jungtaek?
>>
>> > The main question was, "where is the evidence it's safe to force users to upgrade to Spark 3.5.5...
>>
>> For the following part, when the Apache Spark community made a mistake at Spark 2.4.2 release, we guided the users to upgrade to 2.4.3 immediately after recovering the default Scala version to 2.11.
>>
>> > to upgrade to Spark 3.5.5 before upgrading to Spark 4.0.0".
>>
>> 2019-04-23 https://spark.apache.org/releases/spark-release-2-4-2.html
>> 2019-05-08 https://spark.apache.org/releases/spark-release-2-4-3.html
>>
>> In the same way, Apache Spark 3.5.5 was released and is ready to handle a mistake at Spark 3.5.4.
>>
>> 2025-02-27 https://spark.apache.org/releases/spark-release-3-5-5.html
>>
>> For the vote, the vote is a time-limited procedure to make a swift decision. That's the reason why you proposed the vote procedure and we agreed. There is no way to `block` the votes. The vote itself is already completed (including my -1).
>>
>> > you weren’t intended to “block” the vote
>>
>> I've been considering this as a part of the whole `spark.databricks.*` incident handling. In my interpretation, according to the ASF process, the Apache Spark community made the conclusion to unblock the Apache Spark 4.0.0 release with the AS-IS code with the improved Spark 4.0 migration guide because I provided a technical justification for my vote via the concrete alternative based on the existing Spark 3.5.5, AS-IS code base, and the suggested better migration guide way in order to eliminate the affected streaming queries.
>>
>> Thanks,
>> Dongjoon.
>>
>> On Fri, Mar 14, 2025 at 3:23 AM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>
>>> That said, if I understand correctly, you weren’t intended to “block” the vote, right? You say you expected the vote to be finished.
>>>
>>> Could you please cast the vote to -0.x since some people view this as code change vote, or clarify explicitly that you think this is not a code change vote? This will help resolve the concerns from some PMC members about how we should interpret the vote result clearly.
>>>
>>> Thanks!
>>>
>>> On Fri, Mar 14, 2025 at 5:33 PM, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>
>>>> Thank you all.
>>>>
>>>> The vote is finished in an intended way with the expected result. We have enough time to discuss and I have been sticking to my original technical justification from the beginning (including this).
>>>>
>>>> 1. Helping renaming the conf via SPARK-51172 (by approving it)
>>>> 2. Banning `spark.databricks.*` via SPARK-51173 (by adding `configName` Scalastyle rule)
>>>> 3. Led the discussion thread and reached the agreement to release Spark 3.5.5 early.
>>>> 4. Releasing 3.5.5 as a release manager to provide a candidate migration path
>>>> 5. Proposing to use the migration path
>>>>
>>>> This vote was Step 5. My technical point has always been aiming to recover the Apache Spark 4 codebase to the status before our mistake by containing the issue only in `branch-3.5` and providing the proposed narrow migration path. And, as mentioned already, that's the situation where we were during the vote at Apache Spark AS-IS branches. What all of us agree on is that the previous code base is okay. I didn't reply to Jungtaek's Apple comment intentionally because it's not a public Spark-vendor like Databricks. And, it's a product name of the popular consumer electronic devices like Intel/AMD/Graviton. In addition, I don't think we are going to add back `spark.databricks.*` because of the reason the customers ask for it. In the same way, this vote is one of the political decision making processes of Apache Spark PMC. We started this vote because we couldn't make a consensus.
>>>>
>>>> I believe I've been providing all my best to the Apache Spark community by actions and with valid technical clarification (without any modification during the process).
>>>>
>>>> Sincerely,
>>>> Dongjoon
>>>>
>>>> On Thu, Mar 13, 2025 at 11:41 PM Mridul Muralidharan <mri...@gmail.com> wrote:
>>>>
>>>>> FWIW, I am +1 on the proposal (though I missed the vote on this !)
>>>>>
>>>>> Regards,
>>>>> Mridul
>>>>>
>>>>> On Fri, Mar 14, 2025 at 1:31 AM Mridul Muralidharan <mri...@gmail.com> wrote:
>>>>>
>>>>>> I agree with Mark, imo this is a qualified veto.
>>>>>> We should give Dongjoon the opportunity to give his clarification, if any.
>>>>>>
>>>>>> I do realize this delays the RC process, but this deserves to be looked into carefully.
>>>>>>
>>>>>> Thanks,
>>>>>> Mridul
>>>>>>
>>>>>> On Thu, Mar 13, 2025 at 9:35 PM Mark Hamstra <markhams...@gmail.com> wrote:
>>>>>>
>>>>>>> Absolutely not!
>>>>>>>
>>>>>>> This is clearly a vote on a code change, not on a procedural issue or a package release. The code change has been vetoed by a -1 vote by a qualified voter.
>>>>>>>
>>>>>>> On Thu, Mar 13, 2025 at 6:58 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>>>> >
>>>>>>> > Likewise I said, I'm concluding the VOTE since we ensure the criteria (3 +1 binding, 1 -1 binding, and also +1s from non-binding).
>>>>>>> >
>>>>>>> > I don't consider -1 as a veto as I explained, as we should have multiple -1s if we go for VOTE with the current codebase. (+1 in this proposal is effectively -1 in another proposal.)
>>>>>>> >
>>>>>>> > The vote followed the Apache Voting Process with the type of "package release" (which we tend to use in dev@ for VOTE). I guess it could have also done with "procedural issues" which is less strict, but then this fulfills both types of votes which should be OK.
>>>>>>> >
>>>>>>> > The current codebase is "accidentally" representing another proposal and it is never intended. I don't find the way I can -1 to the current codebase, and make a different change neither bound to any proposal to be fair.
>>>>>>> >
>>>>>>> > I don't want to block the release because of the above. So, let's change the current codebase the way we discussed and voted here. Reverting this decision should require another VOTE.
>>>>>>> >
>>>>>>> > Thanks to everyone who voted!
>>>>>>> >
>>>>>>> > On Thu, Mar 13, 2025 at 4:54 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>>>> >>
>>>>>>> >> Thanks to everyone who participated and voted!
>>>>>>> >>
>>>>>>> >> Now I can technically conclude the VOTE, but I'm willing to wait till US daytime tomorrow, to give some time for Dongjoon to revisit this.
>>>>>>> >>
>>>>>>> >> I'll conclude the vote around 6PM PST tomorrow regardless of his vote. It's ideal to see us have no -1, but having one -1 doesn't block this vote and we can move forward.
>>>>>>> >>
>>>>>>> >> On Thu, Mar 13, 2025 at 4:42 PM Yang Jie <yangji...@apache.org> wrote:
>>>>>>> >>>
>>>>>>> >>> forgot to mention in my last reply, my stance is +1
>>>>>>> >>>
>>>>>>> >>> Jie Yang
>>>>>>> >>>
>>>>>>> >>> On 2025/03/13 07:08:12 Russell Jurney wrote:
>>>>>>> >>> > Sure, +1 non-binding.
>>>>>>> >>> >
>>>>>>> >>> > On Wed, Mar 12, 2025 at 11:18 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>>>> >>> >
>>>>>>> >>> > > Russell,
>>>>>>> >>> > >
>>>>>>> >>> > > Of course, we hear people's voices who aren't having binding votes as well. Personally I think it's more important than committers/PMC members' VOTE this time since we can be biased and be far from user experience.
>>>>>>> >>> > >
>>>>>>> >>> > > Could you please explicitly cast your vote, like +1 (non-binding)? You seem to agree with the proposal. Thanks!
>>>>>>> >>> > >
>>>>>>> >>> > > On Thu, Mar 13, 2025 at 3:15 PM Russell Jurney <russell.jur...@gmail.com> wrote:
>>>>>>> >>> > >
>>>>>>> >>> > >> I'm just a lurker and aspiring contributor, but as a Spark user upgrading twice is very confusing and would cause many or most users to fail to upgrade successfully to Spark 4 on a first go. That seems like a very bad user experience. I thought it was worthwhile stating this out loud.
>>>>>>> >>> > >>
>>>>>>> >>> > >> Russell
>>>>>>> >>> > >>
>>>>>>> >>> > >> On Wed, Mar 12, 2025 at 11:05 PM Xiao Li <gatorsm...@gmail.com> wrote:
>>>>>>> >>> > >>
>>>>>>> >>> > >>>> this vote is to allow streaming queries which had been ever run in Spark 3.5.4 to be upgraded with Spark 4.0.x, "without having to be upgraded with Spark 3.5.5+ in prior".
>>>>>>> >>> > >>>
>>>>>>> >>> > >>> In the history of Apache Spark, have we ever required users to upgrade to the next maintenance release before moving to a new feature or major release?
>>>>>>> >>> > >>>
>>>>>>> >>> > >>> Xiao
>>>>>>> >>> > >>>
>>>>>>> >>> > >>> On Tue, Mar 11, 2025 at 09:08, Adam Binford <adam...@gmail.com> wrote:
>>>>>>> >>> > >>>
>>>>>>> >>> > >>>> +1 (non-binding)
>>>>>>> >>> > >>>>
>>>>>>> >>> > >>>> It's a pretty in the weeds issue with how Structured Streaming works under the hood that's kinda hard to understand if you're not familiar with it.
>>>>>>> >>> > >>>> The migration logic doesn't mean users can still use the old config, it's purely behind the scenes to fix checkpoint metadata in streams created in 3.5.4. The 5 lines of code it takes to address a weird edge case for certain users that's already gone from master shouldn't be a huge deal.
>>>>>>> >>> > >>>>
>>>>>>> >>> > >>>> On Tue, Mar 11, 2025 at 1:43 AM Yang Jie <yangji...@apache.org> wrote:
>>>>>>> >>> > >>>>
>>>>>>> >>> > >>>>> To Sean, you're right, I'm very sorry.
>>>>>>> >>> > >>>>>
>>>>>>> >>> > >>>>> From the perspective of compatibility and migratability, I think we should migrate this logic to 4.0.0 and keep it in the codebase for a longer time (or permanently), because we can't predict which version users of 3.5.4 will choose next.
>>>>>>> >>> > >>>>>
>>>>>>> >>> > >>>>> I don't want to discuss the so-called vendor issue.
>>>>>>> >>> > >>>>>
>>>>>>> >>> > >>>>> I withdraw my previous -1.
>>>>>>> >>> > >>>>>
>>>>>>> >>> > >>>>> Jie Yang.
>>>>>>> >>> > >>>>>
>>>>>>> >>> > >>>>> On 2025/03/11 04:42:25 Wenchen Fan wrote:
>>>>>>> >>> > >>>>> > Guys, let’s be honest about what we’re discussing here.
>>>>>>> >>> > >>>>> >
>>>>>>> >>> > >>>>> > If this is a migration issue, why would we even need a vote? We’ve been consistently adding configurations to restore legacy behavior instead of removing them because we understand the challenges of upgrading Spark versions. Our goal has always been to make upgrades easier, even if it means carrying some technical debt. I don’t think we want to change that culture now.
>>>>>>> >>> > >>>>> >
>>>>>>> >>> > >>>>> > If the concern is about vendor names appearing in the codebase, then why is it a big deal this time when vendor names are already present elsewhere? If we’ve failed to follow a policy, let’s correct it, but can someone point to the specific policy we’re violating?
>>>>>>> >>> > >>>>> >
>>>>>>> >>> > >>>>> > If the vote is about adding migration logic to ease the upgrade from 3.5.4 to 4.0.0, then +1, why not?
>>>>>>> >>> > >>>>> >
>>>>>>> >>> > >>>>> > Thanks,
>>>>>>> >>> > >>>>> > Wenchen
>>>>>>> >>> > >>>>> >
>>>>>>> >>> > >>>>> > On Mon, Mar 10, 2025 at 8:49 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>>>> >>> > >>>>> >
>>>>>>> >>> > >>>>> > > Well said, Sean. Sorry I made you keep around here since it might not be clearly stated. My bad.
>>>>>>> >>> > >>>>> > >
>>>>>>> >>> > >>>>> > > Yang, how could we ever tolerate the fact there are "other" occurrences of vendor names in the codebase? Please go and search "databricks" in the codebase and be surprised.
>>>>>>> >>> > >>>>> > >
>>>>>>> >>> > >>>>> > > If we believe that having vendor names in the codebase will increase the occurrence of making mistakes, why didn't we have a discussion thread earlier to remove all occurrences altogether?
>>>>>>> >>> > >>>>> > > This is super tricky because I can even start to argue we have "Apple" as a vendor name in Apache Spark codebase. I'm not saying we use "apple" in the test data. See `isMacOnAppleSilicon` in Utils. Is it unavoidable? No, `isMacOnMSeries` or `isMacOnSilicon` is enough.
>>>>>>> >>> > >>>>> > >
>>>>>>> >>> > >>>>> > > We really need to draw a line where we disallow vendor names on it - if it's the entire codebase, I don't really think it is realistic.
>>>>>>> >>> > >>>>> > >
>>>>>>> >>> > >>>>> > > This was really a mistake, and it was definitely not from referring to the existing codebase. Not having a vendor name does not change anything on the chance of encountering this issue again. If we really care, we should think about style checking, which is the only viable way to catch the mistake. Again, I'd argue we have to have a bunch of vendor names in that style check, not just the problematic vendor name.
>>>>>>> >>> > >>>>> > >
>>>>>>> >>> > >>>>> > > On Tue, Mar 11, 2025 at 12:17 PM Sean Owen <sro...@gmail.com> wrote:
>>>>>>> >>> > >>>>> > >
>>>>>>> >>> > >>>>> > >> Doesn't the migration code 'clear' the debt?
>>>>>>> >>> > >>>>> > >> The proposal is not to continue to support the config.
>>>>>>> >>> > >>>>> > >> I feel like people are not quite understanding the change, and objecting to something that doesn't exist.
>>>>>>> >>> > >>>>> > >> It's a shame, as this seems like something not even worth discussing. I don't know why this triggered this much discussion. We have kept deprecated methods without blinking, which is in comparison much bigger.
>>>>>>> >>> > >>>>> > >> Can we maybe ask you to review the actual change in question?
>>>>>>> >>> > >>>>> > >>
>>>>>>> >>> > >>>>> > >> On Mon, Mar 10, 2025, 10:02 PM Yang Jie <yangji...@apache.org> wrote:
>>>>>>> >>> > >>>>> > >>
>>>>>>> >>> > >>>>> > >>> -1
>>>>>>> >>> > >>>>> > >>> Remove migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.0 because I think this configuration was initially introduced accidentally in Spark 3.5.4, lacking a clear design intent. Although the immediate maintenance cost of retaining this configuration currently seems limited, as subsequent versions iterate and user habits form, it may lead to the continuous accumulation of technical debt. When users come to view this configuration as one that can be relied on long-term, future removal may face greater resistance from users and could potentially become an entrenched and redundant configuration in the codebase.
>>>>>>> >>> > >>>>> > >>> Therefore, promptly correcting this historically accidental configuration not only maintains the normativity of the Spark configuration system but also prevents unintended configurations from becoming de facto standards, thereby reducing long-term maintenance risks.
>>>>>>> >>> > >>>>> > >>>
>>>>>>> >>> > >>>>> > >>> Jie Yang
>>>>>>> >>> > >>>>> > >>>
>>>>>>> >>> > >>>>> > >>> On 2025/03/10 14:52:52 Dongjoon Hyun wrote:
>>>>>>> >>> > >>>>> > >>> > -1 because there exists a feasible migration path for Apache Spark 3.5.4 via Apache Spark 3.5.5.
>>>>>>> >>> > >>>>> > >>> >
>>>>>>> >>> > >>>>> > >>> > It's obvious that this Databricks' mistake already causes a huge communication cost in the Apache Spark community and is suggesting a burden to enforce us to handle at least two more PRs at 4.0.0 and 4.1.0.
>>>>>>> >>> > >>>>> > >>> >
>>>>>>> >>> > >>>>> > >>> > Given that, I don't think
>>>>>>> >>> > >>>>> > >>> > - This is an inevitable or
>>>>>>> >>> > >>>>> > >>> > - This is 0 cost
>>>>>>> >>> > >>>>> > >>> >
>>>>>>> >>> > >>>>> > >>> > Dongjoon.
>>>>>>> >>> > >>>>> > >>> >
>>>>>>> >>> > >>>>> > >>> > On 2025/03/10 12:46:16 Jungtaek Lim wrote:
>>>>>>> >>> > >>>>> > >>> > > Starting from my +1 (non-binding).
>>>>>>> >>> > >>>>> > >>> > >
>>>>>>> >>> > >>>>> > >>> > > In addition, I propose to retain migration logic till Spark 4.1.x and remove it in Spark 4.2.0.
>>>>>>> >>> > >>>>> > >>> > >
>>>>>>> >>> > >>>>> > >>> > > On Mon, Mar 10, 2025 at 9:44 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>>>> >>> > >>>>> > >>> > >
>>>>>>> >>> > >>>>> > >>> > > > Hi dev,
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>> > >>> > > > Please vote to retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x.
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>> > >>> > > > - DISCUSSION: https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr ([DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+)
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>> > >>> > > > Specifically, please review this post https://lists.apache.org/thread/xtq1kjhsl4ohfon78z3wld2hmfm78t9k which explains pros and cons about the proposal - proposal is about "Option 1".
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>> > >>> > > > Simply speaking, this vote is to allow streaming queries which had been ever run in Spark 3.5.4 to be upgraded with Spark 4.0.x, "without having to be upgraded with Spark 3.5.5+ in prior".
>>>>>>> >>> > >>>>> > >>> > > > If the vote passes, we will help users to have a smooth upgrade from Spark 3.5.4 to Spark 4.0.x, which would be almost 1 year.
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>> > >>> > > > The (only) cons in this option is having to retain the incorrect configuration name as "string" in the codebase a bit longer. The code complexity of migration logic is arguably trivial. (link <https://github.com/apache/spark/blob/4231d58245251a34ae80a38ea4bbf7d720caa439/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L174-L183>)
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>> > >>> > > > This VOTE is for Spark 4.0.x, but if someone supports including migration logic to be longer than Spark 4.0.x, please cast +1 here and leave the desired last minor version of Spark to retain this migration logic.
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>> > >>> > > > The vote is open for the next 72 hours and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>> > >>> > > > [ ] +1 Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x
>>>>>>> >>> > >>>>> > >>> > > > [ ] -1 Remove migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.0 because...
>>>>>>> >>> > >>>>> > >>> > > >
>>>>>>> >>> > >>>>> > >>> > > > Thanks!
>>>>>>> >>> > >>>>> > >>> > > > Jungtaek Lim (HeartSaVioR)
>>>>>>> >>> > >>>>> > >>> >
>>>>>>> >>> > >>>>> > >>> > ---------------------------------------------------------------------
>>>>>>> >>> > >>>>> > >>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>> >>> > >>>>
>>>>>>> >>> > >>>> --
>>>>>>> >>> > >>>> Adam Binford
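Adam Binford's description of the migration logic (it does not let users keep using the old config; it only fixes checkpoint metadata written by streams started on 3.5.4) can be illustrated with a minimal sketch. It is written in Python for brevity and is not Spark's Scala code in `OffsetSeq.scala`. The two key names are hypothetical placeholders, and modelling the persisted offset-log conf as a plain string map is an assumption.

```python
from typing import Dict

# Hypothetical placeholder keys for illustration only; the real names live in
# Spark's OffsetSeq.scala and are not reproduced here.
LEGACY_KEY = "spark.databricks.sql.someStreamingFlag"
NEW_KEY = "spark.sql.someStreamingFlag"


def migrate_offset_log_conf(persisted_conf: Dict[str, str]) -> Dict[str, str]:
    """Rename LEGACY_KEY to NEW_KEY in conf read back from a checkpoint.

    Only metadata that a query wrote into its checkpoint is touched; a user
    setting LEGACY_KEY on a new session is never consulted by this function.
    """
    migrated = dict(persisted_conf)  # copy so the caller's map is left unchanged
    if LEGACY_KEY in migrated:
        legacy_value = migrated.pop(LEGACY_KEY)
        # Keep a value already recorded under the new key; otherwise carry the
        # legacy value over so the restarted query keeps its original setting.
        migrated.setdefault(NEW_KEY, legacy_value)
    return migrated


if __name__ == "__main__":
    # A hypothetical conf map as it might appear in an offset log from 3.5.4.
    written_by_354 = {LEGACY_KEY: "false", "spark.sql.shuffle.partitions": "200"}
    print(migrate_offset_log_conf(written_by_354))
```

Using `setdefault` means a value already recorded under the new key wins over the legacy one; whether Spark resolves that case the same way is not stated in this thread.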