This is my gist, Mark. From your passionate language I gather you see this as a "Code Change" veto. Your reasoning seems straightforward: the vote's purpose is to decide whether to add code (migration logic) to the Spark 4.0 branch, so in your view the outcome of the vote directly alters the software's code.
However, others, including myself, see it as a *procedural matter only*: it involves a vote and the interpretation of rules. In summary:

1. If it is a code change vote, a -1 can be seen as a veto, blocking the change unless specific conditions are met (such as the PMC overriding the veto).
2. If it is just a procedural vote, a -1 may simply be a dissenting vote, *not necessarily carrying the power to block the entire action.*

FYI, I recall I voted -1 (non-binding) on another thread and Dongjoon asked me to explain which of the two I meant, which was his right. I can see the following votes cast (1):

- Jungtaek Lim: +1 (non-binding)
- Sean Owen: +1 to retain
- Yang Jie: -1, later withdrawn and replaced with +1
- Adam Binford: +1 (non-binding)
- Russell Jurney: +1 (non-binding)
- Mridul Muralidharan: +1
- Dongjoon Hyun: -1

*(1) Summary of voting from the 21 emails in the attached file*, from https://lists.apache.org/thread/nm3p1zjcybdl0p0mc56t2rl92hb9837n :
For retaining migration logic (+1): Jungtaek Lim, Sean Owen, Yang Jie (initially -1, then +1), Adam Binford (non-binding), Russell Jurney (non-binding), Mridul Muralidharan
Against retaining migration logic (-1): Dongjoon Hyun

Maybe we should put a bar on it and allow Dongjoon to qualify his statement as 1 or 2 above; then it could be escalated if needed or put to rest.

HTH

On Sat, 15 Mar 2025 at 18:26, Mark Hamstra <markhams...@gmail.com> wrote:
> That is utter nonsense, Sean! You do not have any authority to declare the matter concluded, and I will escalate to the board if you persist in this approach.
>
> The proposed code change has been vetoed. As I delineated previously, there are two and only two ways forward under the ASF Voting Process. That does not include any individual simply declaring that the matter has been concluded regardless of the veto and ASF process.
>
> On Sat, Mar 15, 2025 at 11:18 AM Sean Owen <sro...@gmail.com> wrote:
> >
> > Mark et al - this thread has gone on way too long. Everyone has expressed their opinion. The result stands.
> > Anyone who is really upset about it, please escalate to the board or something, but, this thread and decision point has now concluded.
> >
> > On Sat, Mar 15, 2025 at 1:16 PM Mark Hamstra <markhams...@gmail.com> wrote:
> >>
> >> You do not have the authority to declare Dongjoon's technical justification invalid. That is up to the PMC: "In case of doubt, deciding whether a technical justification is valid is up to the PMC."
> >>
> >> On Sat, Mar 15, 2025 at 6:20 AM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
> >> >
> >> > To summarize, the main arguments of both proposals are "whether we can force users to upgrade to Spark 3.5.5 first before upgrading Spark 4.0.0" vs "we should include migration logic to Spark 4.0.0 because that is not realistic". Where is the "technical objection" here? If you say there was politics I can clearly say never, but even if you interpret there was politics, politics is not "technical objection". I can quote the relevant ASF page for you.
> >> >
> >> > https://www.apache.org/foundation/voting.html#Veto
> >> >
> >> > > A -1 vote by a qualified voter stops a code-modification proposal in its tracks. This constitutes a veto, and it cannot be overruled nor overridden by anyone. Vetoes stand until and unless the individual withdraws their veto.
> >> > > To prevent vetoes from being used capriciously, the voter must provide with the veto a technical justification showing why the change is bad (opens a security exposure, negatively affects performance, etc.). A veto without a justification is invalid and has no weight.
> >> >
> >> > The justification must be "technical one" for vote. I hope ASF just lists the most cases rather than leaving this as etc, but I think ASF believes individual's judgement, and I claim there is no "technical reason". Having to put 4 more lines is never a technical reason. It is never meant to be used for blocking different opinions. It must be used for blocking "incidents which impact users", while we are here to do the opposite, saving users' life.
> >> >
> >> > On Sat, Mar 15, 2025 at 9:51 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
> >> >>
> >> >> > That's the reason why you proposed the vote procedure and we agreed.
> >> >>
> >> >> Didn't you see the part "we agreed"? Who is we in the context?
> >> >>
> >> >> I don't think he answered my questions - he explained his reasoning of his proposal which majorly does not agree with. You even said you are not persuaded and I want to ask you now whether you were persuaded by his last post.
> >> >>
> >> >> Again I haven't heard my answers. He showed his reasoning but there is nothing about the evidence of the validity of "technical" objection. I think I have asked people who judged his -1 as veto for their reasoning of how this could be "technical" objection and I don't think I heard anything.
> >> >>
> >> >> I can be corrected if you can point out what is the "technical" objection. If you or Dongjoon do not provide this by the end of the week, I have to consider I haven't heard about that and the veto (although Dongjoon stated it is not a veto) will be ignored.
> >> >>
> >> >> On Sat, 15 Mar 2025 at 8:19 PM, Mark Hamstra <markhams...@gmail.com> wrote:
> >> >>>
> >> >>> Once again, I have to object. Dongjoon said that the vote is a time limited procedure, not that the vote itself is a procedural vote as distinct from a code change vote or a package release vote.
> >> >>>
> >> >>> Frankly, this feels like you are trying to manipulate the vote procedure by misrepresenting Dongjoon, and you are quickly losing my confidence in your ability to administer a fair voting procedure.
> >> >>>
> >> >>> I still consider the proposal to be vetoed.
> >> >>>
> >> >>> On Fri, Mar 14, 2025 at 6:11 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
> >> >>> >
> >> >>> > UPDATE:
> >> >>> >
> >> >>> > We were having a discussion about the type of VOTE, since Dongjoon's -1 should be considered as a veto if we see this as a code change VOTE.
> >> >>> > Dongjoon clarified that he does not see this VOTE as a code change, hence he gave -1 but did not intend to block the VOTE.
> >> >>> >
> >> >>> > That said, we have confirmed that Dongjoon's -1 is not a veto. I think the VOTE result is correct as it is. I'll proceed with the next steps.
> >> >>> >
> >> >>> > On Fri, Mar 14, 2025 at 11:19 AM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
> >> >>> >>
> >> >>> >> The vote passes with 7 +1s (3 binding +1s) and 1 -1 (1 binding -1).
> >> >>> >> Thanks to all who helped with the vote!
> >> >>> >>
> >> >>> >> I'm going to make a code change in branch-4.0 quickly so that we don't have to trigger another RC for Spark 4.0.0 just because of this.
> >> >>> >>
> >> >>> >> (* = binding)
> >> >>> >> +1:
> >> >>> >> - Sean R. Owen *
> >> >>> >> - Jungtaek Lim
> >> >>> >> - Nicholas Chammas
> >> >>> >> - Wenchen Fan *
> >> >>> >> - Adam Binford
> >> >>> >> - Russell Jurney
> >> >>> >> - Yang Jie *
> >> >>> >>
> >> >>> >> -1:
> >> >>> >> - Dongjoon Hyun *
> >> >>> >>
> >> >>> >> Thanks,
> >> >>> >> Jungtaek Lim (HeartSaVioR)
Attached file: [VOTE] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x (dev@spark.apache.org, March 2025)

Jungtaek Lim - Monday 10 March 2025 12:44:42 GMT
Hi dev,

Please vote to retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x.

- DISCUSSION: https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr ([DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+)

Specifically, please review this post https://lists.apache.org/thread/xtq1kjhsl4ohfon78z3wld2hmfm78t9k which explains pros and cons about the proposal - the proposal is "Option 1".

Simply speaking, this vote is to allow streaming queries which had ever been run in Spark 3.5.4 to be upgraded with Spark 4.0.x, "without having to be upgraded with Spark 3.5.5+ in prior". If the vote passes, we will help users to have a smooth upgrade from Spark 3.5.4 to Spark 4.0.x, which would be almost 1 year. The (only) con in this option is having to retain the incorrect configuration name as a "string" in the codebase a bit longer. The code complexity of migration logic is arguably trivial. (link <https://github.com/apache/spark/blob/4231d58245251a34ae80a38ea4bbf7d720caa439/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L174-L183>)

This VOTE is for Spark 4.0.x, but if someone supports including migration logic for longer than Spark 4.0.x, please cast +1 here and leave the desired last minor version of Spark to retain this migration logic.

The vote is open for the next 72 hours and passes if a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x
[ ] -1 Remove migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.0 because...

Thanks!
Jungtaek Lim (HeartSaVioR)

Sean Owen - Monday 10 March 2025 12:46:05 GMT
+1 to retain, to avoid problems for users at ~0 cost.

Jungtaek Lim - Monday 10 March 2025 12:46:16 GMT
Starting from my +1 (non-binding). In addition, I propose to retain migration logic till Spark 4.1.x and remove it in Spark 4.2.0.

Dongjoon Hyun - Monday 10 March 2025 14:52:52 GMT
-1 because there exists a feasible migration path for Apache Spark 3.5.4 via Apache Spark 3.5.5.

It's obvious that this Databricks' mistake already causes a huge communication cost in the Apache Spark community and is suggesting a burden to enforce us to handle at least two more PRs at 4.0.0 and 4.1.0. Given that, I don't think
- This is inevitable, or
- This is 0 cost

Dongjoon.
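For readers who have not opened the OffsetSeq.scala link in the proposal above: the migration logic under vote is a handful of lines that rewrite checkpoint metadata written by 3.5.4 when it is read back. A minimal sketch of its shape, assuming illustrative key names (the real key names and code live at the link above, so treat this purely as an illustration):

```scala
// Illustrative sketch only: not the actual code at the OffsetSeq.scala link,
// and the config key names below are placeholders, not the real ones.
object OffsetSeqConfMigration {

  // Key accidentally written into streaming checkpoint offset metadata by Spark 3.5.4.
  private val incorrectKey = "spark.databricks.sql.someStreamingFlag"

  // Key that the rest of the engine actually reads.
  private val correctKey = "spark.sql.someStreamingFlag"

  /**
   * Rewrites session confs recovered from a 3.5.4 checkpoint so that downstream
   * code only ever sees the correct key. Users never set or read the incorrect
   * key; this runs purely behind the scenes when an old checkpoint is loaded.
   */
  def migrate(confs: Map[String, String]): Map[String, String] =
    confs.get(incorrectKey) match {
      case Some(v) if !confs.contains(correctKey) => confs - incorrectKey + (correctKey -> v)
      case Some(_)                                => confs - incorrectKey
      case None                                   => confs
    }
}
```

Whether or not this exact shape matches the linked code, the question on the table is only whether a few lines like these ship in branch-4.0 in addition to branch-3.5; the incorrect `spark.databricks.*` name never becomes a supported, user-facing configuration.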
Yang Jie - Tuesday 11 March 2025 03:01:56 GMT
-1 Remove migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.0 because I think this configuration was initially introduced accidentally in Spark 3.5.4, lacking a clear design intent. Although the immediate maintenance cost of retaining this configuration currently seems limited, as subsequent versions iterate and user habits form, it may lead to the continuous accumulation of technical debt. When users come to view this configuration as one that can be relied on long-term, future removal may face greater resistance from users and it could potentially become an entrenched and redundant configuration in the codebase. Therefore, promptly correcting this historically accidental configuration not only maintains the normativity of the Spark configuration system but also prevents unintended configurations from becoming de facto standards, thereby reducing long-term maintenance risks.
Jie Yang

Sean Owen - Tuesday 11 March 2025 03:16:34 GMT
Doesn't the migration code 'clear' the debt? The proposal is not to continue to support the config. I feel like people are not quite understanding the change, and objecting to something that doesn't exist. It's a shame, as this seems like something not even worth discussing. I don't know why this triggered this much discussion. We have kept deprecated methods without blinking, which is in comparison much bigger. Can we maybe ask you to review the actual change in question?

Jungtaek Lim - Tuesday 11 March 2025 03:48:03 GMT
Well said, Sean. Sorry I made you keep around here since it might not be clearly stated. My bad.

Yang, how could we ever tolerate the fact there are "other" occurrences of vendor names in the codebase? Please go and search "databricks" in the codebase and be surprised. If we believe that having vendor names in the codebase will increase the occurrence of making mistakes, why didn't we have a discussion thread earlier to remove all occurrences altogether?

This is super tricky because I can even start to argue we have "Apple" as a vendor name in the Apache Spark codebase. I'm not saying we use "apple" in the test data. See `isMacOnAppleSilicon` in Utils. Is it unavoidable? No, `isMacOnMSeries` or `isMacOnSilicon` is enough. We really need to draw a line where we disallow vendor names - if it's the entire codebase, I don't really think it is realistic.

This was really a mistake, and it was definitely not from referring to the existing codebase. Not having a vendor name does not change anything on the chance of encountering this issue again. If we really care, we should think about style checking, which is the only viable way to catch the mistake. Again, I'd argue we have to have a bunch of vendor names in that style check, not just the problematic vendor name.
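On the style-checking idea above: conceptually such a check is just a scan of configuration names against a list of vendor prefixes. A hypothetical sketch in plain Scala (this is not Spark's actual Scalastyle rule, and the prefixes and names here are illustrative):

```scala
// Hypothetical example of a vendor-name check over Spark config names.
// Not the project's actual style rule; prefixes and sample names are illustrative.
object ConfigNameVendorCheck {

  // Illustrative list; per the argument above it would need to cover many vendors, not just one.
  private val bannedPrefixes = Seq("spark.databricks.", "spark.emr.", "spark.dataproc.")

  /** Returns the config names that are not vendor neutral. */
  def violations(configNames: Seq[String]): Seq[String] =
    configNames.filter(name => bannedPrefixes.exists(prefix => name.startsWith(prefix)))

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "spark.sql.optimizer.someFlag",            // fine
      "spark.databricks.sql.optimizer.someFlag"  // the kind of name that slipped into 3.5.4
    )
    violations(sample).foreach(n => println(s"Vendor-specific config name: $n"))
  }
}
```

The point being made above is that such a list would have to name many vendors to be meaningful, which is why a mechanical check is argued to matter more than removing any single string.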
Nicholas Chammas - Tuesday 11 March 2025 03:57:15 GMT
I agree with Sean that this proposal does not seem to me as controversial as it has turned out so far.

Jungtaek’s detailed breakdown on the other thread <https://lists.apache.org/thread/zlhgr1mx0q520odvpnmnzwd8mp9x6bpl> explains that this proposed change is mainly to benefit open source users of Apache Spark and give them a way to directly upgrade from Apache Spark 3.5.4 to 4.0.0, as opposed to forcing them to upgrade first to 3.5.5 before then being able to upgrade to 4.0.0.

Jungtaek’s proposal is essentially a convenience to open source users. These users may or may not be using a vendor distribution of Spark. It does not benefit or harm Databricks or any other vendor. And it adds a very small maintenance burden on contributors.

Isn’t this a tradeoff we should generally make? Help users upgrade at a minor maintenance cost.

+1

Wenchen Fan - Tuesday 11 March 2025 04:42:25 GMT
Guys, let’s be honest about what we’re discussing here.

If this is a migration issue, why would we even need a vote? We’ve been consistently adding configurations to restore legacy behavior instead of removing them because we understand the challenges of upgrading Spark versions. Our goal has always been to make upgrades easier, even if it means carrying some technical debt. I don’t think we want to change that culture now.

If the concern is about vendor names appearing in the codebase, then why is it a big deal this time when vendor names are already present elsewhere? If we’ve failed to follow a policy, let’s correct it, but can someone point to the specific policy we’re violating?

If the vote is about adding migration logic to ease the upgrade from 3.5.4 to 4.0.0, then +1, why not?

Thanks,
Wenchen

Yang Jie - Tuesday 11 March 2025 05:41:04 GMT
To Sean, you're right, I'm very sorry. From the perspective of compatibility and migratability, I think we should migrate this logic to 4.0.0 and keep it in the codebase for a longer time (or permanently), because we can't predict which version users of 3.5.4 will choose next. I don't want to discuss the so-called vendor issue. I withdraw my previous -1.
Jie Yang.

Adam Binford - Tuesday 11 March 2025 16:07:14 GMT
+1 (non-binding)

It's a pretty in-the-weeds issue with how Structured Streaming works under the hood that's kinda hard to understand if you're not familiar with it. The migration logic doesn't mean users can still use the old config, it's purely behind the scenes to fix checkpoint metadata in streams created in 3.5.4. The 5 lines of code it takes to address a weird edge case for certain users that's already gone from master shouldn't be a huge deal.
--
Adam Binford

Xiao Li - Thursday 13 March 2025 06:04:45 GMT
In the history of Apache Spark, have we ever required users to upgrade to the next maintenance release before moving to a new feature or major release?

Xiao

On Tue, 11 Mar 2025 at 09:08, Adam Binford <ad...@gmail.com> wrote:

Russell Jurney - Thursday 13 March 2025 06:13:22 GMT
I'm just a lurker and aspiring contributor, but as a Spark user upgrading twice is very confusing and would cause many or most users to fail to upgrade successfully to Spark 4 on a first go. That seems like a very bad user experience. I thought it was worthwhile stating this out loud.

Russell

Jungtaek Lim - Thursday 13 March 2025 06:17:55 GMT
Russell,

Of course, we hear people's voices who aren't having binding votes as well. Personally I think it's more important than committers/PMC members' VOTE this time since we can be biased and be far from user experience.
Could you please explicitly cast your vote, like +1 (non-binding)? You seem to agree with the proposal. Thanks!

Jungtaek Lim - Thursday 13 March 2025 06:39:40 GMT
Dongjoon,

I wonder whether I can influence you to revisit what has happened. We are here because we have two different approaches where we have agreed to disagree on the approach. We posted the discussion in dev@ because we want to hear from 3rd eyes. In DISCUSSION and VOTE threads, I'm seeing support (including non-brickster) from people about my proposal. They are users of Apache Spark, and we have a responsibility to hear from them.

I'd like to see whether you can revisit this, confirming people's preference on this topic. Personally I don't think this warrants VOTE, so I can do whatever simplest and easiest and fastest way to just solve the issue, including just cancelling the VOTE if we have consensus. Hope we avoid spending time on this longer.

Thanks,
Jungtaek Lim (HeartSaVioR)

Russell Jurney - Thursday 13 March 2025 07:08:12 GMT
Sure, +1 non-binding.

Yang Jie - Thursday 13 March 2025 07:40:19 GMT
forgot to mention in my last reply, my stance is +1
Jie Yang

Jungtaek Lim - Thursday 13 March 2025 07:54:39 GMT
Thanks to everyone who participated and voted! Now I can technically conclude the VOTE, but I'm willing to wait till US daytime tomorrow, to give some time for Dongjoon to revisit this. I'll conclude the vote around 6PM PST tomorrow regardless of his vote. It's ideal to see us have no -1, but having one -1 doesn't block this vote and we can move forward.

Dongjoon Hyun - Thursday 13 March 2025 14:15:32 GMT
We are having this vote to give clarity by keeping all records of the community decisions and stances during building a community consensus. All votes are important and counted.

To Jungtaek, I already casted my veto properly and have been tracking the thread. You don't need to say to me to revisit because I've been here.

To Xiao, in the history of Apache Spark, have we ever made a mistake to ship a vendor-ownership like `spark.databricks.*`? I believe you are switching the real root cause and the bad consequence here.

> ... the next maintenance release before moving to a new feature or major release?

Thanks,
Dongjoon.

Sean Owen - Thursday 13 March 2025 14:28:11 GMT
I'm not sure if a VOTE is appropriate here, but I also do not see any valid technical objection here. I don't think this can be considered a valid 'veto' even if we were thinking of it that way. I think there are other non-technical factors influencing this position. I believe we proceed with Jungtaek's proposal.

Mark Hamstra - Thursday 13 March 2025 15:20:21 GMT
Valid -1 votes are not restricted to technical objections.

Jungtaek Lim - Thursday 13 March 2025 20:57:01 GMT
I do believe there are two ways of considering a -1 vote. Valid -1 votes are not restricted to technical objections, but in that case, it must not be considered as veto, otherwise we will end up disturbing ourselves. It is just an ideal world where we can make consensus on any topic, no, it can't be.

Please give me the evidence if you think -1 should be considered as veto, otherwise I'll conclude the vote sooner.
Jungtaek Lim - Thursday 13 March 2025 21:22:18 GMT
I understand the situation of "agree to disagree", but I don't believe veto can be used like that. It's just -1 from the minority.

Here are questions I want to get answered.

- Where is the evidence that having a vendor name in the codebase is violating ASF policy? Again, I see "Apple" to be used as a vendor name in the field name. It is definitely not used as a common noun. What's your call on this? Why do we keep saying there is evidence when we don't see any?
- Why didn't you just say we must remove the migration logic the first time we talked about this (unlike you did say there are "two" approaches, link <https://github.com/apache/spark/pull/49983#issuecomment-2676531485>)? This is a major issue for me as you gave false hope that you seem to think option 1 is also a valid one, and I thought I can persuade you as long as I show you people's opinion.
- Why is it OK to ship the migration logic in Spark 3.5.5+ in the Spark 3.5.x line if you think this is really bad? I don't think it's really a long time to make the effort of upgradability take effect. Will we ever release Spark 3.5.20 or so?
- Why do you think your approach doesn't need to pass with VOTE, while in this VOTE you are the only one disagreeing with the other approach? Is it just that the current code is automatically achieving your goal? I believe this makes no sense.

You should answer at least the last one, otherwise I'll have to cast -1 on the Spark 4.0.0 release till this is resolved, because we have the logic in the current codebase where we do not have a consensus and I disagree with shipping this.

Jungtaek Lim - Thursday 13 March 2025 21:30:57 GMT
Also, I don't believe considering -1 as veto makes sense here, because his proposal is "somehow" (I'd rather say "accidentally") in the current codebase and we hadn't had any discussion with that proposal. So if we kill the VOTE and do nothing, it's effectively saying +1 to his proposal, which makes zero sense to me.

Jungtaek Lim - Friday 14 March 2025 01:58:27 GMT
Likewise I said, I'm concluding the VOTE since we ensure the criteria (3 +1 binding, 1 -1 binding, and also +1s from non-binding). I don't consider -1 as a veto as I explained, as we should have multiple -1s if we go for VOTE with the current codebase. (+1 in this proposal is effectively -1 in another proposal.)

The vote followed the Apache Voting Process <https://www.apache.org/foundation/voting.html> with the type of "package release" (which we tend to use in dev@ for VOTE). I guess it could have also been done with "procedural issues" which is less strict, but then this fulfills both types of votes which should be OK.

The current codebase is "accidentally" representing another proposal and it was never intended. I don't find a way I can -1 the current codebase, and make a different change not bound to any proposal to be fair. I don't want to block the release because of the above. So, let's change the current codebase the way we discussed and voted here. Reverting this decision should require another VOTE.

Thanks to everyone who voted!

Mark Hamstra - Friday 14 March 2025 02:34:27 GMT
Absolutely not! This is clearly a vote on a code change, not on a procedural issue or a package release. The code change has been vetoed by a -1 vote by a qualified voter.
Mridul Muralidharan - Friday 14 March 2025 06:31:33 GMT
I agree with Mark, imo this is a qualified veto. We should give Dongjoon the opportunity to give his clarification, if any. I do realize this delays the RC process, but this deserves to be looked into carefully.

Thanks,
Mridul

Mridul Muralidharan - Friday 14 March 2025 06:39:46 GMT
FWIW, I am +1 on the proposal (though I missed the vote on this!)

Regards,
Mridul

Dongjoon Hyun - Friday 14 March 2025 08:32:18 GMT
Thank you all. The vote is finished in an intended way with the expected result. We have had enough time to discuss and I have been sticking to my original technical justification from the beginning (including this).

1. Helping renaming the conf via SPARK-51172 (by approving it)
2. Banning `spark.databricks.*` via SPARK-51173 (by adding `configName` Scalastyle rule)
3. Led the discussion thread and reached the agreement to release Spark 3.5.5 early.
4. Releasing 3.5.5 as a release manager to provide a candidate migration path
5. Proposing to use the migration path

This vote was Step 5. My technical point has always been aiming to recover the Apache Spark 4 codebase to the status before our mistake by containing the issue only in `branch-3.5` and providing the proposed narrow migration path. And, as mentioned already, that's the situation where we were during the vote at Apache Spark AS-IS branches. What all of us agree on is that the previous code base is okay.

I didn't reply to Jungtaek's Apple comment intentionally because it's not a public Spark-vendor like Databricks. And, it's a product name of popular consumer electronic devices, like Intel/AMD/Graviton. In addition, I don't think we are going to add back `spark.databricks.*` because of the reason the customers ask for it.

In the same way, this vote is one of the political decision-making processes of the Apache Spark PMC. We started this vote because we couldn't make a consensus. I believe I've been providing all my best to the Apache Spark community by actions and with valid technical clarification (without modification during the process).

Sincerely,
Dongjoon

Jungtaek Lim - Friday 14 March 2025 09:27:28 GMT
Thanks for the update. Though I have to clarify that "What all of us agree on is that the previous code base is okay." is not true.

Wenchen summarized what happened in the other thread, which I think is more proper, like following:

1. A mistake was made, leading to a vendor name being included in the configuration released in Spark 3.5.4.
2. Dongjoon initiated a vote to deprecate the incorrect configuration name in 3.5.5, and the vote passed. Thanks to Dongjoon, 3.5.5 was released shortly after.
3. A PR <https://github.com/apache/spark/pull/49897> that simply renamed (rather than deprecated) the configuration was merged into master/4.0. This is a breaking change and was not backed by a vote.
4. This vote concerns adding migration logic to prevent the breaking change from affecting streaming queries.

One thing we have to make clear is, the PR <https://github.com/apache/spark/pull/49897> that simply renamed was submitted "earlier" (Feb 12 in KST) than the PR <https://github.com/apache/spark/pull/49985> for migration (Feb 17 in KST). The former PR was even merged on Feb 12, but we definitely had a conversation to figure out the way to mitigate better at that time, because it is definitely a breaking change like Wenchen said.
That said, it was just a quick fix and it warranted demanding follow-up work. That is why I came up with the migration logic, and I filed the PR for migration logic against "3.5/4.0/master" (not only 3.5), which definitely implies I was/am intending to resolve the issue in all branches. The decision to merge "only" to 3.5 was definitely not made among "us". It was decided to merge to 3.5 among us, but no, the DISCUSSION I raised was the first time we talked about master/4.0 in public. "We never had a consensus for master/4.0", and it was me who drove the discussion for that. This is a root reason we had such a long argument, so we need to make this very clear. No, I don't think I said I'm OK with not having migration code in master/4.0.

Also, I agree that Apple is not a vendor productionizing Spark, but my overall point is, we "just" feel like it's not good to have the vendor name in the codebase as the ASF project needs to try to be vendor neutral. "ASF project should be vendor neutral" is interpreted by everyone in every different way, but there is no evidence we have consensus that having a vendor name in any arbitrary string is problematic. Having a vendor name in the string doesn't mean anything except migration. No, we put it there just because we think it makes users' life better. That's all, no other reason like a political one, I can confidently say no, it's not. And people seem to agree based on the outcome of DISCUSS and VOTE.

The main question was, "where is the evidence it's safe to force users to upgrade to Spark 3.5.5 before upgrading to Spark 4.0.0". It came up from figuring out the resolution of the issue (I get where you are coming from), but it is missing the big question about who will have a pain point, and I did ask the community and I heard they are supportive of just allowing upgrading to Spark 4.0.0 directly.

I really think the proposal was not backed by consensus. It was driven solely by one person - it is backed by willingness to remove the occurrence of the vendor name in the codebase ASAP. Again, everyone has a different way of interpreting the vendor name issue, so this should have been discussed before, to weigh the cost of having the vendor name. It shouldn't be something someone just decides by oneself.

I'm not sure how this comes to a valid technical objection, because we never discussed that approach in public, and the opposite approach definitely gained traction in public. We never discussed that approach and the codebase is already reflecting it, which is arguably a disaster. I just wanted to fix that. That's all.

Jungtaek Lim - Friday 14 March 2025 10:01:34 GMT
If you did not intend to block the VOTE but just to express the disagreement, please say "-1" instead of representing it as "veto". When saying veto, you intend to kill the process unless you are persuaded, or you are not having proper technical justification.

Jungtaek Lim - Friday 14 March 2025 10:03:13 GMT
Sorry, I was missing the type of the vote - this totally depends on the type of the vote. If you weren't intending to block the VOTE which could have been interpreted as a code change, maybe -0 or -0.5 or -0.99 should have been used rather than -1 to block the process.

Jungtaek Lim - Friday 14 March 2025 10:23:11 GMT
That said, if I understand correctly, you weren’t intending to “block” the vote, right? You say you expected the vote to be finished.
Could you please cast the vote as -0.x since some people view this as a code change vote, or clarify explicitly that you think this is not a code change vote? This will help resolve the concerns from some PMC members about how we should interpret the vote result clearly. Thanks!

On Fri, 14 Mar 2025 at 5:33 PM, Dongjoon Hyun <do...@gmail.com> wrote:

Dongjoon Hyun - Saturday 15 March 2025 00:23:21 GMT
The Apache Spark PMC always strongly recommends all 3.5 users to upgrade to the latest stable release via the official website. The main question seems quite different from the Apache Spark website. May I ask what is not safe about guiding Spark 3.5.4 users to 3.5.5, Jungtaek?

> ... upgrade to Spark 3.5.5 ...

For the following part, when the Apache Spark community made a mistake at the Spark 2.4.2 release, we guided the users to upgrade to 2.4.3 immediately after recovering the default Scala version to 2.11.

2019-04-23 https://spark.apache.org/releases/spark-release-2-4-2.html
2019-05-08 https://spark.apache.org/releases/spark-release-2-4-3.html

In the same way, Apache Spark 3.5.5 was released and is ready to handle a mistake at Spark 3.5.4.

2025-02-27 https://spark.apache.org/releases/spark-release-3-5-5.html

For the vote, the vote is a time-limited procedure to make a swift decision. That's the reason why you proposed the vote procedure and we agreed. There is no way to `block` the votes. The vote itself is already completed (including my -1). I've been considering this as a part of the whole `spark.databricks.*` incident handling.

In my interpretation, according to the ASF process, the Apache Spark community made the conclusion to unblock the Apache Spark 4.0.0 release with the AS-IS code with the improved Spark 4.0 migration guide, because I provided a technical justification for my vote via the concrete alternative based on the existing Spark 3.5.5, the AS-IS code base, and the suggested better migration guide in order to eliminate the affected streaming queries.

Thanks,
Dongjoon.

Jungtaek Lim - Saturday 15 March 2025 01:06:22 GMT
Dongjoon,

Please look at what I got from the community. It is quite different whether "we strongly recommend" vs "we force". We never had an agreement that the latter would make sense. The DISCUSSION and VOTE were to gain consensus that the latter does not make sense, and I think we got consensus except you.

Again, I'm not saying upgrading Spark 3.5.4 to Spark 3.5.5 is unsafe. I'm saying, we can't force users to upgrade to a specific version first in order to upgrade to another version. This is never the same as the argument of safeness for upgrading Spark 3.5.4 to Spark 3.5.5, because the latter implies we control users to achieve our goal. I believe I have said multiple times that we don't have control of users. I think that is a huge misunderstanding if we ever say we have control of users, so I had to prove that with DISCUSSION.

The only time we force users to upgrade is when we have day-0 security issues "which is really bad for users" (again, not to achieve our goal), but even in that case, we provide upgrade paths for every active branch whenever it is feasible. This is a huge difference from what you were proposing.

I understand the disagreement on this led to a long-standing discussion/debate, and I hope I showed what the community thinks about it, so that we don't have to do the same discussion again. I really want to remind everyone, any attempt to control users wouldn't lead to a better result, so we should have weighed the tradeoff in public before doing that.
It's not successful even if users do not complain explicitly. They just accumulate reasons by themselves to stop using Apache Spark.

Mark / Mridul, Dongjoon agreed that this is a procedural vote, so -1 is not a "veto". I'll update the result thread. Hope we now resolve the concern about the vote.

Jungtaek Lim - Saturday 15 March 2025 14:18:41 GMT
> ... conclusion to unblock the Apache Spark 4.0.0 release with the AS-IS code with the improved Spark 4.0 migration guide because I provided a technical justification for my vote via the concrete alternative based on the existing Spark 3.5.5, AS-IS code base, and the suggested better migration guide way in order to eliminate the affected streaming queries.

I can always be corrected if you give the evidence. Let's stop "just" talking. I believe we are seeing quite different things and our memory is quite opposite. "History will tell us."

I am trying to understand where the miscommunication came from. Some clarification:

1. I believe I have said I do not agree with just removing the config in master/4.0 and I expected follow-up, which is the migration logic. (link <https://github.com/apache/spark/pull/49897#issuecomment-2652486115>) I admit this is a bit unclear to understand, but I made my voice clear multiple times, otherwise I shouldn't ever have a migration logic PR for master/4.0.
2. I believe I have said my intention is to land the migration logic in 4.0.x and arguably longer (link <https://lists.apache.org/thread/q24vonqhvqh11ghd488rctsm89zvmpqd>). I think there were people who wanted to remove the vendor name in any way, but arguably it just ended with an open question, and consensus was never reached. People expressed concerns, but nothing was concluded except that we agreed with proceeding for Spark 3.5.5. We never made a consensus on how to deal with it in Spark 4.0.0+ in that discussion thread, especially about migration logic.
3. The VOTE for removal of the config clearly stated that it is only for 3.5. (link <https://lists.apache.org/thread/6nn76olr65b8zfgzdcbtr9f6o98451o5>)
4. I claimed I wanted to proceed with the migration logic for the branch-4.0 PR, and hadn't got any feedback except being told to wait for Spark 3.5.5 (link). If you weren't open to my proposal, you should have just said "we already decided" and you had to give the evidence. I haven't heard any, so I had to initiate DISCUSS.
5. We all know about DISCUSS and VOTE so I won't repeat it.

I have strong evidence that you were aware of the fact we never agreed on the behavior for Spark 4.0.0, and you said my proposal is "technically correct", so we had never debated about "technical objection", but debated about "behavior". https://github.com/apache/spark/pull/49983#issuecomment-2676531485

Can you please explain why you said my proposal is "technically correct" and here you did a vote which required "technical objection"? Have you changed your mind?

Overall, when you say "the Apache Spark community made the conclusion to unblock the Apache Spark 4.0.0 release with the AS-IS code", I don't get who is "the Apache Spark community". Where can I see the DISCUSS and VOTE thread? Is it really that I am excluded from the list of the Apache Spark community, while arguably I am the only active maintainer of the module? Could you please enumerate who the Apache Spark community was at that time?

Let's not talk based on memory. If we agree about that, we should have a history. I am open to apologizing if I missed a critical discussion and vote. Your (and my) memory should never be used as evidence.
Please, give the evidence. I'm also happy to hear about the other thread I have made.

Thanks.

Jungtaek Lim - Saturday 15 March 2025 14:21:23 GMT
small missing on link:

4. I claimed I wanted to proceed with the migration logic for the branch-4.0 PR, and hadn't got any feedback except being told to wait for Spark 3.5.5 (link <https://github.com/apache/spark/pull/49983#pullrequestreview-2621947671>). If you weren't open to my proposal, you should have just said "we already decided" and you had to give the evidence. I haven't heard any, so I had to initiate DISCUSS.

Jungtaek Lim - Saturday 15 March 2025 14:26:48 GMT
Dongjoon, it is your responsibility to clarify your vote position since the vote is stalled as some people still claim your vote is a veto. If you are really agreeing that I gained the consensus in the proper way, and your vote is really just for the historical record, let's not waste more time: explicitly cast -0.99. If you say I misunderstood your reply and you are still casting a veto, I'm happy to hear the evidence based on the history.

We only talked via GitHub PRs and the mailing list, so none of the discussion happened outside that infrastructure. I do not count any discussion happening on private@, as private@ is not meant to be used for discussion which could have been done in public.
--------------------------------------------------------------------- To unsubscribe e-mail: dev-unsubscr...@spark.apache.org