-1 Remove migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.0, because I think this configuration was introduced accidentally in Spark 3.5.4 with no clear design intent behind it. Although the immediate maintenance cost of retaining it seems limited, carrying it through subsequent releases lets user habits form around it and technical debt accumulate. Once users come to see it as a configuration they can rely on long term, removing it will meet greater resistance and it risks becoming an entrenched, redundant part of the codebase. Correcting this historical accident promptly keeps the Spark configuration namespace consistent, prevents an unintended configuration from becoming a de facto standard, and reduces long-term maintenance risk.
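(For context, below is a minimal, illustrative sketch of the kind of remapping being voted on: when offset log metadata written by 3.5.4 is read back, the incorrectly named key is folded into the intended `spark.sql.*` key. This is not the actual code in OffsetSeq.scala; the two key names and the helper object are assumptions based on the linked discussion.)

  // Illustrative sketch only - not the actual Spark implementation.
  // Both key names below are assumptions taken from the discussion thread.
  object OffsetSeqConfMigration {
    // Key accidentally written by Spark 3.5.4 offset logs (assumed name).
    private val IncorrectKey =
      "spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"
    // The intended Apache Spark key (assumed name).
    private val CorrectKey =
      "spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"

    // Fold the incorrect key into the correct one when restoring confs
    // from a checkpoint; an existing value under the correct key wins.
    def migrate(confs: Map[String, String]): Map[String, String] =
      confs.get(IncorrectKey) match {
        case Some(value) =>
          (confs - IncorrectKey)
            .updated(CorrectKey, confs.getOrElse(CorrectKey, value))
        case None => confs
      }
  }

Calling migrate on the confs restored from a 3.5.4 checkpoint would then yield a map keyed only by the correct name; retaining something of this shape is what the vote proposes, and my -1 is that the incorrect key string should not survive into 4.0.0.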
Jie Yang

On 2025/03/10 14:52:52 Dongjoon Hyun wrote:
> -1 because there exists a feasible migration path for Apache Spark 3.5.4 via
> Apache Spark 3.5.5.
>
> It's obvious that this Databricks mistake has already caused a huge
> communication cost in the Apache Spark community, and it burdens us with
> handling at least two more PRs at 4.0.0 and 4.1.0.
>
> Given that, I don't think
> - this is inevitable, or
> - this is zero cost.
>
> Dongjoon.
>
> On 2025/03/10 12:46:16 Jungtaek Lim wrote:
> > Starting from my +1 (non-binding).
> >
> > In addition, I propose to retain the migration logic till Spark 4.1.x and
> > remove it in Spark 4.2.0.
> >
> > On Mon, Mar 10, 2025 at 9:44 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> > wrote:
> >
> > > Hi dev,
> > >
> > > Please vote to retain the migration logic of the incorrect
> > > `spark.databricks.*` configuration in Spark 4.0.x.
> > >
> > > - DISCUSSION:
> > > https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr
> > > ([DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in
> > > Spark 4.0.0+)
> > >
> > > Specifically, please review this post
> > > https://lists.apache.org/thread/xtq1kjhsl4ohfon78z3wld2hmfm78t9k which
> > > explains the pros and cons of the proposal - the proposal is "Option 1".
> > >
> > > Simply speaking, this vote is to allow streaming queries which have ever
> > > run on Spark 3.5.4 to be upgraded to Spark 4.0.x, without having to be
> > > upgraded to Spark 3.5.5+ first. If the vote passes, we will help users
> > > have a smooth upgrade from Spark 3.5.4 to Spark 4.0.x, a gap of almost
> > > 1 year.
> > >
> > > The (only) con of this option is having to retain the incorrect
> > > configuration name as a "string" in the codebase a bit longer. The code
> > > complexity of the migration logic is arguably trivial. (link
> > > <https://github.com/apache/spark/blob/4231d58245251a34ae80a38ea4bbf7d720caa439/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L174-L183>
> > > )
> > >
> > > This VOTE is for Spark 4.0.x, but if someone supports keeping the
> > > migration logic longer than Spark 4.0.x, please cast +1 here and state
> > > the last minor version of Spark in which you would like to retain it.
> > >
> > > The vote is open for the next 72 hours and passes if a majority of +1
> > > PMC votes are cast, with a minimum of 3 +1 votes.
> > >
> > > [ ] +1 Retain migration logic of incorrect `spark.databricks.*`
> > > configuration in Spark 4.0.x
> > > [ ] -1 Remove migration logic of incorrect `spark.databricks.*`
> > > configuration in Spark 4.0.0 because...
> > >
> > > Thanks!
> > > Jungtaek Lim (HeartSaVioR)

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org