Thank you for focusing on this, Mark. I also agree with you that this should be decided by the Apache Spark PMC and appreciate the effort to help us move forward in the Apache way.
As you mentioned, there is no ASF policy. That's true.

> I am not aware of any ASF policy that strictly forbids the mention of a vendor
> in Apache code for any reason

Let's imagine that the Apache Spark project started to support the following existing vendor `spark.databricks.*` configs to help Spark users migrate (or offload) from the Databricks service to an open source Spark cluster easily.

- spark.databricks.cluster.profile
- spark.databricks.io.cache.enabled
- spark.databricks.delta.optimizeWrite.enabled
- spark.databricks.passthrough.enabled

Some users or developers may claim that there is a clear and huge benefit. I must agree with them because that is also true. However, is this the direction in which the Apache Spark project aims to go? I cannot agree with that. It is very bad for the Apache Spark distribution to support the `spark.databricks.*` namespace, as Spark 3.5.4 does, because it misleads Apache Spark users by diluting the boundary between the Apache Spark distribution (and brand) and commercial vendor products (and brands). Note that Apache Spark 3.5.5 and all future 3.5.x releases also support `spark.databricks.*` until April 2026 (the end-of-life) because deprecation is neither a deletion nor a ban.

The incident in 3.5.4 is something that should never have happened. It has already caused a lot of confusion and, sadly, will cause more. The confusion is contagious, not only for the distribution but also for the source code. I guess:

- The original Databricks contributor was perhaps confused about what he was contributing.
- The Apache Spark committer (Jungtaek) overlooked what we should not have approved because the code resembles his internal company repo.
- Downstream Apache Spark fork repositories consume the `spark.databricks.*` namespace as if it were Apache Spark's namespace.

For me, it is even more misleading to dilute the boundary between Apache Spark code and commercial vendor code.

I have been working on this issue and consider this vote the last piece of the overall handling of the `spark.databricks.*` incident because I believe we are establishing a new rule for the Apache Spark community. This will serve as a precedent for handling similar incidents in the future. Please let me re-summarize the past steps I took with the community:

1. Helped rename the conf via SPARK-51172 (by approving it).
2. Banned `spark.databricks.*` via SPARK-51173 (by adding a `configName` Scalastyle rule).
3. Led the discussion thread "Deprecating and banning `spark.databricks.*` config from Apache Spark repository": https://lists.apache.org/thread/qwxb21g5xjl7xfp4rozqmg1g0ndfw2jd
4. Reached the agreement to release Spark 3.5.5 early: "[VOTE] Release Apache Spark 3.5.5 deprecating `spark.databricks.*` configuration" https://lists.apache.org/thread/6nn76olr65b8zfgzdcbtr9f6o98451o5
5. Released 3.5.5 as the release manager to provide a candidate migration path.
6. Proposed that 3.5.4 users use 3.5.5 as the migration path.

I proposed documenting this in the Spark 4.0 migration guide (Step 6) because that is the only way to handle this incident without using the specific vendor config name again in the Spark code on the `master` and `branch-4.0` branches. As you can see in Step 2 above, I prefer the automatic way; the documentation-only solution was never my personal preference. It was the lesser of two evils.

Let me reiterate this: although we succeeded in deprecating the configuration early, the contaminated release branch `branch-3.5` and its releases still support the configuration for Spark jobs until April 2026 (the end-of-life).
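To make the technical stake concrete, here is a minimal, self-contained Scala sketch of what such "read-only migration logic" amounts to. This is an illustration only, not the actual Spark implementation; the `ConfMigration` object, the `migrateConf` helper, and the key names are all hypothetical.

```scala
// Hypothetical sketch only -- NOT the actual Apache Spark code.
// A deprecated vendor-named key is accepted on read and silently
// redirected to its canonical spark.* name, with a warning.
object ConfMigration {
  // Invented example mapping; the real rename was done in SPARK-51172.
  private val renamedKeys: Map[String, String] = Map(
    "spark.databricks.example.feature.enabled" ->
      "spark.sql.example.feature.enabled")

  /** Rewrites deprecated keys to their canonical names, warning on each hit. */
  def migrateConf(userConf: Map[String, String]): Map[String, String] =
    userConf.map { case (key, value) =>
      renamedKeys.get(key) match {
        case Some(canonical) =>
          Console.err.println(
            s"WARN: '$key' is deprecated; use '$canonical' instead.")
          canonical -> value
        case None =>
          key -> value
      }
    }

  def main(args: Array[String]): Unit = {
    val migrated = migrateConf(
      Map("spark.databricks.example.feature.enabled" -> "true"))
    println(migrated) // Map(spark.sql.example.feature.enabled -> true)
  }
}
```

As long as a redirect like this ships in a release branch, every job that sets the vendor-named key keeps working, which is exactly why deprecation alone does not remove the namespace from Spark's user-facing surface.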
This is a long-standing, live incident that is still happening.

For the vote, "Retain ... in Spark 4.0.x", I cast -1 because it aims to introduce the vendor configuration name (string) back into the Apache Spark 4 code. That means another contaminated branch, `branch-4.0`, will blur the boundary. On top of that, the Apache Spark committer from Databricks (Jungtaek), who caused this incident by merging the `spark.databricks.*` code, set a trap in this vote by writing the following when he initiated it:

> if someone supports including migration logic to be longer than Spark 4.0.x,
> please cast +1 here and leave the desired last minor version of Spark to
> retain this migration logic.

At the same time, he cast +1 with the following:

> Starting from my +1 (non-binding).
> In addition, I propose to retain migration logic till Spark 4.1.x and remove
> it in Spark 4.2.0.

In the open source community, he is playing his own card trick, flipping the vote title under everyone's nose like magic:

> [VOTE] Retain migration logic of incorrect `spark.databricks.*` config in
> Spark 4.0.x

> [VOTE] Retain migration logic of incorrect `spark.databricks.*` config in
> Spark 4.0.x/4.1.x

In other words, Jungtaek is trying to spread this terrible and misleading situation to the end of life of Spark 4.1.x (Spring 2027) for now. I expect that he will extend it again, ignoring the removal in Spark 4.2+, with the same kinds of reasons:

- We usually don't introduce breaking behavior under the same major version.
- The maintenance cost is near zero.

In that case, it will be permanent under Spark 4 (~2030?).

Of course, someone might say that this is better than the `branch-3.5` situation because the migration code is read-only support. However, it is still in the same category, misleading the community into the confusion that Apache Spark supports `spark.databricks.*` configurations. The vote was framed to cause a longer and bigger side effect because `branch-4.0` and `branch-4.1` together cover a longer period and many more releases in total.

To prevent the outbreak of contagious `spark.databricks.*` situations, we should stop now and protect `branch-4.0`. The side effect and its implications are huge. Apache Spark 4.0.0 is the only version at which we can stop this spread, so documentation-only is the only feasible way to choose.

So, -1 (= the technical justification for the veto is valid).

Sincerely,
Dongjoon.