I think I can add some color to minimize the concern. The problematic config we added is arguably not user-facing; I'd argue most users wouldn't even understand what the flag does. The config was added because Structured Streaming has been leveraging SQL configs to "do the magic" of having two different default values for a new query vs. an old query (one whose checkpoint was created by a version where the fix had not landed). It is purely for backward compatibility, not something we want to expose as user-facing flexibility.
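To illustrate the pattern, here is a minimal, self-contained sketch (not Spark's actual internals; the object, helper names, and the two default values are illustrative, only the config name is real). A new query records the fix-enabled value in its offset log, while a query restarted from an old checkpoint, whose offset log has no entry, falls back to the legacy value, so its behavior does not change underneath it:

object DualDefaultSketch {
  // Real config name; everything else here is illustrative.
  val Key = "spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"

  val defaultForNewQuery = "true"   // fix enabled going forward
  val defaultForOldQuery = "false"  // behavior at the time the old checkpoint was created

  // Resolve the effective value for a (re)started query from its offset log conf map.
  def effectiveValue(offsetLogConf: Map[String, String]): String =
    offsetLogConf.getOrElse(Key, defaultForOldQuery)

  // What a brand-new query writes into its first offset log entry.
  def confForNewQuery: Map[String, String] = Map(Key -> defaultForNewQuery)

  def main(args: Array[String]): Unit = {
    val oldCheckpoint = Map.empty[String, String]  // created before the fix landed
    val newCheckpoint = confForNewQuery            // created by a fixed version
    println(effectiveValue(oldCheckpoint))         // false -> keeps pre-fix behavior
    println(effectiveValue(newCheckpoint))         // true  -> fix applied
  }
}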
That said, I don't see a risk in removing the config at any point. (I'd even say removing it in Spark 3.5.5 would not change anything. The reason I'm not removing it in 3.5, and not yet in 4.0/master, is just to address any concern and be conservative.) I think you are worried about case 1 from my comment. In my new change (link <https://github.com/apache/spark/pull/49983>), I added migration logic for when the offset log contains the problematic configuration: we take its value but store it under the new config, and at the next microbatch planning the offset log will contain the new configuration going forward (see the sketch after the quoted thread below). This addresses case 1, as long as we retain the migration logic for a couple of minor releases (say, until 4.2 or so). We just need to keep this migration logic long enough that no one would reasonably jump directly from Spark 3.5.4 to a version without it. Hope this helps to address your concern.

On Tue, Feb 18, 2025 at 7:40 AM Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:

> Having breaking changes in a minor release does not seem that good. As I'm reading this,
>
> "*This could break the query if the rule impacts the query, because the
> effectiveness of the fix is flipped.*"
> https://github.com/apache/spark/pull/49897#issuecomment-2652567140
>
> What if we have this https://github.com/apache/spark/pull/48149 change in
> the branch and remove it only for version 4? That way we don't break
> anything.
>
> On Mon, Feb 17, 2025 at 11:03 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
>> Hi, All.
>>
>> I'd like to highlight this discussion because it is more important and
>> tricky in a way.
>>
>> As already mentioned in the mailing list and PRs, there was an obvious
>> mistake which missed an improper configuration name, `spark.databricks.*`.
>>
>> https://github.com/apache/spark/blob/a6f220d951742f4074b37772485ee0ec7a774e7d/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L3424
>>
>> `spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan`
>>
>> In fact, Apache Spark committers have been successfully preventing this
>> repetitive mistake pattern during the review stages, until we slipped the
>> following backports into Apache Spark 3.5.4:
>>
>> https://github.com/apache/spark/pull/45649
>> https://github.com/apache/spark/pull/48149
>> https://github.com/apache/spark/pull/49121
>>
>> At the time of writing, `spark.databricks.*` has been removed successfully
>> from `master` and `branch-4.0`, and a new Scalastyle rule was added to
>> protect the Apache Spark repository from future mistakes.
>>
>> SPARK-51172 Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan
>> SPARK-51173 Add `configName` Scalastyle rule
>>
>> What I propose is to release Apache Spark 3.5.5 next week with the
>> deprecation, in order to make Apache Spark 4.0 free of the
>> `spark.databricks.*` configuration:
>>
>> Apache Spark 3.5.5 (2025 February, with deprecation warning and alternative)
>> Apache Spark 4.0.0 (2025 March, without `spark.databricks.*` config)
>>
>> In addition, I'd like to volunteer as the release manager of Apache Spark
>> 3.5.5 for a swift release. WDYT?
>>
>> FYI, `branch-3.5` currently has 37 patches.
>>
>> $ git log --oneline v3.5.4..HEAD | wc -l
>> 37
>>
>> Best Regards,
>> Dongjoon.
>
>
> --
> Bjørn Jørgensen
> Vestre Aspehaug 4, 6010 Ålesund
> Norge
>
> +47 480 94 297
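As referenced above, the migration step for case 1 amounts to a key rename in the conf map that is read from, and later written back to, the offset log. Below is a minimal, self-contained sketch of the idea; the object and method names are illustrative and this is not the actual code in https://github.com/apache/spark/pull/49983, though the two config names are the real ones:

object OffsetLogConfMigrationSketch {
  val OldKey = "spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"
  val NewKey = "spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"

  // If the conf map read from the offset log still carries the old key, move its
  // value under the new key. The next microbatch planning persists the migrated
  // map, so the offset log contains only the new key going forward.
  def migrate(offsetLogConf: Map[String, String]): Map[String, String] =
    offsetLogConf.get(OldKey) match {
      case Some(value) => (offsetLogConf - OldKey) + (NewKey -> value)
      case None        => offsetLogConf
    }

  def main(args: Array[String]): Unit = {
    println(migrate(Map(OldKey -> "false")))
    // -> Map(spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan -> false)
  }
}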