I don't think there is a reason to keep it at 4.0.0 (and forever?) if we release Spark 3.5.5 with the proper deprecation. This is a big difference, Wenchen.
And that difference is the main reason why I started this thread: to suggest removing 'spark.databricks.*' completely from Apache Spark 4 by volunteering as the Spark 3.5.5 release manager.

Sincerely,
Dongjoon

On Mon, Feb 17, 2025 at 22:59 Wenchen Fan <cloud0...@gmail.com> wrote:

> It’s unfortunate that we missed identifying these issues during the code review. However, since they have already been released, I believe deprecating them is a better approach than removing them, as the latter would introduce a breaking change.
>
> Regarding Jungtaek’s PR <https://github.com/apache/spark/pull/49983>, it looks like there are only a few lines of migration code. Would it be acceptable to leave them for legacy support? With the new config name style check rule in place, such issues should not occur again in the future.
>
> On Tue, Feb 18, 2025 at 9:00 AM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>
>> I think I can add some color to minimize the concern.
>>
>> The problematic config we added is arguably not user facing. I'd argue moderate users wouldn't even understand what the flag is doing. The config was added because Structured Streaming has been leveraging SQL config to "do the magic" of having two different default values for a new query vs. an old query (one whose checkpoint was created by a version where the fix had not landed). This is purely used for backward compatibility, not something where we want to give users flexibility.
>>
>> That said, I don't see a risk of removing the config "at any point". (I'd even say removing this config in Spark 3.5.5 does not change anything. The reason I'm not removing the config in 3.5 (and not yet in 4.0/master) is just to address any concern about being conservative.)
>>
>> I think you are worrying about case 1 from my comment.
>> In my new change (link <https://github.com/apache/spark/pull/49983>), I added migration logic for the case where the offset log contains the problematic configuration: we take the value but put it into the new config, and at the next microbatch planning the offset log will contain the new configuration going forward. This addresses case 1, as long as we retain the migration logic for a couple of minor releases (say, until 4.2 or so). We only need to support this migration logic until a version that nobody would expect to jump to directly from Spark 3.5.4.
>>
>> Hope this helps to address your concern.
>>
>> On Tue, Feb 18, 2025 at 7:40 AM Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:
>>
>>> Having breaking changes in a minor release seems not that good. As I'm reading this,
>>>
>>> "*This could break the query if the rule impacts the query, because the effectiveness of the fix is flipped.*"
>>> https://github.com/apache/spark/pull/49897#issuecomment-2652567140
>>>
>>> What if we keep this https://github.com/apache/spark/pull/48149 change in the branch and remove it only for version 4? That way we don't break anything.
>>>
>>> On Mon, Feb 17, 2025 at 23:03, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>
>>>> Hi, All.
>>>>
>>>> I'd like to highlight this discussion because it is more important and tricky in a way.
>>>>
>>>> As already mentioned in the mailing list and PRs, there was an obvious mistake: an improper configuration name, `spark.databricks.*`, was missed.
>>>>
>>>> https://github.com/apache/spark/blob/a6f220d951742f4074b37772485ee0ec7a774e7d/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L3424
>>>>
>>>> `spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan`
>>>>
>>>> In fact, Apache Spark committers had been successfully preventing this repetitive mistake pattern during the review stages until the following backports slipped into Apache Spark 3.5.4.
>>>>
>>>> https://github.com/apache/spark/pull/45649
>>>> https://github.com/apache/spark/pull/48149
>>>> https://github.com/apache/spark/pull/49121
>>>>
>>>> At the time of writing, `spark.databricks.*` has been removed successfully from `master` and `branch-4.0`, and a new Scalastyle rule has been added to protect the Apache Spark repository from future mistakes.
>>>>
>>>> SPARK-51172 Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan
>>>> SPARK-51173 Add `configName` Scalastyle rule
>>>>
>>>> What I propose is to release Apache Spark 3.5.5 next week with the deprecation, in order to make Apache Spark 4.0 free of `spark.databricks.*` configuration.
>>>>
>>>> Apache Spark 3.5.5 (2025 February, with a deprecation warning that names the alternative)
>>>> Apache Spark 4.0.0 (2025 March, without `spark.databricks.*` config)
>>>>
>>>> In addition, I'd like to volunteer as the release manager of Apache Spark 3.5.5 for a swift release. WDYT?
>>>>
>>>> FYI, `branch-3.5` currently has 37 patches.
>>>>
>>>> $ git log --oneline v3.5.4..HEAD | wc -l
>>>> 37
>>>>
>>>> Best Regards,
>>>> Dongjoon.
>>>
>>> --
>>> Bjørn Jørgensen
>>> Vestre Aspehaug 4, 6010 Ålesund
>>> Norge
>>>
>>> +47 480 94 297