I think I rushed my reading and focused only on the first sentence of Karen's input. Sorry about that.
As I said, I'm not sure I can agree with the point about deprecation and breaking changes of APIs, but this thread has raised another topic which seems to be good input: practices for new API proposals. I feel that deserves its own thread, though.

Maybe we can make deprecating an API a "heavyweight" operation to mitigate the impact a bit, e.g. requiring a discussion thread to reach consensus before going through a PR. Right now, you have no idea which API is going to be deprecated, or why, if you only subscribe to dev@. Even if you subscribe to issues@, you would miss it in the flood of issues.

Personally, I feel the root cause is that dev@ is very quiet compared to the volume of PRs the community gets and the impact those PRs have. I agree we should strike a balance here to avoid restricting ourselves too much, but I feel there's no balance now - most things just go through PRs without discussion. It would be good to take the time to consider this.

On Thu, Feb 20, 2020 at 8:50 AM Jungtaek Lim <kabhwan.opensou...@gmail.com>
wrote:

> Apache Spark 2.0 was released in July 2016. Assuming the project has been
> trying its best to follow semantic versioning, that is "more than three
> years" of waiting for breaking changes. Any necessary breaking changes the
> community fails to address now will become technical debt for another 3+
> years.
>
> As the PRs removing deprecated APIs were pointed out first, I'm not sure
> about the reasoning. I roughly remember that these PRs target APIs
> deprecated a couple of minor versions ago. If so, what's the matter?
>
> If the deprecation messages don't kindly guide users to alternatives,
> then that's a major problem the community should be concerned about and
> try to fix, but it's a separate problem. The community doesn't deprecate
> an API just for fun. Every deprecation has a reason, and not removing the
> API doesn't make sense unless the community was mistaken about the reason
> for deprecation.
>
> If the community really would like to build some (soft) rules/policies on
> deprecation, I can only imagine 2 items:
>
> 1. Define a "minimum number of releases to live" (either per deprecated
> API or globally).
> 2. Never skip describing the reason for deprecation, and try your best to
> describe an alternative that works the same or similarly - if the
> alternative doesn't work exactly the same, also describe the difference
> (optionally, maybe).
>
> I can't imagine any other problems with deprecation.
>
> On Thu, Feb 20, 2020 at 7:36 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> Sure. I understand the background of the following requests, so it's a
>> good time to decide the criteria in order to start the discussion.
>>
>> 1. "to provide a reasonable migration path we’d want the replacement
>> of the deprecated API to also exist in 2.4"
>> 2. "We need to discuss the APIs case by case"
>>
>> For now, it's unclear what counts as `unnecessarily painful`, what are
>> "widely used APIs", or how small is small enough for "the maintenance
>> costs are small".
>>
>> I'm wondering: is the goal of Apache Spark 3.0.0 to be 100% backward
>> compatible with Apache Spark 2.4.5, like Apache Kafka?
>> Are we going to revert all of these changes? If there were clear
>> criteria, we wouldn't have needed to spend such a long period on the
>> clean-up for 3.0.0.
>>
>> BTW, to be clear, we are talking about 2.4.5 and 3.0.0 compatibility in
>> this thread.
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Wed, Feb 19, 2020 at 2:20 PM Xiao Li <lix...@databricks.com> wrote:
>>
>>> Like https://github.com/apache/spark/pull/23131, we added back
>>> unionAll.
>>>
>>> We might need to double-check whether we removed some widely used APIs
>>> in this release before the RC. If the maintenance costs are small,
>>> keeping some deprecated APIs looks reasonable to me. This can help the
>>> adoption of Spark 3.0. We need to discuss the APIs case by case.
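To make the second item above concrete: in Scala, the deprecation message itself can carry the alternative, and the deprecated method can stay alive as a one-line alias - roughly the shape of how unionAll came back as an alias of union. The Dataset class below is a toy stand-in for illustration, not Spark's actual implementation:

```scala
// Toy stand-in for Spark's Dataset, only to illustrate the deprecation
// pattern; not the real class.
class Dataset(private val rows: Seq[Int]) {
  def union(other: Dataset): Dataset = new Dataset(rows ++ other.rows)

  // The message names the replacement and the behavioral difference
  // (here: none - unionAll keeps duplicates, exactly like union).
  @deprecated("use union() instead; behavior is identical", "2.0.0")
  def unionAll(other: Dataset): Dataset = union(other)

  def collect(): Seq[Int] = rows
}
```

Keeping such an alias costs one line of maintenance, which seems like the kind of small maintenance cost Xiao mentions.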
>>>
>>> Xiao
>>>
>>> On Wed, Feb 19, 2020 at 2:14 PM Holden Karau <hol...@pigscanfly.ca>
>>> wrote:
>>>
>>>> So my understanding would be that to provide a reasonable migration
>>>> path, we’d want the replacement of the deprecated API to also exist in
>>>> 2.4; this way libraries and programs can dual-target during the
>>>> migration process.
>>>>
>>>> Now that isn’t always going to be doable, but it is certainly worth
>>>> looking at the situations where we aren’t providing a smooth migration
>>>> path and making sure it’s the best thing to do.
>>>>
>>>> On Wed, Feb 19, 2020 at 2:10 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi, Karen.
>>>>>
>>>>> Are you saying that Spark 3 has to keep all deprecated 2.x APIs?
>>>>> Could you tell us your criteria for `unnecessarily` or `necessarily`?
>>>>>
>>>>> > the migration process from Spark 2 to Spark 3 unnecessarily painful.
>>>>>
>>>>> Bests,
>>>>> Dongjoon.
>>>>>
>>>>>
>>>>> On Tue, Feb 18, 2020 at 4:55 PM Karen Feng <karen.f...@databricks.com>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am concerned that the API-breaking changes in SPARK-25908 (as well
>>>>>> as SPARK-16775, and potentially others) will make the migration
>>>>>> process from Spark 2 to Spark 3 unnecessarily painful. For example,
>>>>>> the removal of SQLContext.getOrCreate will break a large number of
>>>>>> libraries currently built on Spark 2.
>>>>>>
>>>>>> Even if library developers do not use deprecated APIs, API changes
>>>>>> between 2.x and 3.x will result in inconsistencies that require
>>>>>> hacking around. For a fairly small and new (2.4.3+) genomics library,
>>>>>> I had to create a number of shims
>>>>>> (https://github.com/projectglow/glow/pull/155) for the source and
>>>>>> test code due to API changes in SPARK-25393, SPARK-27328, and
>>>>>> SPARK-28744.
>>>>>>
>>>>>> It would be best practice to avoid breaking existing APIs to ease
>>>>>> library development.
>>>>>> To avoid dealing with similar deprecated API issues down the road,
>>>>>> we should practice more prudence when considering new API proposals.
>>>>>>
>>>>>> I'd love to see more discussion on this.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>>>>>
>>>> --
>>>> Twitter: https://twitter.com/holdenkarau
>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
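For what it's worth, the dual-targeting Holden describes and the shims Karen created usually reduce to a small compatibility layer selected at runtime. A hedged sketch of that pattern - all names here (VersionShims, sessionExpr) are illustrative, not taken from the glow PR:

```scala
// Hedged sketch of a version-shim layer: each 2.x/3.x-dependent call goes
// behind a small trait, so the rest of the library dual-targets both lines.
trait VersionShims {
  // Stand-in for a real version-specific call (e.g. how to obtain a session).
  def sessionExpr: String
}

object Spark24Shims extends VersionShims {
  def sessionExpr: String = "SQLContext.getOrCreate(sc).sparkSession"
}

object Spark30Shims extends VersionShims {
  def sessionExpr: String = "SparkSession.builder.getOrCreate()"
}

object VersionShims {
  // Pick the shim from the running Spark's version string.
  def forVersion(sparkVersion: String): VersionShims =
    if (sparkVersion.startsWith("2.")) Spark24Shims else Spark30Shims
}
```

A library then calls VersionShims.forVersion(...) once at startup, and only the shim objects need per-version code; everything else compiles and runs unchanged against 2.4 and 3.0.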