+1 (non-binding)

On Mon, Mar 9, 2020 at 1:32 PM Michael Heuer <heue...@gmail.com> wrote:
> +1 (non-binding)
>
> I am disappointed, however, that this only mentions the API and not
> dependencies and transitive dependencies.
>
> As Spark does not provide separation between its runtime classpath and the
> classpath used by applications, I believe Spark's dependencies and
> transitive dependencies should be considered part of the API for this
> policy. Breaking dependency upgrades and incompatible dependency versions
> are the source of much frustration.
>
>    michael
>
>
> On Mar 9, 2020, at 2:16 PM, Takuya UESHIN <ues...@happy-camper.st> wrote:
>
> +1 (binding)
>
>
> On Mon, Mar 9, 2020 at 11:49 AM Xingbo Jiang <jiangxb1...@gmail.com> wrote:
>
>> +1 (non-binding)
>>
>> Cheers,
>>
>> Xingbo
>>
>> On Mon, Mar 9, 2020 at 9:35 AM Xiao Li <lix...@databricks.com> wrote:
>>
>>> +1 (binding)
>>>
>>> Xiao
>>>
>>> On Mon, Mar 9, 2020 at 8:33 AM Denny Lee <denny.g....@gmail.com> wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>>
>>>>> The proposal itself seems good as far as the factors to consider go.
>>>>> Thanks, Michael.
>>>>>
>>>>> Several of the concerns mentioned look like good points, in particular:
>>>>>
>>>>> > ... assuming that this is for public stable APIs, not APIs that are
>>>>> > marked as unstable, evolving, etc. ...
>>>>> I would like to confirm this. We already have API annotations such as
>>>>> Experimental, Unstable, etc., and the implications of each are still
>>>>> effective. If this is for stable APIs, it makes sense to me as well.
>>>>>
>>>>> > ... can we expand on 'when' an API change can occur? Since we are
>>>>> > proposing to diverge from semver. ...
>>>>> I think this is a good point. If we're proposing to diverge from
>>>>> semver, the delta compared to semver will have to be clarified to
>>>>> avoid different personal interpretations of the somewhat general
>>>>> principles.
>>>>>
>>>>> > ... can we narrow down on the migration from Apache Spark 2.4.5 to
>>>>> > Apache Spark 3.0+? ...
>>>>>
>>>>> Assuming these concerns will be addressed, +1 (binding).
>>>>>
>>>>>
>>>>> On Mon, Mar 9, 2020 at 4:53 PM Takeshi Yamamuro <linguin....@gmail.com> wrote:
>>>>>
>>>>>> +1 (non-binding)
>>>>>>
>>>>>> Bests,
>>>>>> Takeshi
>>>>>>
>>>>>> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <gengliang.w...@databricks.com> wrote:
>>>>>>
>>>>>>> +1 (non-binding)
>>>>>>>
>>>>>>> Gengliang
>>>>>>>
>>>>>>> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>>>>>>
>>>>>>>> +1 as well.
>>>>>>>>
>>>>>>>> Matei
>>>>>>>>
>>>>>>>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> +1 (binding), assuming that this is for public stable APIs, not
>>>>>>>> APIs that are marked as unstable, evolving, etc.
>>>>>>>>
>>>>>>>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> +1 (non-binding)
>>>>>>>>>
>>>>>>>>> Michael's section on the trade-offs of maintaining / removing an
>>>>>>>>> API is one of the best reads I have seen on this mailing list.
>>>>>>>>> Enthusiastic +1.
>>>>>>>>>
>>>>>>>>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>>>> >
>>>>>>>>> > This new policy has a good intention, but can we narrow down on
>>>>>>>>> > the migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>>>>>>>>> >
>>>>>>>>> > I saw that there already exists a reverting PR to bring back
>>>>>>>>> > Spark 1.4 and 1.5 APIs based on this AS-IS suggestion.
>>>>>>>>> >
>>>>>>>>> > The AS-IS policy clearly mentions the JVM/Scala-level
>>>>>>>>> > difficulty, and that's nice.
>>>>>>>>> >
>>>>>>>>> > However, for the other cases, it sounds like `recommending older
>>>>>>>>> > APIs as much as possible` due to the following:
>>>>>>>>> >
>>>>>>>>> > > How long has the API been in Spark?
>>>>>>>>> >
>>>>>>>>> > We had better be more careful when we add a new policy, and we
>>>>>>>>> > should aim not to mislead users and 3rd-party library developers
>>>>>>>>> > into thinking "older is better".
>>>>>>>>> >
>>>>>>>>> > Technically, I'm wondering who will use new APIs in their
>>>>>>>>> > examples (in books and on StackOverflow) if they always need to
>>>>>>>>> > add a warning like `this only works on 2.4.0+`.
>>>>>>>>> >
>>>>>>>>> > Bests,
>>>>>>>>> > Dongjoon.
>>>>>>>>> >
>>>>>>>>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <mri...@gmail.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> I am in broad agreement with the proposal; as any developer, I
>>>>>>>>> >> prefer stable, well-designed APIs :-)
>>>>>>>>> >>
>>>>>>>>> >> Can we tie the proposal to the stability guarantees given by
>>>>>>>>> >> Spark and the reasonable expectations of users?
>>>>>>>>> >> In my opinion, an unstable or evolving API could change, while
>>>>>>>>> >> an experimental API which has been around for ages should be
>>>>>>>>> >> handled more conservatively.
>>>>>>>>> >> This brings into question how the stability guarantees specified
>>>>>>>>> >> by the annotations interact with the proposal.
>>>>>>>>> >>
>>>>>>>>> >> Also, can we expand on 'when' an API change can occur, since we
>>>>>>>>> >> are proposing to diverge from semver?
>>>>>>>>> >> Patch release? Minor release? Only major release? Based on the
>>>>>>>>> >> 'impact' of the API? Stability guarantees?
>>>>>>>>> >>
>>>>>>>>> >> Regards,
>>>>>>>>> >> Mridul
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <mich...@databricks.com> wrote:
>>>>>>>>> >> >
>>>>>>>>> >> > I'll start off the vote with a strong +1 (binding).
>>>>>>>>> >> >
>>>>>>>>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <mich...@databricks.com> wrote:
>>>>>>>>> >> >>
>>>>>>>>> >> >> I propose to add the following text to Spark's Semantic
>>>>>>>>> >> >> Versioning policy and adopt it as the rubric that should be
>>>>>>>>> >> >> used when deciding to break APIs (even at major versions such
>>>>>>>>> >> >> as 3.0).
>>>>>>>>> >> >>
>>>>>>>>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As
>>>>>>>>> >> >> this is a procedural vote, the measure will pass if there are
>>>>>>>>> >> >> more favourable votes than unfavourable ones. PMC votes are
>>>>>>>>> >> >> binding, but the community is encouraged to add their voice
>>>>>>>>> >> >> to the discussion.
>>>>>>>>> >> >>
>>>>>>>>> >> >> [ ] +1 - Spark should adopt this policy.
>>>>>>>>> >> >>
>>>>>>>>> >> >> [ ] -1 - Spark should not adopt this policy.
>>>>>>>>> >> >>
>>>>>>>>> >> >> <new policy>
>>>>>>>>> >> >>
>>>>>>>>> >> >> Considerations When Breaking APIs
>>>>>>>>> >> >>
>>>>>>>>> >> >> The Spark project strives to avoid breaking APIs or silently
>>>>>>>>> >> >> changing behavior, even at major versions. While this is not
>>>>>>>>> >> >> always possible, the balance of the following factors should
>>>>>>>>> >> >> be considered before choosing to break an API.
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >> Cost of Breaking an API
>>>>>>>>> >> >>
>>>>>>>>> >> >> Breaking an API almost always has a non-trivial cost to the
>>>>>>>>> >> >> users of Spark. A broken API means that Spark programs need
>>>>>>>>> >> >> to be rewritten before they can be upgraded. However, there
>>>>>>>>> >> >> are a few considerations when thinking about what the cost
>>>>>>>>> >> >> will be:
>>>>>>>>> >> >>
>>>>>>>>> >> >> Usage - an API that is actively used in many different places
>>>>>>>>> >> >> is always very costly to break. While it is hard to know
>>>>>>>>> >> >> usage for sure, there are a bunch of ways that we can
>>>>>>>>> >> >> estimate:
>>>>>>>>> >> >>
>>>>>>>>> >> >> How long has the API been in Spark?
>>>>>>>>> >> >>
>>>>>>>>> >> >> Is the API common even for basic programs?
>>>>>>>>> >> >>
>>>>>>>>> >> >> How often do we see recent questions in JIRA or on the
>>>>>>>>> >> >> mailing lists?
>>>>>>>>> >> >>
>>>>>>>>> >> >> How often does it appear in StackOverflow or blogs?
>>>>>>>>> >> >>
>>>>>>>>> >> >> Behavior after the break - How will a program that works
>>>>>>>>> >> >> today work after the break? The following are listed roughly
>>>>>>>>> >> >> in order of increasing severity:
>>>>>>>>> >> >>
>>>>>>>>> >> >> Will there be a compiler or linker error?
>>>>>>>>> >> >>
>>>>>>>>> >> >> Will there be a runtime exception?
>>>>>>>>> >> >>
>>>>>>>>> >> >> Will that exception happen after significant processing has
>>>>>>>>> >> >> been done?
>>>>>>>>> >> >>
>>>>>>>>> >> >> Will we silently return different answers? (very hard to
>>>>>>>>> >> >> debug; we might not even notice!)
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >> Cost of Maintaining an API
>>>>>>>>> >> >>
>>>>>>>>> >> >> Of course, the above does not mean that we will never break
>>>>>>>>> >> >> any APIs. We must also consider the cost, both to the project
>>>>>>>>> >> >> and to our users, of keeping the API in question.
>>>>>>>>> >> >>
>>>>>>>>> >> >> Project Costs - Every API we have needs to be tested and
>>>>>>>>> >> >> needs to keep working as other parts of the project change.
>>>>>>>>> >> >> These costs are significantly exacerbated when external
>>>>>>>>> >> >> dependencies change (the JVM, Scala, etc.). In some cases,
>>>>>>>>> >> >> while not completely infeasible, the cost of maintaining a
>>>>>>>>> >> >> particular API can become too high.
>>>>>>>>> >> >>
>>>>>>>>> >> >> User Costs - APIs also have a cognitive cost to users
>>>>>>>>> >> >> learning Spark or trying to understand Spark programs. This
>>>>>>>>> >> >> cost becomes even higher when the API in question has
>>>>>>>>> >> >> confusing or undefined semantics.
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >> Alternatives to Breaking an API
>>>>>>>>> >> >>
>>>>>>>>> >> >> In cases where there is a "Bad API", but where the cost of
>>>>>>>>> >> >> removal is also high, there are alternatives that should be
>>>>>>>>> >> >> considered that do not hurt existing users but do address
>>>>>>>>> >> >> some of the maintenance costs.
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >> Avoid Bad APIs - While this is a bit obvious, it is an
>>>>>>>>> >> >> important point. Anytime we are adding a new interface to
>>>>>>>>> >> >> Spark, we should consider that we might be stuck with this
>>>>>>>>> >> >> API forever. Think deeply about how new APIs relate to
>>>>>>>>> >> >> existing ones, as well as how you expect them to evolve over
>>>>>>>>> >> >> time.
>>>>>>>>> >> >>
>>>>>>>>> >> >> Deprecation Warnings - All deprecation warnings should point
>>>>>>>>> >> >> to a clear alternative and should never just say that an API
>>>>>>>>> >> >> is deprecated.
>>>>>>>>> >> >>
>>>>>>>>> >> >> Updated Docs - Documentation should point to the "best"
>>>>>>>>> >> >> recommended way of performing a given task. In the cases
>>>>>>>>> >> >> where we maintain legacy documentation, we should clearly
>>>>>>>>> >> >> point to newer APIs and suggest to users the "right" way.
>>>>>>>>> >> >>
>>>>>>>>> >> >> Community Work - Many people learn Spark by reading blogs and
>>>>>>>>> >> >> other sites such as StackOverflow. However, many of these
>>>>>>>>> >> >> resources are out of date. Update them to reduce the cost of
>>>>>>>>> >> >> eventually removing deprecated APIs.
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >> </new policy>
>>>>>>
>>>>>> --
>>>>>> ---
>>>>>> Takeshi Yamamuro
>>>
>>> --
>>> <https://databricks.com/sparkaisummit/north-america>
>
> --
> Takuya UESHIN
>
> http://twitter.com/ueshin
>


--
John Zhuge
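
For readers who want to see the mechanics referenced in the thread in code, below is a minimal Scala sketch (not Spark source) of a stability annotation on a public class together with a deprecation message that names a clear alternative, as the proposed policy asks. The `Evolving` annotation and `ExampleDataset` class here are simplified stand-ins for Spark's own `org.apache.spark.annotation` annotations and `Dataset` API; the deprecation itself mirrors the real `registerTempTable` -> `createOrReplaceTempView` change from Spark 2.0.

    import scala.annotation.StaticAnnotation

    // Stand-in for stability annotations such as
    // org.apache.spark.annotation.Evolving / Unstable / Experimental.
    class Evolving extends StaticAnnotation

    @Evolving
    class ExampleDataset {
      // Per the policy: the deprecation message names a concrete replacement
      // and the release it applies from, rather than only saying "deprecated".
      @deprecated("Use createOrReplaceTempView(viewName) instead.", since = "2.0.0")
      def registerTempTable(tableName: String): Unit =
        createOrReplaceTempView(tableName)

      def createOrReplaceTempView(viewName: String): Unit = {
        // ... register this dataset as a temporary view (omitted in this sketch) ...
      }
    }

Written this way, the compiler surfaces the replacement directly in the deprecation warning, so the migration path is discoverable without hunting through release notes.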