+1 (binding)
On Mon, Mar 9, 2020 at 11:49 AM Xingbo Jiang <jiangxb1...@gmail.com> wrote:

> +1 (non-binding)
>
> Cheers,
>
> Xingbo
>
> On Mon, Mar 9, 2020 at 9:35 AM Xiao Li <lix...@databricks.com> wrote:
>
>> +1 (binding)
>>
>> Xiao
>>
>> On Mon, Mar 9, 2020 at 8:33 AM Denny Lee <denny.g....@gmail.com> wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>
>>>> The proposal itself seems good as the factors to consider. Thanks,
>>>> Michael.
>>>>
>>>> Several of the concerns mentioned look like good points, in particular:
>>>>
>>>> > ... assuming that this is for public stable APIs, not APIs that are
>>>> marked as unstable, evolving, etc. ...
>>>> I would like to confirm this. We already have API annotations such as
>>>> Experimental, Unstable, etc., and the implication of each is still
>>>> effective (see the sketch just below this message). If it's for stable
>>>> APIs, it makes sense to me as well.
>>>>
>>>> > ... can we expand on 'when' an API change can occur ? Since we are
>>>> proposing to diverge from semver. ...
>>>> I think this is a good point. If we're proposing to diverge from
>>>> semver, the delta compared to semver will have to be clarified to avoid
>>>> different personal interpretations of the somewhat general principles.
>>>>
>>>> > ... can we narrow down on the migration from Apache Spark 2.4.5 to
>>>> Apache Spark 3.0+? ...
>>>>
>>>> Assuming these concerns will be addressed, +1 (binding).
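For reference, the annotations mentioned here live in the
org.apache.spark.annotation package (shipped in the spark-tags artifact).
A minimal sketch of how they mark API stability; the trait names below are
hypothetical, only the annotations themselves are real:

    import org.apache.spark.annotation.{Experimental, Stable, Unstable}

    // A public, stable API: covered by the proposed rubric, so breaking
    // it requires weighing the factors discussed in this thread.
    @Stable
    trait StableQueryApi {
      def run(query: String): Unit
    }

    // Marked experimental/unstable: exempt under the "public stable APIs"
    // assumption above, so it may still change in any release.
    @Experimental
    @Unstable
    trait ExperimentalQueryApi {
      def runAsync(query: String): Unit
    }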
>>>>
>>>> On Mon, Mar 9, 2020 at 4:53 PM Takeshi Yamamuro
>>>> <linguin....@gmail.com> wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> Bests,
>>>>> Takeshi
>>>>>
>>>>> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <
>>>>> gengliang.w...@databricks.com> wrote:
>>>>>
>>>>>> +1 (non-binding)
>>>>>>
>>>>>> Gengliang
>>>>>>
>>>>>> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia <
>>>>>> matei.zaha...@gmail.com> wrote:
>>>>>>
>>>>>>> +1 as well.
>>>>>>>
>>>>>>> Matei
>>>>>>>
>>>>>>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan <cloud0...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> +1 (binding), assuming that this is for public stable APIs, not APIs
>>>>>>> that are marked as unstable, evolving, etc.
>>>>>>>
>>>>>>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía <ieme...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> +1 (non-binding)
>>>>>>>>
>>>>>>>> Michael's section on the trade-offs of maintaining / removing an
>>>>>>>> API is one of the best reads I have seen on this mailing list.
>>>>>>>> Enthusiastic +1.
>>>>>>>>
>>>>>>>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <
>>>>>>>> dongjoon.h...@gmail.com> wrote:
>>>>>>>> >
>>>>>>>> > This new policy has a good intention, but can we narrow down on
>>>>>>>> the migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>>>>>>>> >
>>>>>>>> > I saw that there already exists a reverting PR to bring back
>>>>>>>> Spark 1.4 and 1.5 APIs based on this AS-IS suggestion.
>>>>>>>> >
>>>>>>>> > The AS-IS policy clearly mentions the JVM/Scala-level difficulty,
>>>>>>>> and that's nice.
>>>>>>>> >
>>>>>>>> > However, for the other cases, it sounds like `recommending older
>>>>>>>> APIs as much as possible` due to the following:
>>>>>>>> >
>>>>>>>> > > How long has the API been in Spark?
>>>>>>>> >
>>>>>>>> > We had better be more careful when we add a new policy and should
>>>>>>>> aim not to mislead users and 3rd-party library developers into
>>>>>>>> thinking "older is better".
>>>>>>>> >
>>>>>>>> > Technically, I'm wondering who will use new APIs in their
>>>>>>>> examples (in books and on StackOverflow) if they always need to
>>>>>>>> write an additional warning like `this only works at 2.4.0+`.
>>>>>>>> >
>>>>>>>> > Bests,
>>>>>>>> > Dongjoon.
>>>>>>>> >
>>>>>>>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <
>>>>>>>> mri...@gmail.com> wrote:
>>>>>>>> >>
>>>>>>>> >> I am in broad agreement with the proposal; as any developer, I
>>>>>>>> prefer stable, well-designed APIs :-)
>>>>>>>> >>
>>>>>>>> >> Can we tie the proposal to the stability guarantees given by
>>>>>>>> Spark and reasonable expectations from users?
>>>>>>>> >> In my opinion, an unstable or evolving API could change - while
>>>>>>>> an experimental API which has been around for ages should be
>>>>>>>> handled more conservatively.
>>>>>>>> >> Which brings into question how the stability guarantees
>>>>>>>> specified by annotations interact with the proposal.
>>>>>>>> >>
>>>>>>>> >> Also, can we expand on 'when' an API change can occur? Since we
>>>>>>>> are proposing to diverge from semver.
>>>>>>>> >> Patch release? Minor release? Only major release? Based on
>>>>>>>> 'impact' of the API? Stability guarantees?
>>>>>>>> >>
>>>>>>>> >> Regards,
>>>>>>>> >> Mridul
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <
>>>>>>>> mich...@databricks.com> wrote:
>>>>>>>> >> >
>>>>>>>> >> > I'll start off the vote with a strong +1 (binding).
>>>>>>>> >> >
>>>>>>>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <
>>>>>>>> mich...@databricks.com> wrote:
>>>>>>>> >> >>
>>>>>>>> >> >> I propose to add the following text to Spark's Semantic
>>>>>>>> Versioning policy and adopt it as the rubric that should be used
>>>>>>>> when deciding to break APIs (even at major versions such as 3.0).
>>>>>>>> >> >>
>>>>>>>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As
>>>>>>>> this is a procedural vote, the measure will pass if there are more
>>>>>>>> favourable votes than unfavourable ones. PMC votes are binding, but
>>>>>>>> the community is encouraged to add their voice to the discussion.
>>>>>>>> >> >>
>>>>>>>> >> >> [ ] +1 - Spark should adopt this policy.
>>>>>>>> >> >> [ ] -1 - Spark should not adopt this policy.
>>>>>>>> >> >>
>>>>>>>> >> >> <new policy>
>>>>>>>> >> >>
>>>>>>>> >> >> Considerations When Breaking APIs
>>>>>>>> >> >>
>>>>>>>> >> >> The Spark project strives to avoid breaking APIs or silently
>>>>>>>> changing behavior, even at major versions. While this is not always
>>>>>>>> possible, the balance of the following factors should be considered
>>>>>>>> before choosing to break an API.
>>>>>>>> >> >>
>>>>>>>> >> >> Cost of Breaking an API
>>>>>>>> >> >>
>>>>>>>> >> >> Breaking an API almost always has a non-trivial cost to the
>>>>>>>> users of Spark. A broken API means that Spark programs need to be
>>>>>>>> rewritten before they can be upgraded. However, there are a few
>>>>>>>> considerations when thinking about what the cost will be:
>>>>>>>> >> >>
>>>>>>>> >> >> Usage - an API that is actively used in many different places
>>>>>>>> is always very costly to break. While it is hard to know usage for
>>>>>>>> sure, there are a bunch of ways that we can estimate:
>>>>>>>> >> >>
>>>>>>>> >> >> How long has the API been in Spark?
>>>>>>>> >> >>
>>>>>>>> >> >> Is the API common even for basic programs?
>>>>>>>> >> >>
>>>>>>>> >> >> How often do we see recent questions in JIRA or mailing
>>>>>>>> lists?
>>>>>>>> >> >>
>>>>>>>> >> >> How often does it appear in StackOverflow or blogs?
>>>>>>>> >> >>
>>>>>>>> >> >> Behavior after the break - How will a program that works
>>>>>>>> today work after the break? The following are listed roughly in
>>>>>>>> order of increasing severity:
>>>>>>>> >> >>
>>>>>>>> >> >> Will there be a compiler or linker error?
>>>>>>>> >> >>
>>>>>>>> >> >> Will there be a runtime exception?
>>>>>>>> >> >>
>>>>>>>> >> >> Will that exception happen after significant processing has
>>>>>>>> been done?
>>>>>>>> >> >>
>>>>>>>> >> >> Will we silently return different answers? (very hard to
>>>>>>>> debug; users might not even notice!)
>>>>>>>> >> >>
>>>>>>>> >> >> Cost of Maintaining an API
>>>>>>>> >> >>
>>>>>>>> >> >> Of course, the above does not mean that we will never break
>>>>>>>> any APIs. We must also consider the cost, both to the project and
>>>>>>>> to our users, of keeping the API in question.
>>>>>>>> >> >>
>>>>>>>> >> >> Project Costs - Every API we have needs to be tested and
>>>>>>>> needs to keep working as other parts of the project change. These
>>>>>>>> costs are significantly exacerbated when external dependencies
>>>>>>>> change (the JVM, Scala, etc). In some cases, while not completely
>>>>>>>> technically infeasible, the cost of maintaining a particular API
>>>>>>>> can become too high.
>>>>>>>> >> >>
>>>>>>>> >> >> User Costs - APIs also have a cognitive cost to users
>>>>>>>> learning Spark or trying to understand Spark programs. This cost
>>>>>>>> becomes even higher when the API in question has confusing or
>>>>>>>> undefined semantics.
>>>>>>>> >> >>
>>>>>>>> >> >> Alternatives to Breaking an API
>>>>>>>> >> >>
>>>>>>>> >> >> In cases where there is a "Bad API", but where the cost of
>>>>>>>> removal is also high, there are alternatives that should be
>>>>>>>> considered that do not hurt existing users but do address some of
>>>>>>>> the maintenance costs.
>>>>>>>> >> >>
>>>>>>>> >> >> Avoid Bad APIs - While this is a bit obvious, it is an
>>>>>>>> important point. Anytime we are adding a new interface to Spark we
>>>>>>>> should consider that we might be stuck with this API forever. Think
>>>>>>>> deeply about how new APIs relate to existing ones, as well as how
>>>>>>>> you expect them to evolve over time.
>>>>>>>> >> >>
>>>>>>>> >> >> Deprecation Warnings - All deprecation warnings should point
>>>>>>>> to a clear alternative and should never just say that an API is
>>>>>>>> deprecated (see the sketch after this list).
>>>>>>>> >> >>
>>>>>>>> >> >> Updated Docs - Documentation should point to the "best"
>>>>>>>> recommended way of performing a given task. In the cases where we
>>>>>>>> maintain legacy documentation, we should clearly point to newer
>>>>>>>> APIs and suggest to users the "right" way.
>>>>>>>> >> >>
>>>>>>>> >> >> Community Work - Many people learn Spark by reading blogs and
>>>>>>>> other sites such as StackOverflow. However, many of these resources
>>>>>>>> are out of date. Update them, to reduce the cost of eventually
>>>>>>>> removing deprecated APIs.
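To illustrate the "Deprecation Warnings" point: a good warning names a
concrete replacement and the release it applies from, rather than only
saying the API is deprecated. A minimal Scala sketch, where both method
names are hypothetical:

    object SparkApiExample {
      def newRunJob(name: String): Unit = { /* new implementation */ }

      // Points to the alternative and states the version, instead of a
      // bare "this method is deprecated".
      @deprecated("Use newRunJob(name) instead.", since = "3.0.0")
      def runJob(name: String): Unit = newRunJob(name)
    }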
>>>>>>>> >> >>
>>>>>>>> >> >> </new policy>

--
Takuya UESHIN
http://twitter.com/ueshin