+1

On Fri, Mar 6, 2020 at 8:59 PM Michael Armbrust <mich...@databricks.com> wrote:
>
> I propose to add the following text to Spark's Semantic Versioning policy and adopt it as the rubric that should be used when deciding to break APIs (even at major versions such as 3.0).
>
> I'll leave the vote open until Tuesday, March 10th at 2pm. As this is a procedural vote, the measure will pass if there are more favourable votes than unfavourable ones. PMC votes are binding, but the community is encouraged to add their voice to the discussion.
>
> [ ] +1 - Spark should adopt this policy.
> [ ] -1 - Spark should not adopt this policy.
>
> <new policy>
>
> Considerations When Breaking APIs
>
> The Spark project strives to avoid breaking APIs or silently changing behavior, even at major versions. While this is not always possible, the balance of the following factors should be considered before choosing to break an API.
>
> Cost of Breaking an API
>
> Breaking an API almost always has a non-trivial cost to the users of Spark. A broken API means that Spark programs need to be rewritten before they can be upgraded. However, there are a few considerations when thinking about what the cost will be:
>
> Usage - an API that is actively used in many different places is always very costly to break. While it is hard to know usage for sure, there are several ways we can estimate it:
>
> How long has the API been in Spark?
> Is the API common even for basic programs?
> How often do we see recent questions in JIRA or on mailing lists?
> How often does it appear on StackOverflow or in blogs?
>
> Behavior after the break - How will a program that works today work after the break? The following are listed roughly in order of increasing severity:
>
> Will there be a compiler or linker error?
> Will there be a runtime exception?
> Will that exception happen after significant processing has been done?
> Will we silently return different answers? (very hard to debug, users might not even notice!)
>
> Cost of Maintaining an API
>
> Of course, the above does not mean that we will never break any APIs. We must also consider the cost, both to the project and to our users, of keeping the API in question.
>
> Project Costs - Every API we have needs to be tested and needs to keep working as other parts of the project change. These costs are significantly exacerbated when external dependencies change (the JVM, Scala, etc.). In some cases, while maintaining a particular API is not completely infeasible technically, the cost of doing so can become too high.
>
> User Costs - APIs also have a cognitive cost to users learning Spark or trying to understand Spark programs. This cost becomes even higher when the API in question has confusing or undefined semantics.
>
> Alternatives to Breaking an API
>
> In cases where there is a "Bad API", but the cost of removal is also high, there are alternatives that should be considered that do not hurt existing users but do address some of the maintenance costs.
>
> Avoid Bad APIs - While this is a bit obvious, it is an important point. Any time we add a new interface to Spark, we should consider that we might be stuck with this API forever. Think deeply about how new APIs relate to existing ones, as well as how you expect them to evolve over time.
>
> Deprecation Warnings - All deprecation warnings should point to a clear alternative and should never just say that an API is deprecated (a sketch of what such a warning can look like follows after the quoted policy).
>
> Updated Docs - Documentation should point to the "best" recommended way of performing a given task. In cases where we maintain legacy documentation, we should clearly point to newer APIs and suggest to users the "right" way.
>
> Community Work - Many people learn Spark by reading blogs and other sites such as StackOverflow. However, many of these resources are out of date. Update them to reduce the cost of eventually removing deprecated APIs.
>
> </new policy>
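To make the deprecation-warnings point above concrete: in Scala (Spark's implementation language), the @deprecated annotation takes both a message and a "since" version, so the warning itself can carry the migration path. The sketch below is an editorial illustration only; StringCleaner, LegacyStringCleaner, cleanAll, and the "3.0.0" version string are hypothetical names, not APIs from Spark.

  // Hypothetical sketch: the old entry point keeps working, but its
  // deprecation message names the replacement and the release it starts in.
  object StringCleaner {
    // Preferred API: trims and lower-cases every string in the input.
    def clean(values: Seq[String]): Seq[String] =
      values.map(_.trim.toLowerCase)
  }

  object LegacyStringCleaner {
    // Bad: a message like "this method is deprecated" gives callers nothing
    // to act on. Better: name the alternative and the deprecating version.
    @deprecated("Use StringCleaner.clean instead", "3.0.0")
    def cleanAll(values: Seq[String]): Seq[String] =
      StringCleaner.clean(values)
  }

Code that calls LegacyStringCleaner.cleanAll still compiles, but the compiler warning already answers "what should I use instead?", which is what the policy asks of deprecation messages.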