+1 (binding)
On Mon, Mar 9, 2020 at 11:49 AM Xingbo Jiang <jiangxb1...@gmail.com> wrote:

> +1 (non-binding)
>
> Cheers,
>
> Xingbo
>
> On Mon, Mar 9, 2020 at 9:35 AM Xiao Li <lix...@databricks.com> wrote:
>
>> +1 (binding)
>>
>> Xiao
>>
>> On Mon, Mar 9, 2020 at 8:33 AM Denny Lee <denny.g....@gmail.com> wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>
>>>> The proposal itself seems good as the factors to consider. Thanks,
>>>> Michael.
>>>>
>>>> Several of the concerns mentioned look like good points, in particular:
>>>>
>>>> > ... assuming that this is for public stable APIs, not APIs that are
>>>> marked as unstable, evolving, etc. ...
>>>> I would like to confirm this. We already have API annotations such as
>>>> Experimental, Unstable, etc., and the implication of each is still
>>>> effective (see the sketch just below this message). If it's for stable
>>>> APIs, it makes sense to me as well.
>>>>
>>>> > ... can we expand on 'when' an API change can occur ? Since we are
>>>> proposing to diverge from semver. ...
>>>> I think this is a good point. If we're proposing to diverge from
>>>> semver, the delta compared to semver will have to be clarified to avoid
>>>> different personal interpretations of the somewhat general principles.
>>>>
>>>> > ... can we narrow down on the migration from Apache Spark 2.4.5 to
>>>> Apache Spark 3.0+? ...
>>>>
>>>> Assuming these concerns will be addressed, +1 (binding).
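For reference, the annotations mentioned here live in the
org.apache.spark.annotation package (shipped in the spark-tags artifact).
A minimal sketch of how they mark API stability; the trait names below are
hypothetical, only the annotations themselves are real:

    import org.apache.spark.annotation.{Experimental, Stable, Unstable}

    // A public, stable API: covered by the proposed rubric, so breaking
    // it requires weighing the factors discussed in this thread.
    @Stable
    trait StableQueryApi {
      def run(query: String): Unit
    }

    // Marked experimental/unstable: exempt under the "public stable APIs"
    // assumption above, so it may still change in any release.
    @Experimental
    @Unstable
    trait ExperimentalQueryApi {
      def runAsync(query: String): Unit
    }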
>>>>
>>>> On Mon, Mar 9, 2020 at 4:53 PM Takeshi Yamamuro
>>>> <linguin....@gmail.com> wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> Bests,
>>>>> Takeshi
>>>>>
>>>>> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <
>>>>> gengliang.w...@databricks.com> wrote:
>>>>>
>>>>>> +1 (non-binding)
>>>>>>
>>>>>> Gengliang
>>>>>>
>>>>>> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia <
>>>>>> matei.zaha...@gmail.com> wrote:
>>>>>>
>>>>>>> +1 as well.
>>>>>>>
>>>>>>> Matei
>>>>>>>
>>>>>>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan <cloud0...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> +1 (binding), assuming that this is for public stable APIs, not APIs
>>>>>>> that are marked as unstable, evolving, etc.
>>>>>>>
>>>>>>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía <ieme...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> +1 (non-binding)
>>>>>>>>
>>>>>>>> Michael's section on the trade-offs of maintaining / removing an
>>>>>>>> API is one of the best reads I have seen on this mailing list.
>>>>>>>> Enthusiastic +1.
>>>>>>>>
>>>>>>>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <
>>>>>>>> dongjoon.h...@gmail.com> wrote:
>>>>>>>> >
>>>>>>>> > This new policy has a good intention, but can we narrow down on
>>>>>>>> the migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>>>>>>>> >
>>>>>>>> > I saw that there already exists a reverting PR to bring back
>>>>>>>> Spark 1.4 and 1.5 APIs based on this AS-IS suggestion.
>>>>>>>> >
>>>>>>>> > The AS-IS policy clearly mentions the JVM/Scala-level difficulty,
>>>>>>>> and that's nice.
>>>>>>>> >
>>>>>>>> > However, for the other cases, it sounds like `recommending older
>>>>>>>> APIs as much as possible` due to the following:
>>>>>>>> >
>>>>>>>> > > How long has the API been in Spark?
>>>>>>>> >
>>>>>>>> > We had better be more careful when we add a new policy and should
>>>>>>>> aim not to mislead users and 3rd-party library developers into
>>>>>>>> thinking "older is better".
>>>>>>>> >
>>>>>>>> > Technically, I'm wondering who will use new APIs in their
>>>>>>>> examples (in books and on StackOverflow) if they always need to
>>>>>>>> write an additional warning like `this only works at 2.4.0+`.
>>>>>>>> >
>>>>>>>> > Bests,
>>>>>>>> > Dongjoon.
>>>>>>>> >
>>>>>>>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <
>>>>>>>> mri...@gmail.com> wrote:
>>>>>>>> >>
>>>>>>>> >> I am in broad agreement with the proposal; as any developer, I
>>>>>>>> prefer stable, well-designed APIs :-)
>>>>>>>> >>
>>>>>>>> >> Can we tie the proposal to the stability guarantees given by
>>>>>>>> Spark and reasonable expectations from users?
>>>>>>>> >> In my opinion, an unstable or evolving API could change - while
>>>>>>>> an experimental API which has been around for ages should be
>>>>>>>> handled more conservatively.
>>>>>>>> >> Which brings into question how the stability guarantees
>>>>>>>> specified by annotations interact with the proposal.
>>>>>>>> >>
>>>>>>>> >> Also, can we expand on 'when' an API change can occur? Since we
>>>>>>>> are proposing to diverge from semver.
>>>>>>>> >> Patch release? Minor release? Only major release? Based on
>>>>>>>> 'impact' of the API? Stability guarantees?
>>>>>>>> >>
>>>>>>>> >> Regards,
>>>>>>>> >> Mridul
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <
>>>>>>>> mich...@databricks.com> wrote:
>>>>>>>> >> >
>>>>>>>> >> > I'll start off the vote with a strong +1 (binding).
>>>>>>>> >> >
>>>>>>>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <
>>>>>>>> mich...@databricks.com> wrote:
>>>>>>>> >> >>
>>>>>>>> >> >> I propose to add the following text to Spark's Semantic
>>>>>>>> Versioning policy and adopt it as the rubric that should be used
>>>>>>>> when deciding to break APIs (even at major versions such as 3.0).
>>>>>>>> >> >>
>>>>>>>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As
>>>>>>>> this is a procedural vote, the measure will pass if there are more
>>>>>>>> favourable votes than unfavourable ones. PMC votes are binding, but
>>>>>>>> the community is encouraged to add their voice to the discussion.
>>>>>>>> >> >>
>>>>>>>> >> >> [ ] +1 - Spark should adopt this policy.
>>>>>>>> >> >> [ ] -1 - Spark should not adopt this policy.
>>>>>>>> >> >>
>>>>>>>> >> >> <new policy>
>>>>>>>> >> >>
>>>>>>>> >> >> Considerations When Breaking APIs
>>>>>>>> >> >>
>>>>>>>> >> >> The Spark project strives to avoid breaking APIs or silently
>>>>>>>> changing behavior, even at major versions. While this is not always
>>>>>>>> possible, the balance of the following factors should be considered
>>>>>>>> before choosing to break an API.
>>>>>>>> >> >>
>>>>>>>> >> >> Cost of Breaking an API
>>>>>>>> >> >>
>>>>>>>> >> >> Breaking an API almost always has a non-trivial cost to the
>>>>>>>> users of Spark. A broken API means that Spark programs need to be
>>>>>>>> rewritten before they can be upgraded. However, there are a few
>>>>>>>> considerations when thinking about what the cost will be:
>>>>>>>> >> >>
>>>>>>>> >> >> Usage - an API that is actively used in many different places
>>>>>>>> is always very costly to break. While it is hard to know usage for
>>>>>>>> sure, there are a bunch of ways that we can estimate:
>>>>>>>> >> >>
>>>>>>>> >> >> How long has the API been in Spark?
>>>>>>>> >> >>
>>>>>>>> >> >> Is the API common even for basic programs?
>>>>>>>> >> >>
>>>>>>>> >> >> How often do we see recent questions in JIRA or mailing
>>>>>>>> lists?
>>>>>>>> >> >>
>>>>>>>> >> >> How often does it appear in StackOverflow or blogs?
>>>>>>>> >> >>
>>>>>>>> >> >> Behavior after the break - How will a program that works
>>>>>>>> today work after the break? The following are listed roughly in
>>>>>>>> order of increasing severity:
>>>>>>>> >> >>
>>>>>>>> >> >> Will there be a compiler or linker error?
>>>>>>>> >> >>
>>>>>>>> >> >> Will there be a runtime exception?
>>>>>>>> >> >>
>>>>>>>> >> >> Will that exception happen after significant processing has
>>>>>>>> been done?
>>>>>>>> >> >>
>>>>>>>> >> >> Will we silently return different answers? (very hard to
>>>>>>>> debug; users might not even notice!)
>>>>>>>> >> >>
>>>>>>>> >> >> Cost of Maintaining an API
>>>>>>>> >> >>
>>>>>>>> >> >> Of course, the above does not mean that we will never break
>>>>>>>> any APIs. We must also consider the cost, both to the project and
>>>>>>>> to our users, of keeping the API in question.
>>>>>>>> >> >>
>>>>>>>> >> >> Project Costs - Every API we have needs to be tested and
>>>>>>>> needs to keep working as other parts of the project change. These
>>>>>>>> costs are significantly exacerbated when external dependencies
>>>>>>>> change (the JVM, Scala, etc). In some cases, while not completely
>>>>>>>> technically infeasible, the cost of maintaining a particular API
>>>>>>>> can become too high.
>>>>>>>> >> >>
>>>>>>>> >> >> User Costs - APIs also have a cognitive cost to users
>>>>>>>> learning Spark or trying to understand Spark programs. This cost
>>>>>>>> becomes even higher when the API in question has confusing or
>>>>>>>> undefined semantics.
>>>>>>>> >> >>
>>>>>>>> >> >> Alternatives to Breaking an API
>>>>>>>> >> >>
>>>>>>>> >> >> In cases where there is a "Bad API", but where the cost of
>>>>>>>> removal is also high, there are alternatives that should be
>>>>>>>> considered that do not hurt existing users but do address some of
>>>>>>>> the maintenance costs.
>>>>>>>> >> >>
>>>>>>>> >> >> Avoid Bad APIs - While this is a bit obvious, it is an
>>>>>>>> important point. Anytime we are adding a new interface to Spark we
>>>>>>>> should consider that we might be stuck with this API forever. Think
>>>>>>>> deeply about how new APIs relate to existing ones, as well as how
>>>>>>>> you expect them to evolve over time.
>>>>>>>> >> >>
>>>>>>>> >> >> Deprecation Warnings - All deprecation warnings should point
>>>>>>>> to a clear alternative and should never just say that an API is
>>>>>>>> deprecated (see the sketch after this list).
>>>>>>>> >> >>
>>>>>>>> >> >> Updated Docs - Documentation should point to the "best"
>>>>>>>> recommended way of performing a given task. In the cases where we
>>>>>>>> maintain legacy documentation, we should clearly point to newer
>>>>>>>> APIs and suggest to users the "right" way.
>>>>>>>> >> >>
>>>>>>>> >> >> Community Work - Many people learn Spark by reading blogs and
>>>>>>>> other sites such as StackOverflow. However, many of these resources
>>>>>>>> are out of date. Update them, to reduce the cost of eventually
>>>>>>>> removing deprecated APIs.
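To illustrate the "Deprecation Warnings" point: a good warning names a
concrete replacement and the release it applies from, rather than only
saying the API is deprecated. A minimal Scala sketch, where both method
names are hypothetical:

    object SparkApiExample {
      def newRunJob(name: String): Unit = { /* new implementation */ }

      // Points to the alternative and states the version, instead of a
      // bare "this method is deprecated".
      @deprecated("Use newRunJob(name) instead.", since = "3.0.0")
      def runJob(name: String): Unit = newRunJob(name)
    }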
>>>>>>>> >> >>
>>>>>>>> >> >> </new policy>

--
Takuya UESHIN
http://twitter.com/ueshin