Re: [DISCUSSION] Assessing what is a breaking change for Airflow (SemVer context)

Xiaodong Deng Thu, 26 Jan 2023 21:14:13 -0800

Happy to see the change I made was used as an example here ;-)

Honestly the discussion is a bit long and getting increasingly general, so
a bit hard to follow for me. But if I understand the conversation & issue
correctly, here are my thoughts:


- We may not want to make a strong statement like "*all executors are
public*". It's just impossible IMHO.
- Let's just mark a limited number of key methods/interfaces in each
executor as public, and we ensure a strong backcompat for them. For any
methods/interfaces not marked, no backcompat guaranteed even between minor
versions. This should also be added into documentation, as well as
highlighted in the executor docstrings.

I believe I must have missed some important points in the conversation, or
even misunderstood some points. But just summarizing what I have understood
as well as what I would prefer.


Regards,
XD

On Thu, Jan 26, 2023 at 9:45 AM Daniel Standish
<daniel.stand...@astronomer.io.invalid> wrote:

> I understand it's "convenient" if they want to extend k8s executor.  But I
> guess my thought is, I'm not sure that convenience outweighs the competing
> good which is, freedom to develop k8s executor.  I'm not saying don't
> extend k8s executor -- please do, and contribute it back.  But for
> subclassing / extending, that feels more like "you're on your own", in the
> sense of, "just vendor the executor".
>
> For example in a recent PR, XD added better support for running k8s
> executor with multiple namespace.  Now instead of a single `watcher` on k8s
> executor we have a `watchers` dict
> <https://github.com/apache/airflow/pull/28047/files#diff-681de8974a439f70dfa41705f5c1681ecce615fac6c4c715c1978d28d8f0da84L254>.
> I suppose it's "technically" breaking, in a subclass sense.  I don't really
> think that's the kind of backcompat we should worry about when making
> changes to k8s executor.
>
> Certainly the user-facing executor *behavior *should follow backcompat
> guarantees.  For example we can't just change the name of `pod_override`
> (in executor config) since that would break user workflows.  Similar, with
> a conf setting... renaming it without backcompat etc.  Where the dag writer
> "feels" the executor, it should respect semver, certainly.  But I'm not
> sure we need to guarantee backcompat of its internal methods.
>
> Another real example: Jed right now is trying to rename pod_id to
> pod_name, a sensible cleanup.  But k8s internals are "public", we can't
> really do that.  And we could imagine lots of grey areas.  Thought
> experiment: maybe we want to move some method call out of `sync` and run it
> periodically in a thread.  That would change what `sync` does and if a
> subclass depends on that, well I guess we've broken that subclass.
>
> WDYT?
>
>
> On Thu, Jan 26, 2023 at 12:21 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Yeah. The PR was mostly to bring things together- i.e to be able to
>> say "this is really a public interface of ours". And by just doing it,
>> I was actually hoping it will spark those kind of discussions where we
>> will really detail on what it means for each of those to be public.
>>
>> Having it written down and part of our docs makes it easy to spark
>> such discussion by pointing to it and asking "do we really mean that
>> or that?"
>>
>> I think once we make a few passes and polish it even more and get to
>> 2.6 release when this documentation part will be first time published,
>> we **could** vote on it before (or maybe lazy-consent if we won't see
>> many doubts raised).
>>
>> Whether it's an AIP - I don't think so, It's mostly about things that
>> are there that our "intentions" are as public. It's not a legally
>> binding document (I don't think anyone can make a legally binding
>> backwards compatibility statement), it's merely a statement of intent
>> of the community that our users can treat like something that they can
>> build on when extending Airflow. And understand the rules and
>> expectations when they look for details of each of the "components" we
>> publish.
>>
>> Coming back to the original question:
>>
>> I personally don't think we should limit just Base Executor interface
>> - because I can imagine there will be enough cases where extending
>> existing Kubernetes Executor will be far easier than just using the
>> interface and our users should understand some of the expectations
>> they might have when doing so.
>>
>> One of the ways we could make our intentions clear there is to
>> describe and follow the Python rules for what's internal and what's
>> public. (under or even dunder in front of the class/method name). This
>> is not "entirely" clear (protected vs private is somewhat blurry
>> here). But maybe we could agree on some rules that we will apply in
>> all such "public" interfaces of ours. We could do it with the executor
>> classes and methods that we do not "intend" to be extended and rename
>> them with the under or dunder in front)
>>
>> Also It can be implemented in one of two ways:
>>
>> 1) mark the methods/classes that we consider as "changeable" via
>> adding dunder in front and describe it to the users so that they can
>> understand what the rules are and make the right decisions
>> 2) make it a bit more formal, automate and easier to use by the users
>> in automated way - just follow what we have done with "common.sql"
>> where we implemented a process to generate and control generated .pyi
>> stubs:
>>
>> Pre-commit that takes care of it:
>>
>> https://github.com/apache/airflow/blob/main/scripts/ci/pre_commit/pre_commit_update_common_sql_api_stubs.py
>> Example generated stub file:
>>
>> https://github.com/apache/airflow/blob/main/airflow/providers/common/sql/hooks/sql.pyi
>>
>> Recently, we've gone through some first "teething" problems with the
>> "stubgen" approach and while it is not perfect, we managed to bend it
>> a bit to our expectations even if the original stubgen generator had
>> some flaws. And I think we would be able to generalise the common.sql
>> approach to all kinds of public interfaces we have.
>>
>> One problem with the stub approach is that it's hard to distinguish
>> "internal" vs. "external" use of some of the interfaces, However in
>> our case, rather than generate the .pyi files and storing it in the
>> code of airflow, we could simply generate and publish the "typeshed"
>> interface for Apache Airflow which will ONLY contain the "externally
>> public" interfaces (https://github.com/python/typeshed) . What
>> typeshed is (from the typeshed github):
>>
>> ""
>> Typeshed contains external type annotations for the Python standard
>> library and Python builtins, as well as third party packages as
>> contributed by people external to those projects.
>> This data can e.g. be used for static analysis, type checking or type
>> inference.
>> """
>>
>> This way a user who wishes to extend airflow could simply install the
>> `types-apache-airflow==2.6.0`  and this will give a very strong base
>> on understanding what is / is not public  - usable in automated way
>>
>> I think it would not be complex to generate such a typeshed package if
>> we go that direction and it would help us also to protect from
>> accidental changes (same as we do in common.sql - by automating the
>> part where we check if the generated interface has not changed in
>> backwards-incompatible way).
>>
>> J.
>>
>> On Thu, Jan 26, 2023 at 12:28 AM Daniel Standish
>> <daniel.stand...@astronomer.io.invalid> wrote:
>> >
>> > Following up here... that PR has been merged.... At some point we
>> should probably have a vote on that, if it's meant to be actual binding
>> policy.  Maybe we're still feeling it out?  But "what is public" is a
>> pretty fundamental concern to the project.  Maybe such a policy itself
>> should be an AIP?
>> >
>> > Meanwhile, going down into the weeds a bit, there's one aspect of the
>> doc which came up here:
>> https://github.com/apache/airflow/pull/29147#discussion_r1086214750
>> >
>> > It quotes the "policy":
>> >
>> > > Airflow has a set of Executors that are considered public. You are
>> free to extend their functionality
>> >
>> > Do we really mean that all executors are public?  Or do we just mean
>> that the executor interface is public, i.e. BaseExecutor?
>> >
>> > It's of course better for the healthy development of our built-in
>> executors if we have no backcompat guarantees and can change them as
>> needed.  But yes this means that anyone else "building on" our executors
>> would be wise to "vendor in" our executor code instead of just subclassing.
>> Maybe it's a reasonable assumption that any user with the sophistication
>> and, perhaps, chutzpah to customize one of our executors, has also the
>> sophistication to manage this.
>> >
>> > What do y'all think?
>> >
>> >
>> >
>> >
>> >
>>
>

Re: [DISCUSSION] Assessing what is a breaking change for Airflow (SemVer context)

Reply via email to