Yes. BaseExecutor is the way to go.

On Fri, Jan 27, 2023 at 5:30 PM Daniel Standish
<daniel.stand...@astronomer.io.invalid> wrote:

> Question though... I'm not sure we need to mark anything in the subclasses
> (K8sExecutor, LocalExecutor etc) as public... how would we choose what to
> make public?  Why would we do so?  Why not just keep it simple and say that
> BaseExecutor is the public interface and that's what's public and the
> individual executors have backcompat with regard to their external behavior
> but not with their code structure necessarily....
>
> And I'll take on the docs change.
>
> On Fri, Jan 27, 2023 at 8:21 AM Ferruzzi, Dennis
> <ferru...@amazon.com.invalid> wrote:
>
>> > - We may not want to make a strong statement like "*all executors are
>> public*". It's just
>>
>> > impossible IMHO.
>> > - Let's just mark a limited number of key methods/interfaces in each
>> executor as public,
>> > and we ensure a strong backcompat for them. For any methods/interfaces
>> not marked, no
>> > backcompat guaranteed even between minor versions. This should also be
>> added into
>> > documentation, as well as highlighted in the executor docstrings.
>>
>> If you missed something, then so did I.   I think that is an accurate
>> summary.
>>
>> ------------------------------
>> *From:* Xiaodong Deng <xdd...@apache.org>
>> *Sent:* Thursday, January 26, 2023 9:13 PM
>> *To:* dev@airflow.apache.org
>> *Subject:* RE: [EXTERNAL][DISCUSSION] Assessing what is a breaking
>> change for Airflow (SemVer context)
>>
>>
>> *CAUTION*: This email originated from outside of the organization. Do
>> not click links or open attachments unless you can confirm the sender and
>> know the content is safe.
>>
>> Happy to see the change I made was used as an example here ;-)
>>
>> Honestly the discussion is a bit long and getting increasingly general,
>> so a bit hard to follow for me. But if I understand the conversation &
>> issue correctly, here are my thoughts:
>>
>> - We may not want to make a strong statement like "*all executors are
>> public*". It's just impossible IMHO.
>> - Let's just mark a limited number of key methods/interfaces in each
>> executor as public, and we ensure a strong backcompat for them. For any
>> methods/interfaces not marked, no backcompat guaranteed even between minor
>> versions. This should also be added into documentation, as well as
>> highlighted in the executor docstrings.
>>
>> I believe I must have missed some important points in the conversation,
>> or even misunderstood some points. But just summarizing what I have
>> understood as well as what I would prefer.
>>
>>
>> Regards,
>> XD
>>
>> On Thu, Jan 26, 2023 at 9:45 AM Daniel Standish
>> <daniel.stand...@astronomer.io.invalid> wrote:
>>
>>> I understand it's "convenient" if they want to extend k8s executor.  But
>>> I guess my thought is, I'm not sure that convenience outweighs the
>>> competing good which is, freedom to develop k8s executor.  I'm not saying
>>> don't extend k8s executor -- please do, and contribute it back.  But for
>>> subclassing / extending, that feels more like "you're on your own", in the
>>> sense of, "just vendor the executor".
>>>
>>> For example in a recent PR, XD added better support for running k8s
>>> executor with multiple namespace.  Now instead of a single `watcher` on k8s
>>> executor we have a `watchers` dict
>>> <https://github.com/apache/airflow/pull/28047/files#diff-681de8974a439f70dfa41705f5c1681ecce615fac6c4c715c1978d28d8f0da84L254>.
>>> I suppose it's "technically" breaking, in a subclass sense.  I don't really
>>> think that's the kind of backcompat we should worry about when making
>>> changes to k8s executor.
>>>
>>> Certainly the user-facing executor *behavior *should follow backcompat
>>> guarantees.  For example we can't just change the name of `pod_override`
>>> (in executor config) since that would break user workflows.  Similar, with
>>> a conf setting... renaming it without backcompat etc.  Where the dag writer
>>> "feels" the executor, it should respect semver, certainly.  But I'm not
>>> sure we need to guarantee backcompat of its internal methods.
>>>
>>> Another real example: Jed right now is trying to rename pod_id to
>>> pod_name, a sensible cleanup.  But k8s internals are "public", we can't
>>> really do that.  And we could imagine lots of grey areas.  Thought
>>> experiment: maybe we want to move some method call out of `sync` and run it
>>> periodically in a thread.  That would change what `sync` does and if a
>>> subclass depends on that, well I guess we've broken that subclass.
>>>
>>> WDYT?
>>>
>>>
>>> On Thu, Jan 26, 2023 at 12:21 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>
>>>> Yeah. The PR was mostly to bring things together- i.e to be able to
>>>> say "this is really a public interface of ours". And by just doing it,
>>>> I was actually hoping it will spark those kind of discussions where we
>>>> will really detail on what it means for each of those to be public.
>>>>
>>>> Having it written down and part of our docs makes it easy to spark
>>>> such discussion by pointing to it and asking "do we really mean that
>>>> or that?"
>>>>
>>>> I think once we make a few passes and polish it even more and get to
>>>> 2.6 release when this documentation part will be first time published,
>>>> we **could** vote on it before (or maybe lazy-consent if we won't see
>>>> many doubts raised).
>>>>
>>>> Whether it's an AIP - I don't think so, It's mostly about things that
>>>> are there that our "intentions" are as public. It's not a legally
>>>> binding document (I don't think anyone can make a legally binding
>>>> backwards compatibility statement), it's merely a statement of intent
>>>> of the community that our users can treat like something that they can
>>>> build on when extending Airflow. And understand the rules and
>>>> expectations when they look for details of each of the "components" we
>>>> publish.
>>>>
>>>> Coming back to the original question:
>>>>
>>>> I personally don't think we should limit just Base Executor interface
>>>> - because I can imagine there will be enough cases where extending
>>>> existing Kubernetes Executor will be far easier than just using the
>>>> interface and our users should understand some of the expectations
>>>> they might have when doing so.
>>>>
>>>> One of the ways we could make our intentions clear there is to
>>>> describe and follow the Python rules for what's internal and what's
>>>> public. (under or even dunder in front of the class/method name). This
>>>> is not "entirely" clear (protected vs private is somewhat blurry
>>>> here). But maybe we could agree on some rules that we will apply in
>>>> all such "public" interfaces of ours. We could do it with the executor
>>>> classes and methods that we do not "intend" to be extended and rename
>>>> them with the under or dunder in front)
>>>>
>>>> Also It can be implemented in one of two ways:
>>>>
>>>> 1) mark the methods/classes that we consider as "changeable" via
>>>> adding dunder in front and describe it to the users so that they can
>>>> understand what the rules are and make the right decisions
>>>> 2) make it a bit more formal, automate and easier to use by the users
>>>> in automated way - just follow what we have done with "common.sql"
>>>> where we implemented a process to generate and control generated .pyi
>>>> stubs:
>>>>
>>>> Pre-commit that takes care of it:
>>>>
>>>> https://github.com/apache/airflow/blob/main/scripts/ci/pre_commit/pre_commit_update_common_sql_api_stubs.py
>>>> Example generated stub file:
>>>>
>>>> https://github.com/apache/airflow/blob/main/airflow/providers/common/sql/hooks/sql.pyi
>>>>
>>>> Recently, we've gone through some first "teething" problems with the
>>>> "stubgen" approach and while it is not perfect, we managed to bend it
>>>> a bit to our expectations even if the original stubgen generator had
>>>> some flaws. And I think we would be able to generalise the common.sql
>>>> approach to all kinds of public interfaces we have.
>>>>
>>>> One problem with the stub approach is that it's hard to distinguish
>>>> "internal" vs. "external" use of some of the interfaces, However in
>>>> our case, rather than generate the .pyi files and storing it in the
>>>> code of airflow, we could simply generate and publish the "typeshed"
>>>> interface for Apache Airflow which will ONLY contain the "externally
>>>> public" interfaces (https://github.com/python/typeshed) . What
>>>> typeshed is (from the typeshed github):
>>>>
>>>> ""
>>>> Typeshed contains external type annotations for the Python standard
>>>> library and Python builtins, as well as third party packages as
>>>> contributed by people external to those projects.
>>>> This data can e.g. be used for static analysis, type checking or type
>>>> inference.
>>>> """
>>>>
>>>> This way a user who wishes to extend airflow could simply install the
>>>> `types-apache-airflow==2.6.0`  and this will give a very strong base
>>>> on understanding what is / is not public  - usable in automated way
>>>>
>>>> I think it would not be complex to generate such a typeshed package if
>>>> we go that direction and it would help us also to protect from
>>>> accidental changes (same as we do in common.sql - by automating the
>>>> part where we check if the generated interface has not changed in
>>>> backwards-incompatible way).
>>>>
>>>> J.
>>>>
>>>> On Thu, Jan 26, 2023 at 12:28 AM Daniel Standish
>>>> <daniel.stand...@astronomer.io.invalid> wrote:
>>>> >
>>>> > Following up here... that PR has been merged.... At some point we
>>>> should probably have a vote on that, if it's meant to be actual binding
>>>> policy.  Maybe we're still feeling it out?  But "what is public" is a
>>>> pretty fundamental concern to the project.  Maybe such a policy itself
>>>> should be an AIP?
>>>> >
>>>> > Meanwhile, going down into the weeds a bit, there's one aspect of the
>>>> doc which came up here:
>>>> https://github.com/apache/airflow/pull/29147#discussion_r1086214750
>>>> >
>>>> > It quotes the "policy":
>>>> >
>>>> > > Airflow has a set of Executors that are considered public. You are
>>>> free to extend their functionality
>>>> >
>>>> > Do we really mean that all executors are public?  Or do we just mean
>>>> that the executor interface is public, i.e. BaseExecutor?
>>>> >
>>>> > It's of course better for the healthy development of our built-in
>>>> executors if we have no backcompat guarantees and can change them as
>>>> needed.  But yes this means that anyone else "building on" our executors
>>>> would be wise to "vendor in" our executor code instead of just subclassing.
>>>> Maybe it's a reasonable assumption that any user with the sophistication
>>>> and, perhaps, chutzpah to customize one of our executors, has also the
>>>> sophistication to manage this.
>>>> >
>>>> > What do y'all think?
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>>
>>>

Reply via email to