Yes. BaseExecutor is the way to go. On Fri, Jan 27, 2023 at 5:30 PM Daniel Standish <daniel.stand...@astronomer.io.invalid> wrote:
> Question though... I'm not sure we need to mark anything in the subclasses > (K8sExecutor, LocalExecutor etc) as public... how would we choose what to > make public? Why would we do so? Why not just keep it simple and say that > BaseExecutor is the public interface and that's what's public and the > individual executors have backcompat with regard to their external behavior > but not with their code structure necessarily.... > > And I'll take on the docs change. > > On Fri, Jan 27, 2023 at 8:21 AM Ferruzzi, Dennis > <ferru...@amazon.com.invalid> wrote: > >> > - We may not want to make a strong statement like "*all executors are >> public*". It's just >> >> > impossible IMHO. >> > - Let's just mark a limited number of key methods/interfaces in each >> executor as public, >> > and we ensure a strong backcompat for them. For any methods/interfaces >> not marked, no >> > backcompat guaranteed even between minor versions. This should also be >> added into >> > documentation, as well as highlighted in the executor docstrings. >> >> If you missed something, then so did I. I think that is an accurate >> summary. >> >> ------------------------------ >> *From:* Xiaodong Deng <xdd...@apache.org> >> *Sent:* Thursday, January 26, 2023 9:13 PM >> *To:* dev@airflow.apache.org >> *Subject:* RE: [EXTERNAL][DISCUSSION] Assessing what is a breaking >> change for Airflow (SemVer context) >> >> >> *CAUTION*: This email originated from outside of the organization. Do >> not click links or open attachments unless you can confirm the sender and >> know the content is safe. >> >> Happy to see the change I made was used as an example here ;-) >> >> Honestly the discussion is a bit long and getting increasingly general, >> so a bit hard to follow for me. But if I understand the conversation & >> issue correctly, here are my thoughts: >> >> - We may not want to make a strong statement like "*all executors are >> public*". It's just impossible IMHO. >> - Let's just mark a limited number of key methods/interfaces in each >> executor as public, and we ensure a strong backcompat for them. For any >> methods/interfaces not marked, no backcompat guaranteed even between minor >> versions. This should also be added into documentation, as well as >> highlighted in the executor docstrings. >> >> I believe I must have missed some important points in the conversation, >> or even misunderstood some points. But just summarizing what I have >> understood as well as what I would prefer. >> >> >> Regards, >> XD >> >> On Thu, Jan 26, 2023 at 9:45 AM Daniel Standish >> <daniel.stand...@astronomer.io.invalid> wrote: >> >>> I understand it's "convenient" if they want to extend k8s executor. But >>> I guess my thought is, I'm not sure that convenience outweighs the >>> competing good which is, freedom to develop k8s executor. I'm not saying >>> don't extend k8s executor -- please do, and contribute it back. But for >>> subclassing / extending, that feels more like "you're on your own", in the >>> sense of, "just vendor the executor". >>> >>> For example in a recent PR, XD added better support for running k8s >>> executor with multiple namespace. Now instead of a single `watcher` on k8s >>> executor we have a `watchers` dict >>> <https://github.com/apache/airflow/pull/28047/files#diff-681de8974a439f70dfa41705f5c1681ecce615fac6c4c715c1978d28d8f0da84L254>. >>> I suppose it's "technically" breaking, in a subclass sense. I don't really >>> think that's the kind of backcompat we should worry about when making >>> changes to k8s executor. >>> >>> Certainly the user-facing executor *behavior *should follow backcompat >>> guarantees. For example we can't just change the name of `pod_override` >>> (in executor config) since that would break user workflows. Similar, with >>> a conf setting... renaming it without backcompat etc. Where the dag writer >>> "feels" the executor, it should respect semver, certainly. But I'm not >>> sure we need to guarantee backcompat of its internal methods. >>> >>> Another real example: Jed right now is trying to rename pod_id to >>> pod_name, a sensible cleanup. But k8s internals are "public", we can't >>> really do that. And we could imagine lots of grey areas. Thought >>> experiment: maybe we want to move some method call out of `sync` and run it >>> periodically in a thread. That would change what `sync` does and if a >>> subclass depends on that, well I guess we've broken that subclass. >>> >>> WDYT? >>> >>> >>> On Thu, Jan 26, 2023 at 12:21 AM Jarek Potiuk <ja...@potiuk.com> wrote: >>> >>>> Yeah. The PR was mostly to bring things together- i.e to be able to >>>> say "this is really a public interface of ours". And by just doing it, >>>> I was actually hoping it will spark those kind of discussions where we >>>> will really detail on what it means for each of those to be public. >>>> >>>> Having it written down and part of our docs makes it easy to spark >>>> such discussion by pointing to it and asking "do we really mean that >>>> or that?" >>>> >>>> I think once we make a few passes and polish it even more and get to >>>> 2.6 release when this documentation part will be first time published, >>>> we **could** vote on it before (or maybe lazy-consent if we won't see >>>> many doubts raised). >>>> >>>> Whether it's an AIP - I don't think so, It's mostly about things that >>>> are there that our "intentions" are as public. It's not a legally >>>> binding document (I don't think anyone can make a legally binding >>>> backwards compatibility statement), it's merely a statement of intent >>>> of the community that our users can treat like something that they can >>>> build on when extending Airflow. And understand the rules and >>>> expectations when they look for details of each of the "components" we >>>> publish. >>>> >>>> Coming back to the original question: >>>> >>>> I personally don't think we should limit just Base Executor interface >>>> - because I can imagine there will be enough cases where extending >>>> existing Kubernetes Executor will be far easier than just using the >>>> interface and our users should understand some of the expectations >>>> they might have when doing so. >>>> >>>> One of the ways we could make our intentions clear there is to >>>> describe and follow the Python rules for what's internal and what's >>>> public. (under or even dunder in front of the class/method name). This >>>> is not "entirely" clear (protected vs private is somewhat blurry >>>> here). But maybe we could agree on some rules that we will apply in >>>> all such "public" interfaces of ours. We could do it with the executor >>>> classes and methods that we do not "intend" to be extended and rename >>>> them with the under or dunder in front) >>>> >>>> Also It can be implemented in one of two ways: >>>> >>>> 1) mark the methods/classes that we consider as "changeable" via >>>> adding dunder in front and describe it to the users so that they can >>>> understand what the rules are and make the right decisions >>>> 2) make it a bit more formal, automate and easier to use by the users >>>> in automated way - just follow what we have done with "common.sql" >>>> where we implemented a process to generate and control generated .pyi >>>> stubs: >>>> >>>> Pre-commit that takes care of it: >>>> >>>> https://github.com/apache/airflow/blob/main/scripts/ci/pre_commit/pre_commit_update_common_sql_api_stubs.py >>>> Example generated stub file: >>>> >>>> https://github.com/apache/airflow/blob/main/airflow/providers/common/sql/hooks/sql.pyi >>>> >>>> Recently, we've gone through some first "teething" problems with the >>>> "stubgen" approach and while it is not perfect, we managed to bend it >>>> a bit to our expectations even if the original stubgen generator had >>>> some flaws. And I think we would be able to generalise the common.sql >>>> approach to all kinds of public interfaces we have. >>>> >>>> One problem with the stub approach is that it's hard to distinguish >>>> "internal" vs. "external" use of some of the interfaces, However in >>>> our case, rather than generate the .pyi files and storing it in the >>>> code of airflow, we could simply generate and publish the "typeshed" >>>> interface for Apache Airflow which will ONLY contain the "externally >>>> public" interfaces (https://github.com/python/typeshed) . What >>>> typeshed is (from the typeshed github): >>>> >>>> "" >>>> Typeshed contains external type annotations for the Python standard >>>> library and Python builtins, as well as third party packages as >>>> contributed by people external to those projects. >>>> This data can e.g. be used for static analysis, type checking or type >>>> inference. >>>> """ >>>> >>>> This way a user who wishes to extend airflow could simply install the >>>> `types-apache-airflow==2.6.0` and this will give a very strong base >>>> on understanding what is / is not public - usable in automated way >>>> >>>> I think it would not be complex to generate such a typeshed package if >>>> we go that direction and it would help us also to protect from >>>> accidental changes (same as we do in common.sql - by automating the >>>> part where we check if the generated interface has not changed in >>>> backwards-incompatible way). >>>> >>>> J. >>>> >>>> On Thu, Jan 26, 2023 at 12:28 AM Daniel Standish >>>> <daniel.stand...@astronomer.io.invalid> wrote: >>>> > >>>> > Following up here... that PR has been merged.... At some point we >>>> should probably have a vote on that, if it's meant to be actual binding >>>> policy. Maybe we're still feeling it out? But "what is public" is a >>>> pretty fundamental concern to the project. Maybe such a policy itself >>>> should be an AIP? >>>> > >>>> > Meanwhile, going down into the weeds a bit, there's one aspect of the >>>> doc which came up here: >>>> https://github.com/apache/airflow/pull/29147#discussion_r1086214750 >>>> > >>>> > It quotes the "policy": >>>> > >>>> > > Airflow has a set of Executors that are considered public. You are >>>> free to extend their functionality >>>> > >>>> > Do we really mean that all executors are public? Or do we just mean >>>> that the executor interface is public, i.e. BaseExecutor? >>>> > >>>> > It's of course better for the healthy development of our built-in >>>> executors if we have no backcompat guarantees and can change them as >>>> needed. But yes this means that anyone else "building on" our executors >>>> would be wise to "vendor in" our executor code instead of just subclassing. >>>> Maybe it's a reasonable assumption that any user with the sophistication >>>> and, perhaps, chutzpah to customize one of our executors, has also the >>>> sophistication to manage this. >>>> > >>>> > What do y'all think? >>>> > >>>> > >>>> > >>>> > >>>> > >>>> >>>