Happy to see the change I made was used as an example here ;-) Honestly the discussion is a bit long and getting increasingly general, so a bit hard to follow for me. But if I understand the conversation & issue correctly, here are my thoughts:
- We may not want to make a strong statement like "*all executors are public*". It's just impossible IMHO. - Let's just mark a limited number of key methods/interfaces in each executor as public, and we ensure a strong backcompat for them. For any methods/interfaces not marked, no backcompat guaranteed even between minor versions. This should also be added into documentation, as well as highlighted in the executor docstrings. I believe I must have missed some important points in the conversation, or even misunderstood some points. But just summarizing what I have understood as well as what I would prefer. Regards, XD On Thu, Jan 26, 2023 at 9:45 AM Daniel Standish <daniel.stand...@astronomer.io.invalid> wrote: > I understand it's "convenient" if they want to extend k8s executor. But I > guess my thought is, I'm not sure that convenience outweighs the competing > good which is, freedom to develop k8s executor. I'm not saying don't > extend k8s executor -- please do, and contribute it back. But for > subclassing / extending, that feels more like "you're on your own", in the > sense of, "just vendor the executor". > > For example in a recent PR, XD added better support for running k8s > executor with multiple namespace. Now instead of a single `watcher` on k8s > executor we have a `watchers` dict > <https://github.com/apache/airflow/pull/28047/files#diff-681de8974a439f70dfa41705f5c1681ecce615fac6c4c715c1978d28d8f0da84L254>. > I suppose it's "technically" breaking, in a subclass sense. I don't really > think that's the kind of backcompat we should worry about when making > changes to k8s executor. > > Certainly the user-facing executor *behavior *should follow backcompat > guarantees. For example we can't just change the name of `pod_override` > (in executor config) since that would break user workflows. Similar, with > a conf setting... renaming it without backcompat etc. Where the dag writer > "feels" the executor, it should respect semver, certainly. But I'm not > sure we need to guarantee backcompat of its internal methods. > > Another real example: Jed right now is trying to rename pod_id to > pod_name, a sensible cleanup. But k8s internals are "public", we can't > really do that. And we could imagine lots of grey areas. Thought > experiment: maybe we want to move some method call out of `sync` and run it > periodically in a thread. That would change what `sync` does and if a > subclass depends on that, well I guess we've broken that subclass. > > WDYT? > > > On Thu, Jan 26, 2023 at 12:21 AM Jarek Potiuk <ja...@potiuk.com> wrote: > >> Yeah. The PR was mostly to bring things together- i.e to be able to >> say "this is really a public interface of ours". And by just doing it, >> I was actually hoping it will spark those kind of discussions where we >> will really detail on what it means for each of those to be public. >> >> Having it written down and part of our docs makes it easy to spark >> such discussion by pointing to it and asking "do we really mean that >> or that?" >> >> I think once we make a few passes and polish it even more and get to >> 2.6 release when this documentation part will be first time published, >> we **could** vote on it before (or maybe lazy-consent if we won't see >> many doubts raised). >> >> Whether it's an AIP - I don't think so, It's mostly about things that >> are there that our "intentions" are as public. It's not a legally >> binding document (I don't think anyone can make a legally binding >> backwards compatibility statement), it's merely a statement of intent >> of the community that our users can treat like something that they can >> build on when extending Airflow. And understand the rules and >> expectations when they look for details of each of the "components" we >> publish. >> >> Coming back to the original question: >> >> I personally don't think we should limit just Base Executor interface >> - because I can imagine there will be enough cases where extending >> existing Kubernetes Executor will be far easier than just using the >> interface and our users should understand some of the expectations >> they might have when doing so. >> >> One of the ways we could make our intentions clear there is to >> describe and follow the Python rules for what's internal and what's >> public. (under or even dunder in front of the class/method name). This >> is not "entirely" clear (protected vs private is somewhat blurry >> here). But maybe we could agree on some rules that we will apply in >> all such "public" interfaces of ours. We could do it with the executor >> classes and methods that we do not "intend" to be extended and rename >> them with the under or dunder in front) >> >> Also It can be implemented in one of two ways: >> >> 1) mark the methods/classes that we consider as "changeable" via >> adding dunder in front and describe it to the users so that they can >> understand what the rules are and make the right decisions >> 2) make it a bit more formal, automate and easier to use by the users >> in automated way - just follow what we have done with "common.sql" >> where we implemented a process to generate and control generated .pyi >> stubs: >> >> Pre-commit that takes care of it: >> >> https://github.com/apache/airflow/blob/main/scripts/ci/pre_commit/pre_commit_update_common_sql_api_stubs.py >> Example generated stub file: >> >> https://github.com/apache/airflow/blob/main/airflow/providers/common/sql/hooks/sql.pyi >> >> Recently, we've gone through some first "teething" problems with the >> "stubgen" approach and while it is not perfect, we managed to bend it >> a bit to our expectations even if the original stubgen generator had >> some flaws. And I think we would be able to generalise the common.sql >> approach to all kinds of public interfaces we have. >> >> One problem with the stub approach is that it's hard to distinguish >> "internal" vs. "external" use of some of the interfaces, However in >> our case, rather than generate the .pyi files and storing it in the >> code of airflow, we could simply generate and publish the "typeshed" >> interface for Apache Airflow which will ONLY contain the "externally >> public" interfaces (https://github.com/python/typeshed) . What >> typeshed is (from the typeshed github): >> >> "" >> Typeshed contains external type annotations for the Python standard >> library and Python builtins, as well as third party packages as >> contributed by people external to those projects. >> This data can e.g. be used for static analysis, type checking or type >> inference. >> """ >> >> This way a user who wishes to extend airflow could simply install the >> `types-apache-airflow==2.6.0` and this will give a very strong base >> on understanding what is / is not public - usable in automated way >> >> I think it would not be complex to generate such a typeshed package if >> we go that direction and it would help us also to protect from >> accidental changes (same as we do in common.sql - by automating the >> part where we check if the generated interface has not changed in >> backwards-incompatible way). >> >> J. >> >> On Thu, Jan 26, 2023 at 12:28 AM Daniel Standish >> <daniel.stand...@astronomer.io.invalid> wrote: >> > >> > Following up here... that PR has been merged.... At some point we >> should probably have a vote on that, if it's meant to be actual binding >> policy. Maybe we're still feeling it out? But "what is public" is a >> pretty fundamental concern to the project. Maybe such a policy itself >> should be an AIP? >> > >> > Meanwhile, going down into the weeds a bit, there's one aspect of the >> doc which came up here: >> https://github.com/apache/airflow/pull/29147#discussion_r1086214750 >> > >> > It quotes the "policy": >> > >> > > Airflow has a set of Executors that are considered public. You are >> free to extend their functionality >> > >> > Do we really mean that all executors are public? Or do we just mean >> that the executor interface is public, i.e. BaseExecutor? >> > >> > It's of course better for the healthy development of our built-in >> executors if we have no backcompat guarantees and can change them as >> needed. But yes this means that anyone else "building on" our executors >> would be wise to "vendor in" our executor code instead of just subclassing. >> Maybe it's a reasonable assumption that any user with the sophistication >> and, perhaps, chutzpah to customize one of our executors, has also the >> sophistication to manage this. >> > >> > What do y'all think? >> > >> > >> > >> > >> > >> >