Yeah. When we get the final PR, I will also want to test more scenarios (IDE/mypy integration, switching branches, running `uv sync`, etc.), and I will be happy to help and to update the contributor docs to explain what it is and how to work with it. It would be super cool if we get it to work seamlessly for everyone :)
On Thu, Jul 10, 2025 at 7:44 AM Amogh Desai <amoghdesai....@gmail.com> wrote:

Agreed!

Once the PR is up, we can have these implementation-level discussions over there. Good chat, however!

Thanks & Regards,
Amogh Desai

On Wed, Jul 9, 2025 at 3:56 PM Jarek Potiuk <ja...@potiuk.com> wrote:

Yeah. I think extracting each feature we want to share, one by one, into a separate distribution is the best approach. It will also help with the "__init__.py" cleanup, because almost by definition those distributions will not be able to "reach" outside: they can only be "used", they cannot "use" anything else. This means that what we have now (configuration using logging, which in turn uses configuration, leading to circular dependencies and partially initialized modules) will not happen. Instead, we will have to figure out how to inject configuration into logging (from task-sdk, airflow-ctl and airflow-core) and how to get the right initialization sequence, rather than having inter-feature dependencies.

And that is precisely what low cyclomatic complexity is about as well: it generally leads to easier-to-maintain software that has well-defined functionality and does not have the weird circular dependencies we have now. That is a kind of side effect of such a "per feature" split, but a very desirable one.

J.

On Wed, Jul 9, 2025 at 10:17 AM Amogh Desai <amoghdesai....@gmail.com> wrote:

Probably, you make a valid point.

Maybe this is an implementation detail, so we could figure it out as we start on a POC and factor these things in as we move along?

But as an initial guess, I would think that execution-time-related items (if we manage to enumerate them) would be better off in that "core_and_task_sdk" bundle.
Thanks & Regards,
Amogh Desai

On Tue, Jul 8, 2025 at 3:46 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Not that I am against your idea and we can surely expand as we need but we would not need to expand the "core_and_task_sdk" if we put only the relevant items into it.

So if we move logging and config out, my question is: what is really relevant to "stay" in "core_and_task_sdk"? And what do we know will not be needed by other distributions in the future?

What would be the content of "core_and_task_sdk"? Can we enumerate what should go there now?

My bet is that if we enumerate them, it will turn out that they are good candidates for separate "shared features" that can be logically named and modularised, and that there is no real need for a shared "core_and_task_sdk", which sounds like a "bag of everything else".

On Tue, Jul 8, 2025 at 11:48 AM Amogh Desai <amoghdesai....@gmail.com> wrote:

Yeah, I think what you are showcasing here is a step ahead of the initial proposal from Ash.

In the original proposal, `core_and_task_sdk` *can* hold the things relevant to just those two distros. Logging and config are modules that might be needed by airflow-ctl, for example, so ideally those would not be good candidates to put in there.

Kubernetes utils sound like a good example: they will be used by the KubeExecutor (let's say this is a module called "executors") and by the KPO (providers), and "shared/kubernetes" would probably be a good candidate for that.

Not that I am against your idea, and we can surely expand as needed, but we would not need to expand "core_and_task_sdk" if we put only the relevant items into it.
Thanks & Regards,
Amogh Desai

On Tue, Jul 8, 2025 at 12:28 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> @Jarek Potiuk <ja...@potiuk.com> a little confused on what you mean there, I am understanding the direction but could you elaborate a bit more please?

Let me elaborate:

As I understand it (maybe I am wrong?), the proposal is that we have a "core-and-task-sdk" folder, which is a shared distribution that is vendored into both "airflow-core" and "airflow-task-sdk". It contains some shared code that we want to include in both distributions. (Note that we never release the "core-and-task-sdk" distribution itself, because it only contains code shared between the two distributions we do release.)

That's fine and cool.

Imagine that this distribution contains "logging" (shared code for logging) and "config" (shared code for configuration), both needed in "airflow-core" and "airflow-task-sdk". So far so good. But what happens if we want to use logging in the same fashion in, say, "airflow-ctl" (the same applies to other distributions we might come up with)? Are we going to vendor the whole "core-and-task-sdk" distribution into "airflow-ctl"? It would be far better to vendor in just "logging" and not "config".

And if we are going to have a mechanism to vendor in "a distribution", there is nothing wrong with using the same mechanism to vendor in multiple distributions, so we can easily do it this way. (I added "orm", "serialization" and "fast_api" as examples of things we might want to share; I am not sure we really want to share them, but they illustrate my idea better.)

    /
      airflow-ctl/
      task-sdk/...
      airflow-core/...
      ....
      shared/
        kubernetes/
          pyproject.toml
          src/
            airflow_shared_kubernetes/__init__.py
        logging/
          pyproject.toml
          src/
            airflow_shared_logging/__init__.py
        config/
          pyproject.toml
          src/
            airflow_shared_config/__init__.py
        orm/
          pyproject.toml
          src/
            airflow_shared_orm/__init__.py
        serialization/
          pyproject.toml
          src/
            airflow_shared_serialization/__init__.py
        fast_api/
          pyproject.toml
          src/
            airflow_shared_fast_api/__init__.py
        ...

This has multiple benefits (and I see no real drawbacks):

* The code can be really well modularised. The "things" we share (and this also connects to the __init__.py discussion) can be independent, and (following Jens's comment) this allows us to keep cyclomatic complexity low: https://en.wikipedia.org/wiki/Cyclomatic_complexity. It will be much easier to implement logging so that it does not import or use config. This means, for example, that the configuration for logging will need to be injected when logging is initialized, and that's exactly what we want: we do not want logging code to use configuration code directly. They should be independent of each other, and you should be free to vendor in either logging or config on its own, whether or not you also vendored in the other.

* It's much more logical. We split based on the functionality we want to share, not on the distributions we want to produce. That allows us, in the future, to make different decisions on how we split our distributions. For example (I am not saying we have to do it, or that we will, but we will have the possibility), we can add more shared utilities we find useful in the same way and decide that "scheduler", "api_server", "triggerer" or "dag processor" are split into separate distributions, for example because we want to keep the number of dependencies down. Then "api_server" might use "fast_api", "config", "logging", "orm" and "serialization", whereas the scheduler should not need "fast_api". The "dag_processor" might eventually need neither "orm" nor "fast_api" and only use the others.

This seems like a natural approach. If we have a mechanism to "share" the code, it does not add complexity, but it allows us to isolate independent functionality into "isolated" boxes and use them.

Also, "cyclomatic complexity" is a complex term (badly chosen, as it scares people away) and has some math behind it, but it really boils down to very simple rules of thumb (and yes, I am a big proponent of keeping cyclomatic complexity low):

a) When you are building the "final" product (i.e. a distribution you want to release), make sure that you only "use" things, and that nothing else is "using" you as a library.
b) When you are building a "shared" thing (a library), make sure that the library is only "used" by others and does not really "use" anything else.

For example, in the case I explained above, we achieve low cyclomatic complexity when:

* airflow-core uses: logging, config, orm, serialization, fast_api
* none of "logging, config, orm, serialization, fast_api" use each other; they are at the bottom of the "user -> used" tree

J.
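A minimal sketch of the "inject configuration into logging" idea, assuming a hypothetical shared logging library that exposes `configure_logging()` and a `LoggingSettings` value object (neither exists in Airflow today; the config keys `logging_level` and `log_format` are illustrative too). The shared part imports only the standard library, and the distribution is the only place that knows about both config and logging, which is the "only used, never using" shape described in the rules of thumb above.

```python
import logging
from dataclasses import dataclass


# --- would live in a shared "logging" library (hypothetical) ---------------
# It depends only on the stdlib: there is no import of any config module.
@dataclass(frozen=True)
class LoggingSettings:
    level: str = "INFO"
    fmt: str = "%(asctime)s %(levelname)s %(name)s: %(message)s"


def configure_logging(settings: LoggingSettings, logger_name: str = "airflow") -> logging.Logger:
    """Configure a logger purely from the values it is handed."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(settings.level)
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(settings.fmt))
    logger.handlers[:] = [handler]  # replace rather than stack handlers on re-init
    return logger


# --- would live in a distribution (airflow-core, task-sdk, airflow-ctl) ----
# The distribution reads its own configuration and injects the values.
def init_logging_from_config(conf: dict[str, str]) -> logging.Logger:
    settings = LoggingSettings(
        level=conf.get("logging_level", "INFO"),
        fmt=conf.get("log_format", LoggingSettings.fmt),
    )
    return configure_logging(settings)
```

Because `configure_logging()` never reads configuration itself, a distribution such as airflow-ctl could vendor in only the logging library and pass in settings from wherever it keeps them, without vendoring config.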
On Tue, Jul 8, 2025 at 8:16 AM Amogh Desai <amoghdesai....@gmail.com> wrote:

I like the folder structure proposed by Ash and have no objections to it.

"core_and_task_sdk" sounds good to me and describes what it should do pretty well.

@Jarek Potiuk <ja...@potiuk.com> I'm a little confused about what you mean there. I understand the direction, but could you elaborate a bit more, please?

Naming is REALLY hard!

Thanks & Regards,
Amogh Desai

On Tue, Jul 8, 2025 at 2:52 AM Jarek Potiuk <ja...@potiuk.com> wrote:

How about splitting it even more and naming each shared "thing" ("logging", "config"), and sharing them explicitly and separately with the right "user"? That sounds much more modular, and we will be able to choose which of the shared "utils" we use where.

J.

On Mon, Jul 7, 2025 at 11:13 PM Jens Scheffler <j_scheff...@gmx.de.invalid> wrote:

I like "core_and_task_sdk" just as much as "core-and-task-sdk"; I have no problem with either, and it is only a path.

If we ever get to "dag-parser-scheduler-task-sdk-and-triggerer", which is a bit bulky, we should then name it "all-not-api-server" :-D

On 07.07.25 22:57, Ash Berlin-Taylor wrote:

In case I did a bad job explaining it, the “core and task sdk” is not in the module name/import name, just in the file path.

Anyone have other ideas?

On 7 Jul 2025, at 21:37, Buğra Öztürk <ozturkbugr...@gmail.com> wrote:

Thanks Ash! Looks cool! I like the structure.
This will enable all the combinations, and the structure looks easy to grasp. No strong stance on the naming, other than that it may be a bit long with `and`; `core_ctl` could be shorter. Since no import path is defined like that, we can give it any name, for sure.

Best regards,

On Mon, 7 Jul 2025, 21:51 Jarek Potiuk, <ja...@potiuk.com> wrote:

Looks good, but I think we should find a better logical name for core_and_sdk :)

On Mon, 7 Jul 2025, 21:44, Jens Scheffler <j_scheff...@gmx.de.invalid> wrote:

Cool! Especially the "shared" folder with the ability to have N combinations without exploding the project repo root!

On 07.07.25 14:43, Ash Berlin-Taylor wrote:

Oh, and all of this will be explained in shared/README.md

On 7 Jul 2025, at 13:41, Ash Berlin-Taylor <a...@apache.org> wrote:

Okay, so it seems we have agreement on the approach here, so I'll continue with this. On the dev call it was mentioned that “airflow-common” wasn't a great name, so here is my proposal for the file structure:
```
/
  task-sdk/...
  airflow-core/...
  shared/
    kubernetes/
      pyproject.toml
      src/
        airflow_kube/__init__.py
    core-and-tasksdk/
      pyproject.toml
      src/
        airflow_shared/__init__.py
```

Things to note here: the “shared” folder can hold multiple different shared “libraries”. In this example I am supposing a hypothetical shared kubernetes folder, in a world where we split the KubePodOperator and the KubeExecutor into two separate distributions (example only; I am not proposing we do that right now, and that will be a separate discussion).

The other things to note here:

- The folder name in shared aims to be “self-documenting”, hence the verbose “core-and-tasksdk”, to say where the shared library is intended to be used.
- The Python module itself should almost always have an `airflow_` (or maybe `_airflow_`?) prefix so that it does not conflict with anything else we might use. It won't matter “in production”, as those will be vendored in to be imported as `airflow/_vendor/airflow_shared` etc., but avoiding conflicts at dev time with the Finder approach is a good safety measure.
I will start making a real PR for this proposal now, but I'm open to feedback (either here, or in the PR when I open it).

-ash

On 4 Jul 2025, at 16:55, Jarek Potiuk <ja...@potiuk.com> wrote:

Yeah, we have to try it and test. Also, building packages happens semi-frequently when you run `uv sync` (uv uses some kind of heuristics to decide when), and you can force it with `--reinstall` or `--refresh`. Package build also happens every time you run `ci-image build` in breeze now, so it seems like it will integrate nicely into our workflows.

Looks really cool, Ash.

On Fri, Jul 4, 2025 at 5:14 PM Ash Berlin-Taylor <a...@apache.org> wrote:

It's not just release time, but any time we build a package, which happens on “every” CI run. The normal unit tests will use code from airflow-common/src/airflow_common; the kube tests, which build an image, will build the dists and vendor in the code from that commit.

There is only a single copy of the shared code committed to the repo, so there is never anything to synchronise.

On 4 Jul 2025, at 15:53, Amogh Desai <amoghdesai....@gmail.com> wrote:

Thanks Ash.
This is really cool, and it is helpful that you were able to test both scenarios (a repo checkout, and installing from the vendored package) and that the resolution worked fine too.

I like this idea compared to the relative-import one, for a few reasons:
- It feels like it would take some time to adjust to the new coding standard we would lay down if we imposed relative imports in the shared dist.
- We can continue using the repo-wide absolute-import standard. It is also much easier when we do a global search-and-replace in the IDE; relative imports could mean a change there.
- Vendoring is a proven and established paradigm across projects, and it would give us the build tooling we need out of the box.

I'm not strongly against relative imports, but with the evidence provided above, the vendored approach seems to only do us good.

Regarding synchronizing it, release time should be fine as long as we have a good CI workflow to catch such issues per PR if changes are made in the shared dist?
(pre-commit would make it really slow, I guess)

If we can run our tests with vendored code, we should be mostly covered.

Good effort, all!

Thanks & Regards,
Amogh Desai

On Fri, Jul 4, 2025 at 7:23 PM Ash Berlin-Taylor <a...@apache.org> wrote:

Okay, I think I've got something that works and that I'm happy with.

https://github.com/astronomer/airflow/tree/shared-vendored-lib-tasksdk-and-core

This produces the following from `uv build task-sdk`:
- https://github.com/user-attachments/files/21058976/apache_airflow_task_sdk-1.1.0.tar.gz
- https://github.com/user-attachments/files/21058996/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip (`.whl.zip` because GH won't allow a .whl upload, but will allow a .zip)

```
❯ unzip -l dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip | grep _vendor
       50  02-02-2020 00:00   airflow/sdk/_vendor/.gitignore
     2082  02-02-2020 00:00   airflow/sdk/_vendor/__init__.py
       28  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common.pyi
       18  02-02-2020 00:00   airflow/sdk/_vendor/vendor.txt
      785  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common/__init__.py
    10628  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common/timezone.py
```

And similarly in the .tar.gz, so our “sdist” is complete too:
```
❯ tar -tzf dist/apache_airflow_task_sdk-1.1.0.tar.gz | grep _vendor
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/.gitignore
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/__init__.py
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common.pyi
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/vendor.txt
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/__init__.py
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/timezone.py
```

The plugin works at build time by copying the libs specified in vendor.txt into place (and letting `vendoring` take care of the import rewrites).

For the imports to continue to work at “dev” time/from a repo checkout, I have added an import finder to `sys.meta_path`, and since it's at the end of the list it will only be used if the normal import can't find things.
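The fallback finder described above can be sketched roughly as follows. This is not Ash's implementation (that is in the linked `_vendor/__init__.py`); it is a minimal illustration, assuming the vendored copy would live under a package such as `airflow.sdk._vendor` and the real shared code is importable under its own top-level name such as `airflow_common`. `VendorAliasFinder` and `install_vendor_alias` are hypothetical names. As described in the thread, the result is one module object registered under both names in `sys.modules`.

```python
import importlib
import importlib.abc
import importlib.util
import sys


class VendorAliasFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Resolve ``<vendor_pkg>.<name>`` to the importable top-level module ``<name>``.

    Appended to the *end* of sys.meta_path, so it only runs when the normal
    finders did not find a real vendored copy (e.g. in a repo checkout).
    """

    def __init__(self, vendor_pkg: str) -> None:
        self.prefix = vendor_pkg + "."
        self._real_specs: dict = {}

    def find_spec(self, fullname, path=None, target=None):
        if not fullname.startswith(self.prefix):
            return None
        real_name = fullname[len(self.prefix):]
        try:
            if importlib.util.find_spec(real_name) is None:
                return None
        except ModuleNotFoundError:  # a parent of real_name does not exist
            return None
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        # Import the real module and hand back that same object, so the
        # vendored name and the real name point at one module.
        real = importlib.import_module(spec.name[len(self.prefix):])
        self._real_specs[spec.name] = real.__spec__
        return real

    def exec_module(self, module):
        # The real module has already run. The import system has just
        # overwritten its __spec__ with our alias spec, so put the real one back.
        module.__spec__ = self._real_specs.pop(module.__spec__.name)


def install_vendor_alias(vendor_pkg: str) -> VendorAliasFinder:
    finder = VendorAliasFinder(vendor_pkg)
    sys.meta_path.append(finder)  # last, so a real vendored copy always wins
    return finder
```

In a repo checkout, the vendor package's `__init__.py` would call something like `install_vendor_alias(__name__)`. Only the module directly under the vendor package is aliased: submodules of an aliased package are then found by the regular path finder through the real package's `__path__`, so they are loaded a second time under the vendored name.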
https://github.com/astronomer/airflow/blob/996817782be6071b306a87af9f36fe1cf2d3aaa3/task-sdk/src/airflow/sdk/_vendor/__init__.py

This doesn't quite give us the same runtime effect as “import rewriting”, as in this approach `airflow_common` is loaded directly (i.e. both `airflow.sdk._vendor.airflow_common` and `airflow_common` exist in `sys.modules`), but it does work for everything that I was able to test.

I tested it with the diff at the end of this message. My test ipython shell:

```
In [1]: from airflow.sdk._vendor.airflow_common.timezone import foo

In [2]: foo
Out[2]: 1

In [3]: import airflow.sdk._vendor.airflow_common

In [4]: import airflow.sdk._vendor.airflow_common.timezone

In [5]: airflow.sdk._vendor.airflow_common.__file__
Out[5]: '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/__init__.py'

In [6]: airflow.sdk._vendor.airflow_common.timezone.__file__
Out[6]: '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/timezone.py'
```

And in a standalone environment with the SDK dist I built (it needed the matching
airflow-core right now, but that has nothing to do with this discussion):

```
❯ _AIRFLOW__AS_LIBRARY=1 uvx --python 3.12 --with dist/apache_airflow_core-3.1.0-py3-none-any.whl --with dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl ipython
Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.4.0 -- An enhanced Interactive Python. Type '?' for help.
Tip: You can use `%hist` to view history, see the options with `%history?`

In [1]: import airflow.sdk._vendor.airflow_common.timezone

In [2]: airflow.sdk._vendor.airflow_common.timezone.__file__
Out[2]: '/Users/ash/.cache/uv/archive-v0/WWq6r65aPto2eJOyPObEH/lib/python3.12/site-packages/airflow/sdk/_vendor/airflow_common/timezone.py'
```

```diff
diff --git a/airflow-common/src/airflow_common/__init__.py b/airflow-common/src/airflow_common/__init__.py
index 13a83393a9..927b7c6b61 100644
--- a/airflow-common/src/airflow_common/__init__.py
+++ b/airflow-common/src/airflow_common/__init__.py
@@ -14,3 +14,5 @@
 # KIND, either express or implied. See the License for the
 # specific language governing permissions and limitations
 # under the License.
+
+foo = 1
diff --git a/airflow-common/src/airflow_common/timezone.py b/airflow-common/src/airflow_common/timezone.py
index 340b924c66..58384ef20f 100644
--- a/airflow-common/src/airflow_common/timezone.py
+++ b/airflow-common/src/airflow_common/timezone.py
@@ -36,6 +36,9 @@ _PENDULUM3 = version.parse(metadata.version("pendulum")).major == 3
 # - FixedTimezone(0, "UTC") in pendulum 2
 utc = pendulum.UTC
 
+
+from airflow_common import foo
+
 TIMEZONE: Timezone


```

On 3 Jul 2025, at 12:43, Jarek Potiuk <ja...@potiuk.com> wrote:

I think both approaches are doable:

1) -> We can very easily prevent bad imports between different distributions with pre-commit, and make sure we only use relative imports in the shared modules. We are doing plenty of this already. And yes, it would require relative imports, which we currently do not allow.
2) -> has one disadvantage: someone, at some point in time, will have to decide to synchronize this, and if it happens just before a release (I bet this is going to happen), it will lead to solving problems that would normally be solved in the PR that makes the change. (A symbolic link has the advantage that whoever modifies shared code will be notified immediately in their PR that they broke something, because static checks, mypy or tests fail.)

Ash, do you have an idea of a process (who does the synchronisation, and when) in the case of vendoring? Maybe we could solve it if it is done more frequently and with some regularity? We could potentially force re-vendoring at PR time as well, any time the shared code changes (and enforce that with pre-commit). I can't think of any other place in our development workflow (other than releases), and releases seem a bit too late: that puts extra effort on the release manager to fix potential incompatibilities, and delays the release. WDYT?

Re: relative imports.
I think for a shared library we could potentially relax this and allow them (and actually disallow absolute imports in the pieces of code that are shared, again with pre-commit). As I recall, the only reason we forbade relative imports is how we are (or maybe were) doing Dag parsing and the failures resulting from it, so we decided to just not allow them, for consistency. The way Dag parsing works is that when you use importlib to read the Dag from a file, relative imports do not work, because it does not know what they should be relative to. But if the relative import is done from an imported package, it should be no problem, I think; otherwise our Dags would not be able to import any library that uses relative imports.

Of course, consistency might be the reason why we do not want to introduce relative imports. I don't see it as an issue if it is guarded by pre-commit, though.

J.
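The Dag-parsing argument above can be checked with a small experiment. The sketch loads files by path with `importlib.util.spec_from_file_location()`, which approximates (but is not) Airflow's Dag loader; the package and file names are made up for the demo. A file loaded that way has no parent package, so its own relative imports fail, while an importable package that uses relative imports internally works fine.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def write(path: str, text: str) -> None:
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


def load_file_as_module(path: str, name: str):
    """Execute one file by path with no parent package (roughly how a Dag file is read)."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


root = tempfile.mkdtemp()
# An importable library whose modules use relative imports internally.
write(os.path.join(root, "helpers_pkg", "__init__.py"), "from .consts import VALUE\n")
write(os.path.join(root, "helpers_pkg", "consts.py"), "VALUE = 42\n")
# Two "Dag files": one uses a relative import itself, one imports the library.
write(os.path.join(root, "dags", "dag_relative.py"), "from .consts import VALUE\n")
write(os.path.join(root, "dags", "dag_absolute.py"), "from helpers_pkg import VALUE\n")
sys.path.insert(0, root)
importlib.invalidate_caches()

try:
    load_file_as_module(os.path.join(root, "dags", "dag_relative.py"), "dag_relative")
except ImportError as exc:
    # "attempted relative import with no known parent package"
    print("relative import in the Dag file itself fails:", exc)

dag = load_file_as_module(os.path.join(root, "dags", "dag_absolute.py"), "dag_absolute")
print("library using relative imports works:", dag.VALUE)
```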
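The pre-commit guard mentioned in option 1 could look roughly like this hypothetical check for one module of a shared library: absolute imports of the library's own package are rejected (they should be relative), and so are imports of any other shared library, which also enforces the rule of thumb that shared libraries are only "used" and never "use" each other. The `airflow_shared_*` names follow the layout proposed in this thread; hooking it into an actual pre-commit configuration is left out.

```python
import ast
from typing import NamedTuple


class Violation(NamedTuple):
    lineno: int
    message: str


def check_shared_module(source: str, own_pkg: str, other_shared: set[str]) -> list[Violation]:
    """Hypothetical pre-commit check for one module of a shared library.

    - imports of the library's own package must be relative (``from .x import y``);
    - imports of any other shared library are not allowed at all.
    """
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules = [node.module]
        else:
            continue  # not an import, or a relative import (level > 0): always fine
        for mod in modules:
            top = mod.split(".")[0]
            if top == own_pkg:
                violations.append(Violation(node.lineno, f"use a relative import instead of {mod!r}"))
            elif top in other_shared:
                violations.append(Violation(node.lineno, f"shared libraries must not import each other ({mod!r})"))
    return sorted(violations)
```

A hook would run `check_shared_module()` on every file under `shared/<lib>/src/` and fail if any violation is returned.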

On Thu, 3 Jul 2025, 12:11 Ash Berlin-Taylor <a...@apache.org> wrote:

Oh yes, symlinks will work, with one big caveat: it does mean you can't use absolute imports from one common module to another.

For example, https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41 where we have

```
from airflow.utils.module_loading import import_string
```

If we want to move serve_logs into this common lib that is then symlinked, then we wouldn't be able to have `from airflow_common.module_loading import import_string`.

I can think of two possible solutions here.

1) Allow/require relative imports in this shared lib, so `from .module_loading import import_string`.
2) Use `vendoring`[1] (from the pip maintainers), which will handle import-rewriting for us.
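Option 1, and Jarek's suggestion of disallowing absolute imports in shared code, could be enforced mechanically. Below is a minimal sketch of such a check. It assumes the shared package is named `airflow_common`, as in Ash's example, and that the script is wired in as a pre-commit hook, which passes the changed file names as arguments. It is not an existing Airflow hook.

```python
import ast
import sys
from pathlib import Path

# Hypothetical top-level name of the shared library (taken from the example above).
SHARED_PACKAGE = "airflow_common"


def find_absolute_shared_imports(source: str, filename: str = "<string>") -> list[str]:
    """Return 'file:line: message' strings for absolute imports of SHARED_PACKAGE."""
    problems = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.ImportFrom):
            # level == 0 is an absolute import; level >= 1 is a relative one ("from .x import y").
            if node.level == 0 and node.module and node.module.split(".")[0] == SHARED_PACKAGE:
                problems.append(
                    f"{filename}:{node.lineno}: use a relative import instead of "
                    f"'from {node.module} import ...'"
                )
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] == SHARED_PACKAGE:
                    problems.append(
                        f"{filename}:{node.lineno}: use a relative import instead of "
                        f"'import {alias.name}'"
                    )
    return problems


def main(paths: list[str]) -> int:
    problems = []
    for path in paths:
        problems.extend(find_absolute_shared_imports(Path(path).read_text(), path))
    for problem in problems:
        print(problem)
    # A non-zero exit status makes pre-commit report the hook as failed.
    return 1 if problems else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

The check flags `from airflow_common.module_loading import import_string`. It accepts `from .module_loading import import_string`.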

I'd entirely forgotten that symlinks in repos were a thing, so I prepared a minimal POC/demo of what the vendoring approach could look like here: https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3

Now, personally I am more than happy with relative imports, but as a project we have generally avoided them, so I think that limits what we could do with a symlink-based approach.

-ash

[1] https://github.com/pradyunsg/vendoring

On 3 Jul 2025, at 10:30, Pavankumar Gopidesu <gopidesupa...@gmail.com> wrote:

Thanks Ash

Yes, I agree option 2 would be my preference. We should make sure we have all the guardrails to protect against any unwanted behaviour in code sharing, and that the right tests are executed between the packages.

Agree with others, option 2 would be

On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai <amoghdesai....@gmail.com> wrote:

Thanks for starting this discussion, Ash.

I would prefer option 2 here, with proper tooling to handle the code duplication at *release* time. It is best to have a dist that has all it needs in itself.

Option 1 could very quickly get out of hand, and if we decide to separate triggerer / dag processor / config etc. as separate packages, back compat is going to be a nightmare and will bite us harder than we anticipate.

Thanks & Regards,
Amogh Desai

On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik <kaxiln...@gmail.com> wrote:

I prefer Option 2 as well, to avoid a matrix of dependencies.

On Thu, 3 Jul 2025 at 01:03, Jens Scheffler <j_scheff...@gmx.de.invalid> wrote:

I'd also prefer option 2. The reason is that it is rather pragmatic: we do not need to cut another package, so we have fewer packages and dependencies.

I remember some time ago I was checking (together with Jarek, I am not sure anymore...) whether using symlinks would be possible: keep the source in one package but "symlink" it into another. If the files are then materialized at packaging/release time, we have one set of code. If that is not possible, the redundancy could still be handled by a pre-commit hook. In Git the files are de-duplicated anyway based on content hash, so this does not hurt.

On 02.07.25 18:49, Shahar Epstein wrote:

I support option 2 with proper automation & CI; the reasoning you've shown for it makes sense to me.

Shahar

On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor <a...@apache.org> wrote:

Hello everyone,

As we work on finishing off the code-level separation of Task SDK and Core (scheduler etc.), we have come across some situations where we would like to share code between these.

However, it's not as straightforward as "just put it in a common dist they both depend upon". One of the goals of the Task SDK separation was 100% complete version independence between the two, ideally even if they are built into the same image and venv. Most of the reason why this isn't straightforward comes down to backwards compatibility: if we make a change to the common/shared distribution

We've listed the options we have thought about in https://github.com/apache/airflow/issues/51545. That issue also covers some things that I don't want to get into in this discussion, such as possibly separating operators and executors out of a single provider dist.

To give a concrete example of some code I would like to share: https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py (logging config). Another thing we will want to share is the AirflowConfigParser class from airflow.configuration. Notably, that means only the parser class, _not_ the default config values; again, let's not dwell on the specifics of that.

So, to bring the options listed in the issue here for discussion, broadly speaking there are two high-level approaches:

1. A single shared distribution
2. No shared package; copy/duplicate code instead

The advantage of Approach 1 is that we only have the code in one place. However, at least in this specific case of the logging config or the AirflowConfigParser class, backwards compatibility is much, much harder with it.

The main advantage of Approach 2 is that the code is released with/embedded in the dist (i.e. apache-airflow-task-sdk would contain the right version of the logging config and ConfigParser etc.). The downside is that either the code will need to be duplicated in the repo, or, better yet, it would live in a single place in the repo, but some tooling (TBD) will automatically handle the duplication, either at commit time or, my preference, at release time.

For this kind of shared "utility" code I am very strongly leaning towards option 2 with automation. Otherwise, I think the backwards compatibility requirements would make it unworkable, because very quickly over time the combinations we would have to test would just be unreasonable. I also don't feel confident we can make things as stable as we need to deliver the version separation/independence I want to deliver with AIP-72.

So, unless someone feels very strongly about this, I will come up with a draft PR for further discussion that implements code sharing by "vendoring" it at build time. I have an idea of how I can achieve this. We would have a single version in the repo, and it'll work there, but at runtime we vendor it into the shipped dist so it lives at something like `airflow.sdk._vendor` etc.

In terms of repo layout, this likely means we would end up with:

airflow-core/pyproject.toml
airflow-core/src/
airflow-core/tests/
task-sdk/pyproject.toml
task-sdk/src/
task-sdk/tests/
airflow-common/src/
airflow-common/tests/
# Possibly no airflow-common/pyproject.toml, as deps would be included in the downstream projects. TBD.

Thoughts and feedback welcomed.
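The build-time vendoring step described above boils down to two operations: copy the shared package under the dist's `_vendor` namespace, then rewrite absolute imports of it. Below is a minimal sketch of that idea only. The `vendoring` tool linked earlier in the thread is the more complete option. `SHARED_PACKAGE` and the `airflow.sdk._vendor` target are assumptions taken from the examples in this thread. The sketch deliberately ignores plain `import airflow_common` statements, which would need aliasing to keep the bound name.

```python
import re
import shutil
from pathlib import Path

# Hypothetical names: the shared top-level package and where the Task SDK would vendor it.
SHARED_PACKAGE = "airflow_common"
VENDOR_PREFIX = "airflow.sdk._vendor"

# Matches "from airflow_common" or "from airflow_common.x" at the start of a line.
_FROM_IMPORT_RE = re.compile(rf"^([ \t]*)from\s+{SHARED_PACKAGE}(?=[\s.])", re.MULTILINE)


def rewrite_imports(source: str) -> str:
    """Point absolute 'from airflow_common...' imports at the vendored copy."""
    return _FROM_IMPORT_RE.sub(rf"\1from {VENDOR_PREFIX}.{SHARED_PACKAGE}", source)


def vendor(shared_src: Path, vendor_dir: Path) -> list[Path]:
    """Copy shared_src/airflow_common into vendor_dir, rewriting imports in every .py file."""
    package_src = shared_src / SHARED_PACKAGE
    dest = vendor_dir / SHARED_PACKAGE
    if dest.exists():
        shutil.rmtree(dest)  # start from a clean copy so deleted modules do not linger
    written = []
    for path in sorted(package_src.rglob("*.py")):
        target = dest / path.relative_to(package_src)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(rewrite_imports(path.read_text()))
        written.append(target)
    return written
```

Relative imports such as `from .module_loading import import_string` pass through unchanged. So if the shared code followed option 1, the rewriting step would have almost nothing to do.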
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org