Yeah, I think what you are showcasing here is a step ahead of the initial proposal from Ash.
>From the original proposal, the `core_and_task_sdk` *can* have the things relevant to just those two distros. Logging, Config are modules that might be needed by airflow-ctl for example, so ideally, those would not be good candidates to be put in there, ideally speaking. The example of Kubernetes Utils sounds to be a good example, it will be used by KubeExecutor (lets say this is a module called "executors") and by KPO (providers), and the "shared/kubernetes" would probably be a good candidate for that. Not that I am against your idea and we can surely expand as we need but we would not need to expand the "core_and_task_sdk" if we put only the relevant items into it. Thanks & Regards, Amogh Desai On Tue, Jul 8, 2025 at 12:28 PM Jarek Potiuk <ja...@potiuk.com> wrote: > > @Jarek Potiuk <ja...@potiuk.com> a little confused on what you mean > there, I am understanding the direction > but could you elaborate a bit more please? > > Let me elaborate: > > As I understand (maybe I am wrong?), the proposal is that we have a > "core-and-task-sdk" folder which is a shared distribution that is > vendored-in into both "airflow-core" and "airflow-task-sdk". This > contains some shared code that we want to include in both distributions > (note that we never ever release "core-and-task-sdk" > distribution because it only contains code that is shared between the two > distributions we release. > > That's fine and cool. > > Imagine that this distribution contains "logging" (shared code > for logging) and "config" (shared code for configuration). - both needed in > "airflow-core" and "airflow-task-sdk". So far so good. But what happen if > we want to use logging in the same fashion in say "airflow-ctl" (that also > applies for other distributions that we might come up with) ? Are we going > to vendor in the whole "core-and-task-sdk" distribution in "airflow-ctl" ? > It would be far better if we just vendor in "logging" and do not vendor-in > "config". > > And if we are going to have a mechanism to vendor-in "a distribution" - > there is nothing wrong with having the same way to vendor-in multiple > distributions - so we can easily do it this way (i added "orm" , > "serialization". and "fast_api" as an example thing that we might want to > share - not sure if that is really something we want to do but it will > allow to illustrate my idea better) > > / > airflow-ctl/ > task-sdk/... > airflow-core/... > .... > shared/ > kubernetes/ > pyproject.toml > src/ > airflow_shared_kubernetes/__init__.py > logging/ > pyproject.toml > src/ > airflow_shared_logging/__init__.py > config/ > pyproject.toml > src/ > airflow_shared_config/__init__.py > orm/ > pyproject.toml > src/ > airflow_shared_orm/__init__.py > serialization/ > pyproject.toml > src/ > airflow_shared_serialization/__init__.py > fast_api/ > pyproject.toml > src/ > airflow_shared_fast_api/__init__.py > ... > > This has multiple benefits (and I see no real drawbacks): > > * the code can be really well modularised. Those "things" we share (and > this also connects to the __init__.py discussion) - can be independent - > and (it follow Jens comment) it allows to keep low cyclomatic complexity > https://en.wikipedia.org/wiki/Cyclomatic_complexity . It will be way > easier to implement logging in the way that it does not import or use > config. This means for example that configuration for logging will need to > be injected when logging is initialized - and that's exactly what we want, > we do not want logging code to use configuration code directly - they > should be independent from each other and you should be free to vendor-in > either logging or config independently if you also vendored-in the other. > > * it's much more logical. We split based on functionality we want to share > - not about the "distributions" we want to produce. That allows us - in the > future - to make different decisions on how we split our distributions. For > example (i do not tell we have to do it, or that we will do it but we will > have such a possibility) - we can add more shared utilities we find useful > in the same way and decide that "scheduler", "api_server" or "scheduler" or > "triggerer" or "dag processor" are split to separate distributions - > because for example we want to keep a number of dependencies down. And for > example "api_server" might use "fast_api", "config", "logging", "orm" and > "fast_api" and "serialization" , where the scheduler should not need > "fast_api". The "dag_processor" eventually might not need "orm" nor > "fast_api" and only use the other > > This seems like a natural approach. If we have a mechanism to "share" the > code, it does not add complexity, but allows us to isolate independent > functionality into "isolated" boxes and use them > > Also for cyclomatic complexity that is a complex word (badly chosen as it > scares people away) and has some math behind, but it really boils down to > very simple "rules of thumb" (and yes I am a big proponent of having low > cyclomatic complexity). > > a) when you are building the "final" product (i.e. distribution you want > to release) - make sure that you only "use" things - that nothing else is > "using" you as a library. > b) when you are building a "shared" thing (a library) - make sure that > library is only "used" by others but it does not really "use" anything > else. For example in the case I explained above - we can achieve > low-cyclomatic complexity when: > > * airflow-core uses: logging, config, orm, serialization, fast_api > * none of the "logging, config, orm, serialization, fast_api" use each > other - they are at the bottom of the "user -> used" tree > > J. > > > > On Tue, Jul 8, 2025 at 8:16 AM Amogh Desai <amoghdesai....@gmail.com> > wrote: > >> I like the folder structure proposed by Ash and have no objections with >> it. >> >> "core_and_task_sdk" sounds good to me and justifies what it should do >> pretty well. >> >> @Jarek Potiuk <ja...@potiuk.com> a little confused on what you mean >> there, I am understanding the direction >> but could you elaborate a bit more please? >> >> Naming is REALLY hard! >> >> Thanks & Regards, >> Amogh Desai >> >> >> On Tue, Jul 8, 2025 at 2:52 AM Jarek Potiuk <ja...@potiuk.com> wrote: >> >>> How about splitting it even more and having each shared "thing" named? >>> "logging", "config" and sharing them explicitly and separately with the >>> right "user" ? >>> That sounds way more modular and we will be able to choose which of the >>> shared "utils" we use where. >>> >>> J. >>> >>> >>> On Mon, Jul 7, 2025 at 11:13 PM Jens Scheffler >>> <j_scheff...@gmx.de.invalid> >>> wrote: >>> >>> > I like "core_and_task_sdk the same like core-and-task-sdk - I have no >>> > problem and it is a path only. >>> > >>> > if we get to "dag-parser-scheduler-task-sdk-and-triggerer" which is a >>> > bit bulky we then should name it "all-not-api-server" :-D >>> > >>> > On 07.07.25 22:57, Ash Berlin-Taylor wrote: >>> > > In case I did a bad job explaining it, the “core and task sdk” is >>> not in >>> > the module name/import name, just in the file path. >>> > > >>> > > Anyone have other ideas? >>> > > >>> > >> On 7 Jul 2025, at 21:37, Buğra Öztürk <ozturkbugr...@gmail.com> >>> wrote: >>> > >> >>> > >> Thanks Ash! Looks cool! I like the structure. This will enable all >>> the >>> > >> combinations and structure looks easy to grasp. No strong stance on >>> the >>> > >> naming other than maybe it is a bit long with `and`, `core_ctl` >>> could be >>> > >> shorter, since no import path is defined like that, we can give any >>> name >>> > >> for sure. >>> > >> >>> > >> Best regards, >>> > >> >>> > >>> On Mon, 7 Jul 2025, 21:51 Jarek Potiuk, <ja...@potiuk.com> wrote: >>> > >>> >>> > >>> Looks good but I think we should find some better logical name for >>> > >>> core_and_sdk :) >>> > >>> >>> > >>> pon., 7 lip 2025, 21:44 użytkownik Jens Scheffler >>> > >>> <j_scheff...@gmx.de.invalid> napisał: >>> > >>> >>> > >>>> Cool! Especially the "shared" folder with the ability to have >>> > >>>> N-combinations w/o exploding project repo root! >>> > >>>> >>> > >>>> On 07.07.25 14:43, Ash Berlin-Taylor wrote: >>> > >>>>> Oh, and all of this will be explain in shared/README.md >>> > >>>>> >>> > >>>>>> On 7 Jul 2025, at 13:41, Ash Berlin-Taylor <a...@apache.org> >>> wrote: >>> > >>>>>> >>> > >>>>>> Okay, so it seems we have agreement on the approach here, so >>> I’ll >>> > >>>> continue with this, and on the dev call it was mentioned that >>> > >>>> “airflow-common” wasn’t a great name, so here is my proposal for >>> the >>> > file >>> > >>>> structure; >>> > >>>>>> ``` >>> > >>>>>> / >>> > >>>>>> task-sdk/... >>> > >>>>>> airflow-core/... >>> > >>>>>> shared/ >>> > >>>>>> kuberenetes/ >>> > >>>>>> pyproject.toml >>> > >>>>>> src/ >>> > >>>>>> airflow_kube/__init__.py >>> > >>>>>> core-and-tasksdk/ >>> > >>>>>> pyproject.toml >>> > >>>>>> src/ >>> > >>>>>> airflow_shared/__init__.py >>> > >>>>>> ``` >>> > >>>>>> >>> > >>>>>> Things to note here: the “shared” folder has (the possibility) >>> of >>> > >>>> having multiple different shared “libraries” in it, in this >>> example I >>> > am >>> > >>>> supposing a hypothetical shared kuberenetes folder a world in >>> which we >>> > >>>> split the KubePodOperator and the KubeExecutor in to two separate >>> > >>>> distributions (example only, not proposing we do that right now, >>> and >>> > that >>> > >>>> will be a separate discussion) >>> > >>>>>> The other things to note here: >>> > >>>>>> >>> > >>>>>> >>> > >>>>>> - the folder name in shared aims to be “self-documenting”, >>> hence the >>> > >>>> verbose “core-and-tasksdk” to say where the shared library is >>> > intended to >>> > >>>> be used. >>> > >>>>>> - the python module itself should almost always have an >>> `airflow_` >>> > (or >>> > >>>> maybe `_airflow_`?) prefix so that it does not conflict with >>> anything >>> > >>> else >>> > >>>> we might use. It won’t matter “in production” as those will be >>> > vendored >>> > >>> in >>> > >>>> to be imported as `airflow/_vendor/airflow_shared` etc, but >>> avoiding >>> > >>>> conflicts at dev time with the Finder approach is a good safety >>> > measure. >>> > >>>>>> I will start making a real PR for this proposal now, but I’m >>> open to >>> > >>>> feedback (either here, or in the PR when I open it) >>> > >>>>>> -ash >>> > >>>>>> >>> > >>>>>>> On 4 Jul 2025, at 16:55, Jarek Potiuk <ja...@potiuk.com> >>> wrote: >>> > >>>>>>> >>> > >>>>>>> Yeah we have to try it and test - also building packages >>> happens >>> > semi >>> > >>>>>>> frequently when you run `uv sync` (they use some kind of >>> heuristics >>> > >>> to >>> > >>>>>>> decide when) and you can force it with `--reinstall` or >>> > `--refresh`. >>> > >>>>>>> Package build also happens every time when you run "ci-image >>> build` >>> > >>>> now in >>> > >>>>>>> breeze so it seems like it will nicely integrate in our >>> workflows. >>> > >>>>>>> >>> > >>>>>>> Looks really cool Ash. >>> > >>>>>>> >>> > >>>>>>> On Fri, Jul 4, 2025 at 5:14 PM Ash Berlin-Taylor < >>> a...@apache.org> >>> > >>>> wrote: >>> > >>>>>>>> It’s not just release time, but any time we build a package >>> which >>> > >>>> happens >>> > >>>>>>>> on “every” CI run. The normal unit tests will use code from >>> > >>>>>>>> airflow-common/src/airflow_common; the kube tests which build >>> an >>> > >>>> image will >>> > >>>>>>>> build the dists and vendor in the code from that commit. >>> > >>>>>>>> >>> > >>>>>>>> There is only a single copy of the shared code committed to >>> the >>> > >>> repo, >>> > >>>> so >>> > >>>>>>>> there is never anything to synchronise. >>> > >>>>>>>> >>> > >>>>>>>>> On 4 Jul 2025, at 15:53, Amogh Desai < >>> amoghdesai....@gmail.com> >>> > >>>> wrote: >>> > >>>>>>>>> Thanks Ash. >>> > >>>>>>>>> >>> > >>>>>>>>> This is really cool and helpful that you were able to test >>> both >>> > >>>> scenarios >>> > >>>>>>>>> -- repo checkout >>> > >>>>>>>>> and also installing from the vendored package and the >>> resolution >>> > >>>> worked >>> > >>>>>>>>> fine too. >>> > >>>>>>>>> >>> > >>>>>>>>> I like this idea compared the to relative import one for few >>> > >>> reasons: >>> > >>>>>>>>> - It feels like it will take some time to adjust to the new >>> > coding >>> > >>>>>>>> standard >>> > >>>>>>>>> that we will lay >>> > >>>>>>>>> if we impose relative imports in the shared dist >>> > >>>>>>>>> - We can continue using repo wise absolute import standards, >>> it >>> > is >>> > >>>> also >>> > >>>>>>>>> much easier for situations >>> > >>>>>>>>> when we do global search in IDE to find + replace, this could >>> > mean >>> > >>> a >>> > >>>>>>>> change >>> > >>>>>>>>> there >>> > >>>>>>>>> - The vendoring work is a proven and established paradigm >>> across >>> > >>>> projects >>> > >>>>>>>>> and would >>> > >>>>>>>>> out of box give us the build tooling we need also >>> > >>>>>>>>> >>> > >>>>>>>>> Nothing too against the relative import but with the evidence >>> > >>>> provided >>> > >>>>>>>>> above, vendored approach >>> > >>>>>>>>> seems to only do us good. >>> > >>>>>>>>> >>> > >>>>>>>>> Regarding synchronizing it, release time should be fine as >>> long >>> > as >>> > >>> we >>> > >>>>>>>> have >>> > >>>>>>>>> a good CI workflow to probably >>> > >>>>>>>>> catch such issues per PR if changes are made in shared dist? >>> > >>>> (precommit >>> > >>>>>>>>> would make it really slow i guess) >>> > >>>>>>>>> >>> > >>>>>>>>> If we can run our tests with vendored code we should be >>> mostly >>> > >>>> covered. >>> > >>>>>>>>> Good effort all! >>> > >>>>>>>>> >>> > >>>>>>>>> Thanks & Regards, >>> > >>>>>>>>> Amogh Desai >>> > >>>>>>>>> >>> > >>>>>>>>> >>> > >>>>>>>>>> On Fri, Jul 4, 2025 at 7:23 PM Ash Berlin-Taylor < >>> > a...@apache.org> >>> > >>>>>>>> wrote: >>> > >>>>>>>>>> Okay, I think I’ve got something that works and I’m happy >>> with. >>> > >>>>>>>>>> >>> > >>>>>>>>>> >>> > >>>>>>>>>> >>> > >>> >>> > >>> https://github.com/astronomer/airflow/tree/shared-vendored-lib-tasksdk-and-core >>> > >>>>>>>>>> This produces the following from `uv build task-sdk` >>> > >>>>>>>>>> - >>> > >>>>>>>>>> >>> > >>> >>> > >>> https://github.com/user-attachments/files/21058976/apache_airflow_task_sdk-1.1.0.tar.gz >>> > >>>>>>>>>> - >>> > >>>>>>>>>> >>> > >>> >>> > >>> https://github.com/user-attachments/files/21058996/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip >>> > >>>>>>>>>> (`.whl.zip` as GH won't allow .whl upload, but will .zip) >>> > >>>>>>>>>> >>> > >>>>>>>>>> ``` >>> > >>>>>>>>>> ❯ unzip -l >>> > >>> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip | >>> > >>>>>>>> grep >>> > >>>>>>>>>> _vendor >>> > >>>>>>>>>> 50 02-02-2020 00:00 airflow/sdk/_vendor/.gitignore >>> > >>>>>>>>>> 2082 02-02-2020 00:00 airflow/sdk/_vendor/__init__.py >>> > >>>>>>>>>> 28 02-02-2020 00:00 >>> > airflow/sdk/_vendor/airflow_common.pyi >>> > >>>>>>>>>> 18 02-02-2020 00:00 airflow/sdk/_vendor/vendor.txt >>> > >>>>>>>>>> 785 02-02-2020 00:00 >>> > >>>>>>>>>> airflow/sdk/_vendor/airflow_common/__init__.py >>> > >>>>>>>>>> 10628 02-02-2020 00:00 >>> > >>>>>>>>>> airflow/sdk/_vendor/airflow_common/timezone.py >>> > >>>>>>>>>> ``` >>> > >>>>>>>>>> >>> > >>>>>>>>>> And similarly in the .tar.gz, so our “sdist” is complete >>> too: >>> > >>>>>>>>>> ``` >>> > >>>>>>>>>> ❯ tar -tzf dist/apache_airflow_task_sdk-1.1.0.tar.gz |grep >>> > _vendor >>> > >>>>>>>>>> >>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/.gitignore >>> > >>>>>>>>>> >>> > apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/__init__.py >>> > >>>>>>>>>> >>> > >>>> >>> > >>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common.pyi >>> > >>>>>>>>>> >>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/vendor.txt >>> > >>>>>>>>>> >>> > >>>>>>>>>> >>> > >>> >>> > >>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/__init__.py >>> > >>> >>> > >>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/timezone.py >>> > >>>>>>>>>> ``` >>> > >>>>>>>>>> >>> > >>>>>>>>>> The plugin works at build time by including/copying the libs >>> > >>>> specified >>> > >>>>>>>> in >>> > >>>>>>>>>> vendor.txt into place (and let `vendoring` take care of >>> import >>> > >>>>>>>> rewrites.) >>> > >>>>>>>>>> For the imports to continue to work at “dev” time/from a >>> repo >>> > >>>> checkout, >>> > >>>>>>>> I >>> > >>>>>>>>>> have added a import finder to `sys.meta_path`, and since >>> it’s at >>> > >>> the >>> > >>>>>>>> end of >>> > >>>>>>>>>> the list it will only be used if the normal import can’t >>> find >>> > >>>> things. >>> > >>>>>>>>>> >>> > >>>>>>>>>> >>> > >>> >>> > >>> https://github.com/astronomer/airflow/blob/996817782be6071b306a87af9f36fe1cf2d3aaa3/task-sdk/src/airflow/sdk/_vendor/__init__.py >>> > >>>>>>>>>> This doesn’t quite give us the same runtime effect “import >>> > >>>> rewriting” >>> > >>>>>>>>>> affect, as in this approach `airflow_common` is directly >>> loaded >>> > >>>> (i.e. >>> > >>>>>>>>>> airflow.sdk._vendor.airflow_common and airflow_common exist >>> in >>> > >>>>>>>>>> sys.modules), but it does work for everything that I was >>> able to >>> > >>>> test.. >>> > >>>>>>>>>> I tested it with the diff at the end of this message. My >>> test >>> > >>>> ipython >>> > >>>>>>>>>> shell: >>> > >>>>>>>>>> >>> > >>>>>>>>>> ``` >>> > >>>>>>>>>> In [1]: from airflow.sdk._vendor.airflow_common.timezone >>> import >>> > >>> foo >>> > >>>>>>>>>> In [2]: foo >>> > >>>>>>>>>> Out[2]: 1 >>> > >>>>>>>>>> >>> > >>>>>>>>>> In [3]: import airflow.sdk._vendor.airflow_common >>> > >>>>>>>>>> >>> > >>>>>>>>>> In [4]: import airflow.sdk._vendor.airflow_common.timezone >>> > >>>>>>>>>> >>> > >>>>>>>>>> In [5]: airflow.sdk._vendor.airflow_common.__file__ >>> > >>>>>>>>>> Out[5]: >>> > >>>>>>>>>> >>> > >>> >>> > >>> '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/__init__.py' >>> > >>>>>>>>>> In [6]: airflow.sdk._vendor.airflow_common.timezone.__file__ >>> > >>>>>>>>>> Out[6]: >>> > >>>>>>>>>> >>> > >>> >>> > >>> '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/timezone.py' >>> > >>>>>>>>>> ``` >>> > >>>>>>>>>> >>> > >>>>>>>>>> >>> > >>>>>>>>>> And in an standalone environment with the SDK dist I built >>> (it >>> > >>>> needed >>> > >>>>>>>> the >>> > >>>>>>>>>> matching airflow-core right now, but that is nothing to do >>> with >>> > >>> this >>> > >>>>>>>>>> discussion): >>> > >>>>>>>>>> >>> > >>>>>>>>>> ``` >>> > >>>>>>>>>> ❯ _AIRFLOW__AS_LIBRARY=1 uvx --python 3.12 --with >>> > >>>>>>>>>> dist/apache_airflow_core-3.1.0-py3-none-any.whl --with >>> > >>>>>>>>>> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl ipython >>> > >>>>>>>>>> Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ] >>> > >>>>>>>>>> Type 'copyright', 'credits' or 'license' for more >>> information >>> > >>>>>>>>>> IPython 9.4.0 -- An enhanced Interactive Python. Type '?' >>> for >>> > >>> help. >>> > >>>>>>>>>> Tip: You can use `%hist` to view history, see the options >>> with >>> > >>>>>>>> `%history?` >>> > >>>>>>>>>> In [1]: import airflow.sdk._vendor.airflow_common.timezone >>> > >>>>>>>>>> >>> > >>>>>>>>>> In [2]: airflow.sdk._vendor.airflow_common.timezone.__file__ >>> > >>>>>>>>>> Out[2]: >>> > >>>>>>>>>> >>> > >>> >>> > >>> '/Users/ash/.cache/uv/archive-v0/WWq6r65aPto2eJOyPObEH/lib/python3.12/site-packages/airflow/sdk/_vendor/airflow_common/timezone.py’ >>> > >>>>>>>>>> `` >>> > >>>>>>>>>> >>> > >>>>>>>>>> >>> > >>>>>>>>>> >>> > >>>>>>>>>> ```diff >>> > >>>>>>>>>> diff --git a/airflow-common/src/airflow_common/__init__.py >>> > >>>>>>>>>> b/airflow-common/src/airflow_common/__init__.py >>> > >>>>>>>>>> index 13a83393a9..927b7c6b61 100644 >>> > >>>>>>>>>> --- a/airflow-common/src/airflow_common/__init__.py >>> > >>>>>>>>>> +++ b/airflow-common/src/airflow_common/__init__.py >>> > >>>>>>>>>> @@ -14,3 +14,5 @@ >>> > >>>>>>>>>> # KIND, either express or implied. See the License for the >>> > >>>>>>>>>> # specific language governing permissions and limitations >>> > >>>>>>>>>> # under the License. >>> > >>>>>>>>>> + >>> > >>>>>>>>>> +foo = 1 >>> > >>>>>>>>>> diff --git a/airflow-common/src/airflow_common/timezone.py >>> > >>>>>>>>>> b/airflow-common/src/airflow_common/timezone.py >>> > >>>>>>>>>> index 340b924c66..58384ef20f 100644 >>> > >>>>>>>>>> --- a/airflow-common/src/airflow_common/timezone.py >>> > >>>>>>>>>> +++ b/airflow-common/src/airflow_common/timezone.py >>> > >>>>>>>>>> @@ -36,6 +36,9 @@ _PENDULUM3 = >>> > >>>>>>>>>> version.parse(metadata.version("pendulum")).major == 3 >>> > >>>>>>>>>> # - FixedTimezone(0, "UTC") in pendulum 2 >>> > >>>>>>>>>> utc = pendulum.UTC >>> > >>>>>>>>>> >>> > >>>>>>>>>> + >>> > >>>>>>>>>> +from airflow_common import foo >>> > >>>>>>>>>> + >>> > >>>>>>>>>> TIMEZONE: Timezone >>> > >>>>>>>>>> >>> > >>>>>>>>>> >>> > >>>>>>>>>> ``` >>> > >>>>>>>>>> >>> > >>>>>>>>>>>> On 3 Jul 2025, at 12:43, Jarek Potiuk <ja...@potiuk.com> >>> > wrote: >>> > >>>>>>>>>>> I think both approaches are doable: >>> > >>>>>>>>>>> >>> > >>>>>>>>>>> 1) -> We can very easily prevent bad imports by pre-commit >>> when >>> > >>>>>>>> importing >>> > >>>>>>>>>>> from different distributions and make sure we are only >>> doing >>> > >>>> relative >>> > >>>>>>>>>>> imports in the shared modules. We are doing plenty of this >>> > >>>> already. And >>> > >>>>>>>>>> yes >>> > >>>>>>>>>>> it would require relative links we currently do not allow. >>> > >>>>>>>>>>> >>> > >>>>>>>>>>> 2) -> has one disadvantage that someone at some point in >>> time >>> > >>> will >>> > >>>> have >>> > >>>>>>>>>> to >>> > >>>>>>>>>>> decide to synchronize this and if it happens just before >>> > release >>> > >>>> (I bet >>> > >>>>>>>>>>> this is going to happen) this will lead to solving problems >>> > that >>> > >>>> would >>> > >>>>>>>>>>> normally be solved during PR when you make a change (i.e. >>> > >>> symbolic >>> > >>>> link >>> > >>>>>>>>>> has >>> > >>>>>>>>>>> the advantage that whoever modifies shared code will be >>> > >>> immediately >>> > >>>>>>>>>>> notified in their PR - that they broke something because >>> either >>> > >>>> static >>> > >>>>>>>>>>> checks or mypy or tests fail. >>> > >>>>>>>>>>> >>> > >>>>>>>>>>> Ash, do you have an idea of a process (who and when) does >>> the >>> > >>>>>>>>>>> synchronisation in case of vendoring? Maybe we could solve >>> it >>> > if >>> > >>>> it is >>> > >>>>>>>>>> done >>> > >>>>>>>>>>> more frequently and with some regularity? We could >>> potentially >>> > >>>> force >>> > >>>>>>>>>>> re-vendoring at PR time as well any time shared code >>> changes >>> > (and >>> > >>>>>>>> prevent >>> > >>>>>>>>>>> it by pre-commit. And I can't think of some place (other >>> than >>> > >>>> releases) >>> > >>>>>>>>>> in >>> > >>>>>>>>>>> our development workflow and that seems to be a bit too >>> late as >>> > >>>> puts an >>> > >>>>>>>>>>> extra effort on fixing potential incompatibilities >>> introduced >>> > on >>> > >>>>>>>> release >>> > >>>>>>>>>>> manager and delays the release. WDYT? >>> > >>>>>>>>>>> >>> > >>>>>>>>>>> Re: relative links. I think for a shared library we could >>> > >>>> potentially >>> > >>>>>>>>>> relax >>> > >>>>>>>>>>> this and allow them (and actually disallow absolute links >>> in >>> > the >>> > >>>> pieces >>> > >>>>>>>>>> of >>> > >>>>>>>>>>> code that are shared - again, by pre-commit). As I recall, >>> the >>> > >>> only >>> > >>>>>>>>>> reason >>> > >>>>>>>>>>> we forbade the relative links is because of how we are (or >>> > maybe >>> > >>>> were) >>> > >>>>>>>>>>> doing DAG parsing and failures resulting from it. So we >>> decided >>> > >>> to >>> > >>>> just >>> > >>>>>>>>>> not >>> > >>>>>>>>>>> allow it to keep consistency. The way how Dag parsing >>> works is >>> > >>> that >>> > >>>>>>>> when >>> > >>>>>>>>>>> you are using importlib to read the Dag from a file, the >>> > relative >>> > >>>>>>>> imports >>> > >>>>>>>>>>> do not work as it does not know what they should be >>> relative >>> > to. >>> > >>>> But if >>> > >>>>>>>>>>> relative import is done from an imported package, it >>> should be >>> > no >>> > >>>>>>>>>> problem, >>> > >>>>>>>>>>> I think - otherwise our Dags would not be able to import >>> any >>> > >>>> library >>> > >>>>>>>> that >>> > >>>>>>>>>>> uses relative imports. >>> > >>>>>>>>>>> >>> > >>>>>>>>>>> Of course consistency might be the reason why we do not >>> want to >>> > >>>>>>>> introduce >>> > >>>>>>>>>>> relative imports. I don't see it as an issue if it is >>> guarded >>> > by >>> > >>>>>>>>>> pre-commit >>> > >>>>>>>>>>> though. >>> > >>>>>>>>>>> >>> > >>>>>>>>>>> J. >>> > >>>>>>>>>>> >>> > >>>>>>>>>>> >>> > >>>>>>>>>>> J. >>> > >>>>>>>>>>> >>> > >>>>>>>>>>> >>> > >>>>>>>>>>> czw., 3 lip 2025, 12:11 użytkownik Ash Berlin-Taylor < >>> > >>>> a...@apache.org> >>> > >>>>>>>>>>> napisał: >>> > >>>>>>>>>>> >>> > >>>>>>>>>>>> Oh yes, symlinks will work, with one big caveat: It does >>> mean >>> > >>> you >>> > >>>>>>>> can’t >>> > >>>>>>>>>>>> use absolute imports in one common module to another. >>> > >>>>>>>>>>>> >>> > >>>>>>>>>>>> For example >>> > >>>>>>>>>>>> >>> > >>> >>> > >>> https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41 >>> > >>>>>>>>>>>> where we have >>> > >>>>>>>>>>>> >>> > >>>>>>>>>>>> ``` >>> > >>>>>>>>>>>> from airflow.utils.module_loading import import_string >>> > >>>>>>>>>>>> ``` >>> > >>>>>>>>>>>> >>> > >>>>>>>>>>>> if we want to move serve_logs into this common lib that is >>> > then >>> > >>>>>>>>>> symlinked >>> > >>>>>>>>>>>> then we wouldn’t be able to have `from >>> > >>>> airflow_common.module_loading >>> > >>>>>>>>>> import >>> > >>>>>>>>>>>> import_string`. >>> > >>>>>>>>>>>> >>> > >>>>>>>>>>>> I can think of two possible solutions here. >>> > >>>>>>>>>>>> >>> > >>>>>>>>>>>> 1) is to allow/require relative imports in this shared >>> lib, so >>> > >>>> `from >>> > >>>>>>>>>>>> .module_loading import import_string` >>> > >>>>>>>>>>>> 2) is to use `vendoring`[1] (from the pip maintainers) >>> which >>> > >>> will >>> > >>>>>>>> handle >>> > >>>>>>>>>>>> import-rewriting for us. >>> > >>>>>>>>>>>> >>> > >>>>>>>>>>>> I’d entirely forgot that symlinks in repos was a thing, >>> so I >>> > >>>> prepared >>> > >>>>>>>> a >>> > >>>>>>>>>>>> minimal POC/demo of what vendoring approach could look >>> like >>> > here >>> > >>>>>>>>>>>> >>> > >>> >>> > >>> https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3 >>> > >>>>>>>>>>>> Now personally I am more than happy with relative >>> imports, but >>> > >>>>>>>> generally >>> > >>>>>>>>>>>> as a project we have avoided them, so I think that limits >>> what >>> > >>> we >>> > >>>>>>>> could >>> > >>>>>>>>>> do >>> > >>>>>>>>>>>> with a symlink based approach. >>> > >>>>>>>>>>>> >>> > >>>>>>>>>>>> -ash >>> > >>>>>>>>>>>> >>> > >>>>>>>>>>>> [1] https://github.com/pradyunsg/vendoring >>> > >>>>>>>>>>>> >>> > >>>>>>>>>>>>> On 3 Jul 2025, at 10:30, Pavankumar Gopidesu < >>> > >>>>>>>> gopidesupa...@gmail.com> >>> > >>>>>>>>>>>> wrote: >>> > >>>>>>>>>>>>> Thanks Ash >>> > >>>>>>>>>>>>> >>> > >>>>>>>>>>>>> Yes agree option 2 would be preferred for me. Making >>> sure we >>> > >>>> have all >>> > >>>>>>>>>> the >>> > >>>>>>>>>>>>> gaurdriles to protect any unwanted behaviour in code >>> sharing >>> > >>> and >>> > >>>>>>>>>>>> executing >>> > >>>>>>>>>>>>> right of tests between the packages. >>> > >>>>>>>>>>>>> >>> > >>>>>>>>>>>>> Agree with others, option 2 would be >>> > >>>>>>>>>>>>> >>> > >>>>>>>>>>>>> On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai < >>> > >>>>>>>> amoghdesai....@gmail.com> >>> > >>>>>>>>>>>>> wrote: >>> > >>>>>>>>>>>>> >>> > >>>>>>>>>>>>>> Thanks for starting this discussion, Ash. >>> > >>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>> I would prefer option 2 here with proper tooling to >>> handle >>> > the >>> > >>>> code >>> > >>>>>>>>>>>>>> duplication at *release* time. >>> > >>>>>>>>>>>>>> It is best to have a dist that has all it needs in >>> itself. >>> > >>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>> Option 1 could very quickly get out of hand and if we >>> decide >>> > >>> to >>> > >>>>>>>>>> separate >>> > >>>>>>>>>>>>>> triggerer / >>> > >>>>>>>>>>>>>> dag processor / config etc etc as separate packages, >>> back >>> > >>>> compat is >>> > >>>>>>>>>>>> going >>> > >>>>>>>>>>>>>> to be a nightmare >>> > >>>>>>>>>>>>>> and will bite us harder than we anticipate. >>> > >>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>> Thanks & Regards, >>> > >>>>>>>>>>>>>> Amogh Desai >>> > >>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>> On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik < >>> > >>> kaxiln...@gmail.com> >>> > >>>>>>>>>> wrote: >>> > >>>>>>>>>>>>>>> I prefer Option 2 as well to avoid matrix of >>> dependencies >>> > >>>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>>> On Thu, 3 Jul 2025 at 01:03, Jens Scheffler >>> > >>>>>>>>>> <j_scheff...@gmx.de.invalid >>> > >>>>>>>>>>>>>>> wrote: >>> > >>>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>>>> I'd also rather prefer option 2 - reason here is it is >>> > >>> rather >>> > >>>>>>>>>>>> pragmatic >>> > >>>>>>>>>>>>>>>> and we no not need to cut another package and have >>> less >>> > >>>> package >>> > >>>>>>>>>> counts >>> > >>>>>>>>>>>>>>>> and dependencies. >>> > >>>>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>>>> I remember some time ago I was checking (together with >>> > >>> Jarek, >>> > >>>> I am >>> > >>>>>>>>>> not >>> > >>>>>>>>>>>>>>>> sure anymore...) if the usage of symlinks would be >>> > possible. >>> > >>>> To >>> > >>>>>>>> keep >>> > >>>>>>>>>>>>>> the >>> > >>>>>>>>>>>>>>>> source in one package but "symlink" it into another. >>> If >>> > then >>> > >>>> at >>> > >>>>>>>>>> point >>> > >>>>>>>>>>>>>> of >>> > >>>>>>>>>>>>>>>> packaging/release the files are materialized we have >>> 1 set >>> > >>> of >>> > >>>>>>>> code. >>> > >>>>>>>>>>>>>>>> Otherwise if not possible still the redundancy could >>> be >>> > >>>> solved by >>> > >>>>>>>> a >>> > >>>>>>>>>>>>>>>> pre-commit hook - and in Git the files are >>> de-duplicated >>> > >>>> anyway >>> > >>>>>>>>>> based >>> > >>>>>>>>>>>>>> on >>> > >>>>>>>>>>>>>>>> content hash, so this does not hurt. >>> > >>>>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>>>> On 02.07.25 18:49, Shahar Epstein wrote: >>> > >>>>>>>>>>>>>>>>> I support option 2 with proper automation & CI - the >>> > >>>> reasonings >>> > >>>>>>>>>>>>>> you've >>> > >>>>>>>>>>>>>>>>> shown for that make sense to me. >>> > >>>>>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>>>>> Shahar >>> > >>>>>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor < >>> > >>>> a...@apache.org >>> > >>>>>>>>>>>>>>> wrote: >>> > >>>>>>>>>>>>>>>>>> Hello everyone, >>> > >>>>>>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>>>>>> As we work on finishing off the code-level >>> separation of >>> > >>>> Task >>> > >>>>>>>> SDK >>> > >>>>>>>>>>>>>> and >>> > >>>>>>>>>>>>>>>> Core >>> > >>>>>>>>>>>>>>>>>> (scheduler etc) we have come across some situations >>> > where >>> > >>> we >>> > >>>>>>>> would >>> > >>>>>>>>>>>>>>> like >>> > >>>>>>>>>>>>>>>> to >>> > >>>>>>>>>>>>>>>>>> share code between these. >>> > >>>>>>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>>>>>> However it’s not as straight forward of “just put >>> it in >>> > a >>> > >>>> common >>> > >>>>>>>>>>>>>> dist >>> > >>>>>>>>>>>>>>>> they >>> > >>>>>>>>>>>>>>>>>> both depend upon” because one of the goals of the >>> Task >>> > SDK >>> > >>>>>>>>>>>>>> separation >>> > >>>>>>>>>>>>>>>> was >>> > >>>>>>>>>>>>>>>>>> to have 100% complete version independence between >>> the >>> > >>> two, >>> > >>>>>>>>>> ideally >>> > >>>>>>>>>>>>>>>> even if >>> > >>>>>>>>>>>>>>>>>> they are built into the same image and venv. Most >>> of the >>> > >>>> reason >>> > >>>>>>>>>> why >>> > >>>>>>>>>>>>>>> this >>> > >>>>>>>>>>>>>>>>>> isn’t straight forward comes down to backwards >>> > >>>> compatibility - >>> > >>>>>>>> if >>> > >>>>>>>>>> we >>> > >>>>>>>>>>>>>>>> make >>> > >>>>>>>>>>>>>>>>>> an change to the common/shared distribution >>> > >>>>>>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>>>>>> We’ve listed the options we have thought about in >>> > >>>>>>>>>>>>>>>>>> https://github.com/apache/airflow/issues/51545 (but >>> > that >>> > >>>> covers >>> > >>>>>>>>>>>>>> some >>> > >>>>>>>>>>>>>>>> more >>> > >>>>>>>>>>>>>>>>>> things that I don’t want to get in to in this >>> discussion >>> > >>>> such as >>> > >>>>>>>>>>>>>>>> possibly >>> > >>>>>>>>>>>>>>>>>> separating operators and executors out of a single >>> > >>> provider >>> > >>>>>>>> dist.) >>> > >>>>>>>>>>>>>>>>>> To give a concrete example of some code I would >>> like to >>> > >>>> share >>> > >>> >>> > >>> https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py >>> > >>>>>>>>>>>>>>>>>> — logging config. Another thing we will want to >>> share >>> > will >>> > >>>> be >>> > >>>>>>>> the >>> > >>>>>>>>>>>>>>>>>> AirflowConfigParser class from airflow.configuration >>> > (but >>> > >>>>>>>> notably: >>> > >>>>>>>>>>>>>>> only >>> > >>>>>>>>>>>>>>>> the >>> > >>>>>>>>>>>>>>>>>> parser class, _not_ the default config values, >>> again, >>> > lets >>> > >>>> not >>> > >>>>>>>>>> dwell >>> > >>>>>>>>>>>>>>> on >>> > >>>>>>>>>>>>>>>> the >>> > >>>>>>>>>>>>>>>>>> specifics of that) >>> > >>>>>>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>>>>>> So to bring the options listed in the issue here for >>> > >>>> discussion, >>> > >>>>>>>>>>>>>>> broadly >>> > >>>>>>>>>>>>>>>>>> speaking there are two high-level approaches: >>> > >>>>>>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>>>>>> 1. A single shared distribution >>> > >>>>>>>>>>>>>>>>>> 2. No shared package and copy/duplicate code >>> > >>>>>>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>>>>>> The advantage of Approach 1 is that we only have the >>> > code >>> > >>>> in one >>> > >>>>>>>>>>>>>>> place. >>> > >>>>>>>>>>>>>>>>>> However for me, at least in this specific case of >>> > Logging >>> > >>>> config >>> > >>>>>>>>>> or >>> > >>>>>>>>>>>>>>>>>> AirflowConfigParser class is that backwards >>> > compatibility >>> > >>> is >>> > >>>>>>>> much >>> > >>>>>>>>>>>>>> much >>> > >>>>>>>>>>>>>>>>>> harder. >>> > >>>>>>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>>>>>> The main advantage of Approach 2 is the the code is >>> > >>> released >>> > >>>>>>>>>>>>>>>> with/embedded >>> > >>>>>>>>>>>>>>>>>> in the dist (i.e. apache-airflow-task-sdk would >>> contain >>> > >>> the >>> > >>>>>>>> right >>> > >>>>>>>>>>>>>>>> version >>> > >>>>>>>>>>>>>>>>>> of the logging config and ConfigParser etc). The >>> > downside >>> > >>> is >>> > >>>>>>>> that >>> > >>>>>>>>>>>>>>> either >>> > >>>>>>>>>>>>>>>>>> the code will need to be duplicated in the repo, or >>> > better >>> > >>>> yet >>> > >>>>>>>> it >>> > >>>>>>>>>>>>>>> would >>> > >>>>>>>>>>>>>>>>>> live in a single place in the repo, but some tooling >>> > (TBD) >>> > >>>> will >>> > >>>>>>>>>>>>>>>>>> automatically handle the duplication, either at >>> commit >>> > >>>> time, or >>> > >>>>>>>> my >>> > >>>>>>>>>>>>>>>>>> preference, at release time. >>> > >>>>>>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>>>>>> For this kind of shared “utility” code I am very >>> > strongly >>> > >>>>>>>> leaning >>> > >>>>>>>>>>>>>>>> towards >>> > >>>>>>>>>>>>>>>>>> option 2 with automation, as otherwise I think the >>> > >>> backwards >>> > >>>>>>>>>>>>>>>> compatibility >>> > >>>>>>>>>>>>>>>>>> requirements would make it unworkable (very quickly >>> over >>> > >>>> time >>> > >>>>>>>> the >>> > >>>>>>>>>>>>>>>>>> combinations we would have to test would just be >>> > >>>> unreasonable) >>> > >>>>>>>>>> and I >>> > >>>>>>>>>>>>>>>> don’t >>> > >>>>>>>>>>>>>>>>>> feel confident we can have things as stable as we >>> need >>> > to >>> > >>>> really >>> > >>>>>>>>>>>>>>> deliver >>> > >>>>>>>>>>>>>>>>>> the version separation/independency I want to >>> delivery >>> > >>> with >>> > >>>>>>>>>> AIP-72. >>> > >>>>>>>>>>>>>>>>>> So unless someone feels very strongly about this, I >>> will >>> > >>>> come up >>> > >>>>>>>>>>>>>> with >>> > >>>>>>>>>>>>>>> a >>> > >>>>>>>>>>>>>>>>>> draft PR for further discussion that will implement >>> code >>> > >>>> sharing >>> > >>>>>>>>>> via >>> > >>>>>>>>>>>>>>>>>> “vendoring” it at build time. I have an idea of how >>> I >>> > can >>> > >>>>>>>> achieve >>> > >>>>>>>>>>>>>> this >>> > >>>>>>>>>>>>>>>> so >>> > >>>>>>>>>>>>>>>>>> we have a single version in the repo and it’ll work >>> > there, >>> > >>>> but >>> > >>>>>>>> at >>> > >>>>>>>>>>>>>>>> runtime >>> > >>>>>>>>>>>>>>>>>> we vendor it in to the shipped dist so it lives at >>> > >>> something >>> > >>>>>>>> like >>> > >>>>>>>>>>>>>>>>>> `airflow.sdk._vendor` etc. >>> > >>>>>>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>>>>>> In terms of repo layout, this likely means we would >>> end >>> > up >>> > >>>> with: >>> > >>>>>>>>>>>>>>>>>> airflow-core/pyproject.toml >>> > >>>>>>>>>>>>>>>>>> airflow-core/src/ >>> > >>>>>>>>>>>>>>>>>> airflow-core/tests/ >>> > >>>>>>>>>>>>>>>>>> task-sdk/pyproject.toml >>> > >>>>>>>>>>>>>>>>>> task-sdk/src/ >>> > >>>>>>>>>>>>>>>>>> task-sdk/tests/ >>> > >>>>>>>>>>>>>>>>>> airflow-common/src >>> > >>>>>>>>>>>>>>>>>> airflow-common/tests/ >>> > >>>>>>>>>>>>>>>>>> # Possibly no airflow-common/pyproject.toml, as deps >>> > would >>> > >>>> be >>> > >>>>>>>>>>>>>> included >>> > >>>>>>>>>>>>>>>> in >>> > >>>>>>>>>>>>>>>>>> the downstream projects. TBD. >>> > >>>>>>>>>>>>>>>>>> >>> > >>>>>>>>>>>>>>>>>> Thoughts and feedback welcomed. >>> > >>>> >>> --------------------------------------------------------------------- >>> > >>>>>>>>>>>>>>>> To unsubscribe, e-mail: >>> > dev-unsubscr...@airflow.apache.org >>> > >>>>>>>>>>>>>>>> For additional commands, e-mail: >>> > >>> dev-h...@airflow.apache.org >>> > >>>>>>>>>>>>>>>> >>> > >>>>>>>> >>> > >>> >>> --------------------------------------------------------------------- >>> > >>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>> > >>>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org >>> > >>>>>>>> >>> > >>>>>>>> >>> > >>>>>> >>> > --------------------------------------------------------------------- >>> > >>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>> > >>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org >>> > >>>>>> >>> > >>>>> >>> --------------------------------------------------------------------- >>> > >>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>> > >>>>> For additional commands, e-mail: dev-h...@airflow.apache.org >>> > >>>>> >>> > >>>> >>> --------------------------------------------------------------------- >>> > >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>> > >>>> For additional commands, e-mail: dev-h...@airflow.apache.org >>> > >>>> >>> > >>>> >>> > > >>> > > --------------------------------------------------------------------- >>> > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>> > > For additional commands, e-mail: dev-h...@airflow.apache.org >>> > > >>> > >>> > --------------------------------------------------------------------- >>> > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>> > For additional commands, e-mail: dev-h...@airflow.apache.org >>> > >>> > >>> >>