Yeah. When we get the final PR, I will also want to test more scenarios -
with IDE/mypy integration, switching branches, uv syncing etc. - and I will be
happy to help update the contributor docs to explain what it is and how to
work with it. This would be a super cool thing if we get it to
work seamlessly for everyone :)

On Thu, Jul 10, 2025 at 7:44 AM Amogh Desai <amoghdesai....@gmail.com>
wrote:

> Agreed!
>
> Once the PR is up, we can have these implementation level discussions
> over there. Good chat however!
>
> Thanks & Regards,
> Amogh Desai
>
>
> On Wed, Jul 9, 2025 at 3:56 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Yeah. I think extracting, one by one, each feature that we want to
>> share into a separate distribution is the best approach. It will also
>> help with the "__init__.py" cleanup, because almost by definition
>> those distributions will not be able to "reach" outside - i.e. they can
>> only be "used", they cannot "use" something else. That means that what we
>> have now - for example configuration using logging, which in turn uses
>> configuration (leading to circular dependencies and partially initialized
>> modules) - will not happen. Instead we will have to figure out how to
>> inject configuration into logging (from task-sdk, airflow-ctl,
>> airflow-core) and get the right sequence of initialization, rather than
>> have inter-feature dependencies.
>>
>> And that is precisely what low cyclomatic complexity is about as well -
>> it generally leads to easier-to-maintain software that has well-defined
>> functionality and does not have the weird circular dependencies we have
>> now. That's a kind of side effect of such a "per feature" split, but a very
>> desirable one.
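To make the "inject configuration into logging" idea concrete, here is a minimal sketch. It assumes a hypothetical shared logging library; `LoggingSettings`, `configure_logging`, `init_from_config` and the `logging_level` key are illustrative names, not existing Airflow APIs. The shared code only receives plain values and never imports config, so it stays at the bottom of the dependency tree.

```python
import logging
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggingSettings:
    """Plain values the (hypothetical) shared logging library needs."""

    level: str = "INFO"
    fmt: str = "%(asctime)s %(levelname)s %(name)s: %(message)s"


def configure_logging(settings: LoggingSettings, logger_name: str = "airflow") -> logging.Logger:
    """Set up logging from injected settings; note there is no config import here."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(settings.level)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(settings.fmt))
        logger.addHandler(handler)
    return logger


def init_from_config(config: dict[str, str]) -> logging.Logger:
    """Runs in the *using* distribution: read config first, then inject it."""
    settings = LoggingSettings(level=config.get("logging_level", "INFO"))
    return configure_logging(settings)
```

The initialization order lives in the distribution (core, task-sdk, airflow-ctl), which is exactly the "right sequence of initialization" mentioned above.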
>>
>> J.
>>
>> On Wed, Jul 9, 2025 at 10:17 AM Amogh Desai <amoghdesai....@gmail.com>
>> wrote:
>>
>>> You probably make a valid point.
>>>
>>> Maybe this is an implementation detail, so we could figure it out as we
>>> start on a POC and factor these things in as we move along?
>>>
>>> But as an initial guess, I would think that execution-time related items
>>> (if we manage to enumerate them) would be better off in that
>>> "core_and_task_sdk" bundle.
>>>
>>> Thanks & Regards,
>>> Amogh Desai
>>>
>>>
>>> On Tue, Jul 8, 2025 at 3:46 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>
>>>> > Not that I am against your idea and we can surely expand as we need
>>>> > but we would not need to expand the "core_and_task_sdk" if we put only
>>>> > the relevant items into it.
>>>>
>>>> So if we move logging and config out, my question is: what is really
>>>> relevant to "stay" in "core_and_task_sdk"? And what do we know will not be
>>>> needed by other distributions in the future?
>>>>
>>>> What would be the content of "core_and_task_sdk"? Can we enumerate
>>>> what should go there now?
>>>>
>>>> My bet is that if we enumerate them, it will turn out that they are
>>>> good candidates to become separate "shared features" that can be logically
>>>> named and modularised, and that there is no real need to have a shared
>>>> "core_and_task_sdk", which sounds like a "bag of everything else".
>>>>
>>>>
>>>>
>>>> On Tue, Jul 8, 2025 at 11:48 AM Amogh Desai <amoghdesai....@gmail.com>
>>>> wrote:
>>>>
>>>>> Yeah, I think what you are showcasing here is a step ahead of the
>>>>> initial proposal from Ash.
>>>>>
>>>>> In the original proposal, `core_and_task_sdk` *can* hold the things
>>>>> relevant to just those two distros. Logging and config are modules that
>>>>> might be needed by airflow-ctl, for example, so ideally they would not be
>>>>> good candidates to put in there.
>>>>>
>>>>> Kubernetes utils is a good example: they will be used by KubeExecutor
>>>>> (let's say this is a module called "executors") and by KPO (providers),
>>>>> and "shared/kubernetes" would probably be a good candidate for that.
>>>>>
>>>>> Not that I am against your idea, and we can surely expand as we need,
>>>>> but we would not need to expand "core_and_task_sdk" if we put only the
>>>>> relevant items into it.
>>>>>
>>>>> Thanks & Regards,
>>>>> Amogh Desai
>>>>>
>>>>>
>>>>> On Tue, Jul 8, 2025 at 12:28 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>
>>>>>> > @Jarek Potiuk <ja...@potiuk.com> a little confused on what you
>>>>>> > mean there, I am understanding the direction but could you elaborate
>>>>>> > a bit more please?
>>>>>>
>>>>>> Let me elaborate:
>>>>>>
>>>>>> As I understand it (maybe I am wrong?), the proposal is that we have a
>>>>>> "core-and-task-sdk" folder, which is a shared distribution that is
>>>>>> vendored-in into both "airflow-core" and "airflow-task-sdk". It
>>>>>> contains some shared code that we want to include in both distributions
>>>>>> (note that we never release the "core-and-task-sdk" distribution,
>>>>>> because it only contains code that is shared between the two
>>>>>> distributions we release).
>>>>>>
>>>>>> That's fine and cool.
>>>>>>
>>>>>> Imagine that this distribution contains "logging" (shared code for
>>>>>> logging) and "config" (shared code for configuration) - both needed in
>>>>>> "airflow-core" and "airflow-task-sdk". So far so good. But what happens
>>>>>> if we want to use logging in the same fashion in, say, "airflow-ctl"
>>>>>> (that also applies to other distributions that we might come up with)?
>>>>>> Are we going to vendor-in the whole "core-and-task-sdk" distribution in
>>>>>> "airflow-ctl"? It would be far better if we just vendor-in "logging" and
>>>>>> do not vendor-in "config".
>>>>>>
>>>>>> And if we are going to have a mechanism to vendor-in "a distribution",
>>>>>> there is nothing wrong with having the same way to vendor-in multiple
>>>>>> distributions - so we can easily do it this way (I added "orm",
>>>>>> "serialization" and "fast_api" as examples of things that we might want
>>>>>> to share - not sure if that is really something we want to do, but it
>>>>>> allows me to illustrate my idea better):
>>>>>>
>>>>>> /
>>>>>>   airflow-ctl/
>>>>>>   task-sdk/...
>>>>>>   airflow-core/...
>>>>>>   ....
>>>>>>   shared/
>>>>>>     kubernetes/
>>>>>>       pyproject.toml
>>>>>>       src/
>>>>>>         airflow_shared_kubernetes/__init__.py
>>>>>>     logging/
>>>>>>       pyproject.toml
>>>>>>       src/
>>>>>>         airflow_shared_logging/__init__.py
>>>>>>     config/
>>>>>>       pyproject.toml
>>>>>>       src/
>>>>>>         airflow_shared_config/__init__.py
>>>>>>     orm/
>>>>>>       pyproject.toml
>>>>>>       src/
>>>>>>         airflow_shared_orm/__init__.py
>>>>>>     serialization/
>>>>>>       pyproject.toml
>>>>>>       src/
>>>>>>         airflow_shared_serialization/__init__.py
>>>>>>     fast_api/
>>>>>>       pyproject.toml
>>>>>>       src/
>>>>>>         airflow_shared_fast_api/__init__.py
>>>>>>     ...
>>>>>>
>>>>>> This has multiple benefits (and I see no real drawbacks):
>>>>>>
>>>>>> * The code can be really well modularised. The "things" we share (and
>>>>>> this also connects to the __init__.py discussion) can be independent,
>>>>>> and (following Jens's comment) it allows us to keep low cyclomatic
>>>>>> complexity https://en.wikipedia.org/wiki/Cyclomatic_complexity . It will
>>>>>> be way easier to implement logging in a way that does not import or use
>>>>>> config. This means, for example, that configuration for logging will
>>>>>> need to be injected when logging is initialized - and that's exactly what
>>>>>> we want. We do not want logging code to use configuration code directly;
>>>>>> they should be independent of each other, and you should be free to
>>>>>> vendor-in either logging or config independently, whether or not you
>>>>>> also vendored-in the other.
>>>>>>
>>>>>> * It's much more logical. We split based on the functionality we want
>>>>>> to share - not on the "distributions" we want to produce. That allows us,
>>>>>> in the future, to make different decisions on how we split our
>>>>>> distributions. For example (I am not saying we have to do it, or that we
>>>>>> will do it, but we will have the possibility), we can add more shared
>>>>>> utilities we find useful in the same way and decide that "scheduler",
>>>>>> "api_server", "triggerer" or "dag processor" are split into separate
>>>>>> distributions - for example because we want to keep the number of
>>>>>> dependencies down. "api_server" might use "fast_api", "config",
>>>>>> "logging", "orm" and "serialization", whereas the scheduler should not
>>>>>> need "fast_api". The "dag_processor" eventually might need neither
>>>>>> "orm" nor "fast_api" and only use the others.
>>>>>>
>>>>>> This seems like a natural approach. If we have a mechanism to "share"
>>>>>> the code, it does not add complexity, but it allows us to isolate
>>>>>> independent functionality into "isolated" boxes and use them.
>>>>>>
>>>>>> As for cyclomatic complexity: it is a complex term (badly chosen, as it
>>>>>> scares people away) and has some math behind it, but it really boils
>>>>>> down to very simple "rules of thumb" (and yes, I am a big proponent of
>>>>>> having low cyclomatic complexity):
>>>>>>
>>>>>> a) when you are building the "final" product (i.e. a distribution you
>>>>>> want to release) - make sure that you only "use" things, and that nothing
>>>>>> else is "using" you as a library.
>>>>>> b) when you are building a "shared" thing (a library) - make sure that
>>>>>> the library is only "used" by others and does not really "use" anything
>>>>>> else.
>>>>>>
>>>>>> For example, in the case I explained above, we can achieve low
>>>>>> cyclomatic complexity when:
>>>>>>
>>>>>> * airflow-core uses: logging, config, orm, serialization, fast_api
>>>>>> * none of "logging, config, orm, serialization, fast_api" use each
>>>>>> other - they are at the bottom of the "user -> used" tree
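The two rules of thumb above can be checked mechanically. A minimal sketch, assuming a hypothetical hand-written map of which distribution vendors-in which shared library (the "api_server", "scheduler" and "dag_processor" split is only the illustrative example from this message, not a real plan):

```python
# Hypothetical "user -> used" map for the final distributions.
FINAL_DISTRIBUTIONS: dict[str, set[str]] = {
    "airflow-core": {"logging", "config", "orm", "serialization", "fast_api"},
    "api_server": {"fast_api", "config", "logging", "orm", "serialization"},
    "scheduler": {"config", "logging", "orm", "serialization"},
    "dag_processor": {"config", "logging", "serialization"},
}
# Shared libraries and what each of them uses (ideally: nothing).
SHARED_LIBRARIES: dict[str, set[str]] = {
    name: set() for name in ("logging", "config", "orm", "serialization", "fast_api")
}


def layering_violations(final: dict[str, set[str]], shared: dict[str, set[str]]) -> list[str]:
    """Return human-readable violations of rules a) and b)."""
    problems: list[str] = []
    for lib, uses in sorted(shared.items()):
        # Rule b): a shared library must not use anything else.
        problems += [f"shared library {lib!r} uses {dep!r}" for dep in sorted(uses)]
    used_by_anyone = set().union(*final.values(), *shared.values())
    for dist, uses in sorted(final.items()):
        # Rule a): a final product only uses shared libraries...
        problems += [f"{dist!r} uses unknown {dep!r}" for dep in sorted(uses - set(shared))]
        # ...and nothing uses a final product.
        if dist in used_by_anyone:
            problems.append(f"{dist!r} is used by something else")
    return problems
```

For example, making "logging" depend on "config" is reported as `"shared library 'logging' uses 'config'"`, which is the kind of coupling the per-feature split is meant to prevent.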
>>>>>>
>>>>>> J.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 8, 2025 at 8:16 AM Amogh Desai <amoghdesai....@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I like the folder structure proposed by Ash and have no objections
>>>>>>> with it.
>>>>>>>
>>>>>>> "core_and_task_sdk" sounds good to me and justifies what it should
>>>>>>> do pretty well.
>>>>>>>
>>>>>>> @Jarek Potiuk <ja...@potiuk.com> a little confused on what you mean
>>>>>>> there, I am understanding the direction
>>>>>>> but could you elaborate a bit more please?
>>>>>>>
>>>>>>> Naming is REALLY hard!
>>>>>>>
>>>>>>> Thanks & Regards,
>>>>>>> Amogh Desai
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 8, 2025 at 2:52 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> How about splitting it even more and having each shared "thing"
>>>>>>>> named ("logging", "config") and sharing them explicitly and
>>>>>>>> separately with the right "user"?
>>>>>>>> That sounds way more modular, and we will be able to choose which of
>>>>>>>> the shared "utils" we use where.
>>>>>>>>
>>>>>>>> J.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jul 7, 2025 at 11:13 PM Jens Scheffler <j_scheff...@gmx.de.invalid>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> > I like "core_and_task_sdk" the same as "core-and-task-sdk" - I have
>>>>>>>> > no problem with it, and it is only a path.
>>>>>>>> >
>>>>>>>> > If we get to "dag-parser-scheduler-task-sdk-and-triggerer", which is a
>>>>>>>> > bit bulky, we should then name it "all-not-api-server" :-D
>>>>>>>> >
>>>>>>>> > On 07.07.25 22:57, Ash Berlin-Taylor wrote:
>>>>>>>> > > In case I did a bad job explaining it, the “core and task sdk” is not in
>>>>>>>> > > the module name/import name, just in the file path.
>>>>>>>> > >
>>>>>>>> > > Anyone have other ideas?
>>>>>>>> > >
>>>>>>>> > >> On 7 Jul 2025, at 21:37, Buğra Öztürk <ozturkbugr...@gmail.com> wrote:
>>>>>>>> > >>
>>>>>>>> > >> Thanks Ash! Looks cool! I like the structure. This will enable all the
>>>>>>>> > >> combinations, and the structure looks easy to grasp. No strong stance
>>>>>>>> > >> on the naming, other than that it is maybe a bit long with `and`;
>>>>>>>> > >> `core_ctl` could be shorter. Since no import path is defined like
>>>>>>>> > >> that, we can certainly give it any name.
>>>>>>>> > >>
>>>>>>>> > >> Best regards,
>>>>>>>> > >>
>>>>>>>> > >>> On Mon, 7 Jul 2025, 21:51 Jarek Potiuk, <ja...@potiuk.com> wrote:
>>>>>>>> > >>>
>>>>>>>> > >>> Looks good, but I think we should find a better logical name for
>>>>>>>> > >>> core_and_sdk :)
>>>>>>>> > >>>
>>>>>>>> > >>> On Mon, 7 Jul 2025, 21:44 Jens Scheffler
>>>>>>>> > >>> <j_scheff...@gmx.de.invalid> wrote:
>>>>>>>> > >>>
>>>>>>>> > >>>> Cool! Especially the "shared" folder with the ability to have
>>>>>>>> > >>>> N-combinations w/o exploding project repo root!
>>>>>>>> > >>>>
>>>>>>>> > >>>> On 07.07.25 14:43, Ash Berlin-Taylor wrote:
>>>>>>>> > >>>>> Oh, and all of this will be explained in shared/README.md
>>>>>>>> > >>>>>
>>>>>>>> > >>>>>> On 7 Jul 2025, at 13:41, Ash Berlin-Taylor <a...@apache.org> wrote:
>>>>>>>> > >>>>>>
>>>>>>>> > >>>>>> Okay, so it seems we have agreement on the approach here, so I’ll
>>>>>>>> > >>>>>> continue with this. On the dev call it was mentioned that
>>>>>>>> > >>>>>> “airflow-common” wasn’t a great name, so here is my proposal for the
>>>>>>>> > >>>>>> file structure:
>>>>>>>> > >>>>>> ```
>>>>>>>> > >>>>>> /
>>>>>>>> > >>>>>>   task-sdk/...
>>>>>>>> > >>>>>>   airflow-core/...
>>>>>>>> > >>>>>>   shared/
>>>>>>>> > >>>>>>     kubernetes/
>>>>>>>> > >>>>>>       pyproject.toml
>>>>>>>> > >>>>>>       src/
>>>>>>>> > >>>>>>         airflow_kube/__init__.py
>>>>>>>> > >>>>>>     core-and-tasksdk/
>>>>>>>> > >>>>>>       pyproject.toml
>>>>>>>> > >>>>>>       src/
>>>>>>>> > >>>>>>         airflow_shared/__init__.py
>>>>>>>> > >>>>>> ```
>>>>>>>> > >>>>>>
>>>>>>>> > >>>>>> Things to note here: the “shared” folder has the possibility of having
>>>>>>>> > >>>>>> multiple different shared “libraries” in it. In this example I am
>>>>>>>> > >>>>>> supposing a hypothetical shared kubernetes folder, in a world in which
>>>>>>>> > >>>>>> we split the KubePodOperator and the KubeExecutor into two separate
>>>>>>>> > >>>>>> distributions (example only, not proposing we do that right now, and
>>>>>>>> > >>>>>> that will be a separate discussion).
>>>>>>>> > >>>>>> The other things to note here:
>>>>>>>> > >>>>>>
>>>>>>>> > >>>>>>
>>>>>>>> > >>>>>> - the folder name in shared aims to be “self-documenting”, hence the
>>>>>>>> > >>>>>>   verbose “core-and-tasksdk” to say where the shared library is
>>>>>>>> > >>>>>>   intended to be used.
>>>>>>>> > >>>>>> - the python module itself should almost always have an `airflow_` (or
>>>>>>>> > >>>>>>   maybe `_airflow_`?) prefix so that it does not conflict with anything
>>>>>>>> > >>>>>>   else we might use. It won’t matter “in production”, as those will be
>>>>>>>> > >>>>>>   vendored in to be imported as `airflow/_vendor/airflow_shared` etc.,
>>>>>>>> > >>>>>>   but avoiding conflicts at dev time with the Finder approach is a good
>>>>>>>> > >>>>>>   safety measure.
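A quick way to see why the prefix matters: an unprefixed top-level name may already be taken on the import path. This sketch uses `importlib.util.find_spec` to check whether a name already resolves; `airflow_shared_logging` is just the hypothetical name from Jarek's tree above, and `is_name_taken` is an illustrative helper.

```python
import importlib.util


def is_name_taken(module_name: str) -> bool:
    """True if a top-level `module_name` already resolves on this interpreter."""
    return importlib.util.find_spec(module_name) is not None


# "logging" belongs to the standard library, so a shared package with that
# bare top-level name would clash at dev time; a prefixed name is very
# unlikely to be taken by anything else.
for candidate in ("logging", "airflow_shared_logging"):
    print(candidate, "taken" if is_name_taken(candidate) else "free")
```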
>>>>>>>> > >>>>>> I will start making a real PR for this proposal now, but I’m open to
>>>>>>>> > >>>>>> feedback (either here, or in the PR when I open it).
>>>>>>>> > >>>>>> -ash
>>>>>>>> > >>>>>>
>>>>>>>> > >>>>>>> On 4 Jul 2025, at 16:55, Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>>>> > >>>>>>>
>>>>>>>> > >>>>>>> Yeah, we have to try it and test it. Also, building packages happens
>>>>>>>> > >>>>>>> semi-frequently when you run `uv sync` (uv uses some kind of
>>>>>>>> > >>>>>>> heuristics to decide when), and you can force it with `--reinstall`
>>>>>>>> > >>>>>>> or `--refresh`. A package build also happens every time you run
>>>>>>>> > >>>>>>> `ci-image build` in breeze now, so it seems like it will integrate
>>>>>>>> > >>>>>>> nicely into our workflows.
>>>>>>>> > >>>>>>>
>>>>>>>> > >>>>>>> Looks really cool Ash.
>>>>>>>> > >>>>>>>
>>>>>>>> > >>>>>>> On Fri, Jul 4, 2025 at 5:14 PM Ash Berlin-Taylor <a...@apache.org>
>>>>>>>> > >>>>>>> wrote:
>>>>>>>> > >>>>>>>> It’s not just release time, but any time we build a package, which
>>>>>>>> > >>>>>>>> happens on “every” CI run. The normal unit tests will use code from
>>>>>>>> > >>>>>>>> airflow-common/src/airflow_common; the kube tests, which build an
>>>>>>>> > >>>>>>>> image, will build the dists and vendor in the code from that commit.
>>>>>>>> > >>>>>>>>
>>>>>>>> > >>>>>>>> There is only a single copy of the shared code committed to the
>>>>>>>> > >>>>>>>> repo, so there is never anything to synchronise.
>>>>>>>> > >>>>>>>>
>>>>>>>> > >>>>>>>>> On 4 Jul 2025, at 15:53, Amogh Desai <amoghdesai....@gmail.com> wrote:
>>>>>>>> > >>>>>>>>> Thanks Ash.
>>>>>>>> > >>>>>>>>>
>>>>>>>> > >>>>>>>>> This is really cool, and it's helpful that you were able to test both
>>>>>>>> > >>>>>>>>> scenarios (a repo checkout, and also installing from the vendored
>>>>>>>> > >>>>>>>>> package) and that the resolution worked fine too.
>>>>>>>> > >>>>>>>>>
>>>>>>>> > >>>>>>>>> I like this idea compared to the relative import one for a few reasons:
>>>>>>>> > >>>>>>>>> - It feels like it would take some time to adjust to the new coding
>>>>>>>> > >>>>>>>>>   standard we would be laying down if we imposed relative imports in
>>>>>>>> > >>>>>>>>>   the shared dist
>>>>>>>> > >>>>>>>>> - We can continue using the repo-wide absolute import standards. It
>>>>>>>> > >>>>>>>>>   is also much easier in situations where we do a global search in
>>>>>>>> > >>>>>>>>>   the IDE to find + replace; relative imports could mean a change
>>>>>>>> > >>>>>>>>>   there
>>>>>>>> > >>>>>>>>> - Vendoring is a proven and established paradigm across projects and
>>>>>>>> > >>>>>>>>>   would give us the build tooling we need out of the box
>>>>>>>> > >>>>>>>>>
>>>>>>>> > >>>>>>>>> Nothing against the relative import approach, but with the evidence
>>>>>>>> > >>>>>>>>> provided above, the vendored approach seems to only do us good.
>>>>>>>> > >>>>>>>>>
>>>>>>>> > >>>>>>>>> Regarding synchronizing it, release time should be fine as long as we
>>>>>>>> > >>>>>>>>> have a good CI workflow to catch such issues per PR if changes are
>>>>>>>> > >>>>>>>>> made in the shared dist? (pre-commit would make it really slow, I
>>>>>>>> > >>>>>>>>> guess)
>>>>>>>> > >>>>>>>>>
>>>>>>>> > >>>>>>>>> If we can run our tests with the vendored code we should be mostly
>>>>>>>> > >>>>>>>>> covered.
>>>>>>>> > >>>>>>>>> Good effort all!
>>>>>>>> > >>>>>>>>>
>>>>>>>> > >>>>>>>>> Thanks & Regards,
>>>>>>>> > >>>>>>>>> Amogh Desai
>>>>>>>> > >>>>>>>>>
>>>>>>>> > >>>>>>>>>
>>>>>>>> > >>>>>>>>>> On Fri, Jul 4, 2025 at 7:23 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>>>>>>>> > >>>>>>>>>> Okay, I think I’ve got something that works and I’m happy with.
>>>>>>>> > >>>>>>>>>>
>>>>>>>> > >>>>>>>>>> https://github.com/astronomer/airflow/tree/shared-vendored-lib-tasksdk-and-core
>>>>>>>> > >>>>>>>>>>
>>>>>>>> > >>>>>>>>>> This produces the following from `uv build task-sdk`:
>>>>>>>> > >>>>>>>>>> - https://github.com/user-attachments/files/21058976/apache_airflow_task_sdk-1.1.0.tar.gz
>>>>>>>> > >>>>>>>>>> - https://github.com/user-attachments/files/21058996/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip
>>>>>>>> > >>>>>>>>>>   (`.whl.zip` as GH won't allow .whl upload, but will .zip)
>>>>>>>> > >>>>>>>>>>
>>>>>>>> > >>>>>>>>>> ```
>>>>>>>> > >>>>>>>>>> ❯ unzip -l dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip | grep _vendor
>>>>>>>> > >>>>>>>>>>      50  02-02-2020 00:00   airflow/sdk/_vendor/.gitignore
>>>>>>>> > >>>>>>>>>>    2082  02-02-2020 00:00   airflow/sdk/_vendor/__init__.py
>>>>>>>> > >>>>>>>>>>      28  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common.pyi
>>>>>>>> > >>>>>>>>>>      18  02-02-2020 00:00   airflow/sdk/_vendor/vendor.txt
>>>>>>>> > >>>>>>>>>>     785  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common/__init__.py
>>>>>>>> > >>>>>>>>>>   10628  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common/timezone.py
>>>>>>>> > >>>>>>>>>> ```
>>>>>>>> > >>>>>>>>>>
>>>>>>>> > >>>>>>>>>> And similarly in the .tar.gz, so our “sdist” is complete too:
>>>>>>>> > >>>>>>>>>> ```
>>>>>>>> > >>>>>>>>>> ❯ tar -tzf dist/apache_airflow_task_sdk-1.1.0.tar.gz | grep _vendor
>>>>>>>> > >>>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/.gitignore
>>>>>>>> > >>>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/__init__.py
>>>>>>>> > >>>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common.pyi
>>>>>>>> > >>>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/vendor.txt
>>>>>>>> > >>>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/__init__.py
>>>>>>>> > >>>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/timezone.py
>>>>>>>> > >>>>>>>>>> ```
>>>>>>>> > >>>>>>>>>>
>>>>>>>> > >>>>>>>>>> The plugin works at build time by including/copying the libs specified
>>>>>>>> > >>>>>>>>>> in vendor.txt into place (and letting `vendoring` take care of the
>>>>>>>> > >>>>>>>>>> import rewrites).
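A rough sketch of what such a build-time copy step might look like. The vendor.txt format assumed here (one package name per line, `#` comments allowed) and the `copy_vendored_libs` helper are illustrative assumptions, not the plugin from the branch linked above; import rewriting is deliberately left out, since that is the part `vendoring` handles.

```python
import shutil
import tempfile
from pathlib import Path


def copy_vendored_libs(vendor_txt: Path, source_roots: list[Path], vendor_dir: Path) -> list[str]:
    """Copy each package named in vendor.txt into vendor_dir (hypothetical format)."""
    copied: list[str] = []
    for raw in vendor_txt.read_text().splitlines():
        name = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not name:
            continue
        # Use the first source root that really contains the package.
        src = next((r / name for r in source_roots if (r / name / "__init__.py").is_file()), None)
        if src is None:
            raise FileNotFoundError(f"{name!r} not found under {source_roots}")
        dest = vendor_dir / name
        if dest.exists():
            shutil.rmtree(dest)  # always re-copy from the single source of truth
        shutil.copytree(src, dest, ignore=shutil.ignore_patterns("__pycache__", "*.pyc"))
        copied.append(name)
    return copied


def demo() -> tuple[list[str], bool]:
    """Build a throwaway layout resembling the thread's and run the copy step."""
    root = Path(tempfile.mkdtemp())
    src_root = root / "airflow-common" / "src"
    (src_root / "airflow_common").mkdir(parents=True)
    (src_root / "airflow_common" / "__init__.py").write_text("foo = 1\n")
    (src_root / "airflow_common" / "timezone.py").write_text("from airflow_common import foo\n")
    vendor_dir = root / "task-sdk" / "src" / "airflow" / "sdk" / "_vendor"
    vendor_dir.mkdir(parents=True)
    (vendor_dir / "vendor.txt").write_text("airflow_common  # shared with airflow-core\n")
    copied = copy_vendored_libs(vendor_dir / "vendor.txt", [src_root], vendor_dir)
    return copied, (vendor_dir / "airflow_common" / "timezone.py").is_file()
```

Because the copy happens only at build time, the repo still holds a single committed copy of the shared code, which matches the "nothing to synchronise" point made elsewhere in the thread.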
>>>>>>>> > >>>>>>>>>> For the imports to continue to work at “dev” time/from a repo checkout,
>>>>>>>> > >>>>>>>>>> I have added an import finder to `sys.meta_path`, and since it’s at the
>>>>>>>> > >>>>>>>>>> end of the list it will only be used if the normal import can’t find
>>>>>>>> > >>>>>>>>>> things.
>>>>>>>> > >>>>>>>>>>
>>>>>>>> > >>>>>>>>>> https://github.com/astronomer/airflow/blob/996817782be6071b306a87af9f36fe1cf2d3aaa3/task-sdk/src/airflow/sdk/_vendor/__init__.py
>>>>>>>> > >>>>>>>>>>
>>>>>>>> > >>>>>>>>>> This doesn’t quite give us the same runtime effect as “import
>>>>>>>> > >>>>>>>>>> rewriting”, as in this approach `airflow_common` is directly loaded
>>>>>>>> > >>>>>>>>>> (i.e. both airflow.sdk._vendor.airflow_common and airflow_common exist
>>>>>>>> > >>>>>>>>>> in sys.modules), but it does work for everything that I was able to
>>>>>>>> > >>>>>>>>>> test.
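A minimal sketch of a fallback meta path finder with the behaviour described above, assuming both the vendor package and the shared library are plain (non-namespace) packages. `VendoredAliasFinder` and the `demo_*` package names are hypothetical, and this is not the code in the linked `__init__.py`. It loads the shared library's source a second time under the vendored name, so, as noted, both names end up in `sys.modules`.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import tempfile
from pathlib import Path


class VendoredAliasFinder(importlib.abc.MetaPathFinder):
    """Fallback: resolve `<vendor_pkg>.<lib>` to the source of top-level `<lib>`."""

    def __init__(self, vendor_pkg: str, libs: set[str]) -> None:
        self.prefix = vendor_pkg + "."
        self.libs = libs

    def find_spec(self, fullname, path=None, target=None):
        if not fullname.startswith(self.prefix):
            return None
        lib = fullname[len(self.prefix):]
        if lib not in self.libs:
            # Only the top-level package is aliased; its submodules are then found
            # by the regular PathFinder through the aliased package's __path__.
            return None
        real = importlib.util.find_spec(lib)
        if real is None or real.origin is None:
            return None
        # Load the same source file again, but under the vendored name.
        return importlib.util.spec_from_file_location(
            fullname, real.origin, submodule_search_locations=real.submodule_search_locations
        )


def demo() -> tuple[int, str, bool]:
    """Create throwaway packages in a temp dir and import through the alias."""
    root = Path(tempfile.mkdtemp())
    (root / "demo_common").mkdir()
    (root / "demo_common" / "__init__.py").write_text("foo = 1\n")
    # An absolute import inside the shared lib, like the test diff in this thread.
    (root / "demo_common" / "timezone.py").write_text("from demo_common import foo\n")
    (root / "demo_host" / "_vendor").mkdir(parents=True)
    (root / "demo_host" / "__init__.py").write_text("")
    (root / "demo_host" / "_vendor" / "__init__.py").write_text("")
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    # Appended last, so any real vendored copy would be found first.
    sys.meta_path.append(VendoredAliasFinder("demo_host._vendor", {"demo_common"}))
    tz = importlib.import_module("demo_host._vendor.demo_common.timezone")
    both_loaded = "demo_common" in sys.modules and "demo_host._vendor.demo_common" in sys.modules
    return tz.foo, Path(tz.__file__).name, both_loaded
```

The `from demo_common import foo` inside the shared code still resolves to the top-level package, which is exactly the difference from true import rewriting.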
>>>>>>>> > >>>>>>>>>> I tested it with the diff at the end of this message. My test ipython
>>>>>>>> > >>>>>>>>>> shell:
>>>>>>>> > >>>>>>>>>>
>>>>>>>> > >>>>>>>>>> ```
>>>>>>>> > >>>>>>>>>> In [1]: from airflow.sdk._vendor.airflow_common.timezone import foo
>>>>>>>> > >>>>>>>>>> In [2]: foo
>>>>>>>> > >>>>>>>>>> Out[2]: 1
>>>>>>>> > >>>>>>>>>>
>>>>>>>> > >>>>>>>>>> In [3]: import airflow.sdk._vendor.airflow_common
>>>>>>>> > >>>>>>>>>>
>>>>>>>> > >>>>>>>>>> In [4]: import airflow.sdk._vendor.airflow_common.timezone
>>>>>>>> > >>>>>>>>>>
>>>>>>>> > >>>>>>>>>> In [5]: airflow.sdk._vendor.airflow_common.__file__
>>>>>>>> > >>>>>>>>>> Out[5]: '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/__init__.py'
>>>>>>>> > >>>>>>>>>>
>>>>>>>> > >>>>>>>>>> In [6]: airflow.sdk._vendor.airflow_common.timezone.__file__
>>>>>>>> > >>>>>>>>>> Out[6]: '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/timezone.py'
>>>>>>>> > >>>>>>>>>> ```
>>>>>>>> > >>>>>>>>>>
>>>>>>>> > >>>>>>>>>>
>>>>>>>> > >>>>>>>>>> And in a standalone environment with the SDK dist I built (it needed
>>>>>>>> > >>>>>>>>>> the matching airflow-core right now, but that has nothing to do with
>>>>>>>> > >>>>>>>>>> this discussion):
>>>>>>>> > >>>>>>>>>>
>>>>>>>> > >>>>>>>>>> ```
>>>>>>>> > >>>>>>>>>> ❯ _AIRFLOW__AS_LIBRARY=1 uvx --python 3.12 --with dist/apache_airflow_core-3.1.0-py3-none-any.whl --with dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl ipython
>>>>>>>> > >>>>>>>>>> Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ]
>>>>>>>> > >>>>>>>>>> Type 'copyright', 'credits' or 'license' for more information
>>>>>>>> > >>>>>>>>>> IPython 9.4.0 -- An enhanced Interactive Python. Type '?' for help.
>>>>>>>> > >>>>>>>>>> Tip: You can use `%hist` to view history, see the options with `%history?`
>>>>>>>> > >>>>>>>>>> In [1]: import airflow.sdk._vendor.airflow_common.timezone
>>>>>>>> > >>>>>>>>>>
>>>>>>>> > >>>>>>>>>> In [2]: airflow.sdk._vendor.airflow_common.timezone.__file__
>>>>>>>> > >>>>>>>>>> Out[2]: '/Users/ash/.cache/uv/archive-v0/WWq6r65aPto2eJOyPObEH/lib/python3.12/site-packages/airflow/sdk/_vendor/airflow_common/timezone.py'
>>>>>>>> > >>>>>>>>>> ```
>>>>>>>> > >>>>>>>>>>
>>>>>>>> > >>>>>>>>>>
>>>>>>>> > >>>>>>>>>>
>>>>>>>> > >>>>>>>>>> ```diff
>>>>>>>> > >>>>>>>>>> diff --git a/airflow-common/src/airflow_common/__init__.py b/airflow-common/src/airflow_common/__init__.py
>>>>>>>> > >>>>>>>>>> index 13a83393a9..927b7c6b61 100644
>>>>>>>> > >>>>>>>>>> --- a/airflow-common/src/airflow_common/__init__.py
>>>>>>>> > >>>>>>>>>> +++ b/airflow-common/src/airflow_common/__init__.py
>>>>>>>> > >>>>>>>>>> @@ -14,3 +14,5 @@
>>>>>>>> > >>>>>>>>>>  # KIND, either express or implied.  See the License for the
>>>>>>>> > >>>>>>>>>>  # specific language governing permissions and limitations
>>>>>>>> > >>>>>>>>>>  # under the License.
>>>>>>>> > >>>>>>>>>> +
>>>>>>>> > >>>>>>>>>> +foo = 1
>>>>>>>> > >>>>>>>>>> diff --git a/airflow-common/src/airflow_common/timezone.py b/airflow-common/src/airflow_common/timezone.py
>>>>>>>> > >>>>>>>>>> index 340b924c66..58384ef20f 100644
>>>>>>>> > >>>>>>>>>> --- a/airflow-common/src/airflow_common/timezone.py
>>>>>>>> > >>>>>>>>>> +++ b/airflow-common/src/airflow_common/timezone.py
>>>>>>>> > >>>>>>>>>> @@ -36,6 +36,9 @@ _PENDULUM3 = version.parse(metadata.version("pendulum")).major == 3
>>>>>>>> > >>>>>>>>>>  # - FixedTimezone(0, "UTC") in pendulum 2
>>>>>>>> > >>>>>>>>>>  utc = pendulum.UTC
>>>>>>>> > >>>>>>>>>>  
>>>>>>>> > >>>>>>>>>> +
>>>>>>>> > >>>>>>>>>> +from airflow_common import foo
>>>>>>>> > >>>>>>>>>> +
>>>>>>>> > >>>>>>>>>>  TIMEZONE: Timezone
>>>>>>>> > >>>>>>>>>>  
>>>>>>>> > >>>>>>>>>>  
>>>>>>>> > >>>>>>>>>> ```
>>>>>>>> > >>>>>>>>>>
>>>>>>>> > >>>>>>>>>>> On 3 Jul 2025, at 12:43, Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>>>> > >>>>>>>>>>> I think both approaches are doable:
>>>>>>>> > >>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>> 1) -> We can very easily prevent bad imports from different
>>>>>>>> > >>>>>>>>>>> distributions with pre-commit, and make sure we are only doing
>>>>>>>> > >>>>>>>>>>> relative imports in the shared modules. We are doing plenty of this
>>>>>>>> > >>>>>>>>>>> already. And yes, it would require relative imports, which we
>>>>>>>> > >>>>>>>>>>> currently do not allow.
>>>>>>>> > >>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>> 2) -> has one disadvantage: someone, at some point in time, will have
>>>>>>>> > >>>>>>>>>>> to decide to synchronize this, and if it happens just before a release
>>>>>>>> > >>>>>>>>>>> (I bet this is going to happen) it will lead to solving problems that
>>>>>>>> > >>>>>>>>>>> would normally be solved during the PR when you make a change. (A
>>>>>>>> > >>>>>>>>>>> symbolic link has the advantage that whoever modifies shared code
>>>>>>>> > >>>>>>>>>>> will be immediately notified in their PR that they broke something,
>>>>>>>> > >>>>>>>>>>> because either static checks or mypy or tests fail.)
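Option 1) relies on a pre-commit check. One possible shape for it, sketched with the standard `ast` module: flag absolute imports of given top-level packages inside shared code while allowing relative ones. The function name and the prefix list are hypothetical, not an existing Airflow hook.

```python
import ast

SAMPLE = """
from .module_loading import import_string
from airflow.utils.module_loading import import_string as _old
import airflow_common.timezone
import os
"""


def find_forbidden_imports(source: str, forbidden: tuple[str, ...]) -> list[tuple[int, str]]:
    """Return (line, module) for every absolute import of a forbidden package."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules = [node.module]
        else:
            continue  # relative imports (level > 0) and non-import nodes are fine
        for module in modules:
            if any(module == p or module.startswith(p + ".") for p in forbidden):
                problems.append((node.lineno, module))
    return sorted(problems)


print(find_forbidden_imports(SAMPLE, ("airflow", "airflow_common")))
# [(3, 'airflow.utils.module_loading'), (4, 'airflow_common.timezone')]
```

Matching on whole dotted components (`p` or `p + "."`) keeps `airflow_common` from being mistaken for a submodule of `airflow`.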
>>>>>>>> > >>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>> Ash, do you have an idea of a process (who and when) for doing the
>>>>>>>> > >>>>>>>>>>> synchronisation in the case of vendoring? Maybe we could solve it if
>>>>>>>> > >>>>>>>>>>> it is done more frequently and with some regularity? We could
>>>>>>>> > >>>>>>>>>>> potentially force re-vendoring at PR time as well, any time shared
>>>>>>>> > >>>>>>>>>>> code changes (and enforce it by pre-commit). I can't think of any
>>>>>>>> > >>>>>>>>>>> other place in our development workflow (other than releases), and
>>>>>>>> > >>>>>>>>>>> that seems a bit too late, as it puts extra effort on the release
>>>>>>>> > >>>>>>>>>>> manager to fix potential incompatibilities introduced, and delays the
>>>>>>>> > >>>>>>>>>>> release. WDYT?
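If re-vendoring were enforced at PR time, the check could be as simple as "does the committed vendored copy match what vendoring the current source would produce?". A hypothetical sketch: `out_of_sync` is an illustrative helper, and `transform` stands in for whatever rewriting the vendoring step applies (the default assumes a verbatim copy).

```python
import tempfile
from pathlib import Path
from typing import Callable


def out_of_sync(source: Path, vendored: Path, transform: Callable[[str], str] = lambda text: text) -> list[str]:
    """Return relative paths of .py files whose vendored copy is stale, missing or extra."""
    expected = {p.relative_to(source).as_posix(): transform(p.read_text()) for p in source.rglob("*.py")}
    actual = {p.relative_to(vendored).as_posix(): p.read_text() for p in vendored.rglob("*.py")}
    return sorted(name for name in expected.keys() | actual.keys() if expected.get(name) != actual.get(name))


def demo() -> list[str]:
    """One stale file and one never-vendored file in a throwaway layout."""
    root = Path(tempfile.mkdtemp())
    src, ven = root / "shared", root / "_vendor"
    for d in (src, ven):
        d.mkdir()
        (d / "__init__.py").write_text("")
    (src / "timezone.py").write_text("utc = 'UTC'\n")
    (ven / "timezone.py").write_text("utc = 'old'\n")  # stale vendored copy
    (src / "module_loading.py").write_text("x = 1\n")  # never re-vendored
    return out_of_sync(src, ven)
```

A hook built on this would exit non-zero whenever the list is non-empty, so the PR author (not the release manager) sees the drift.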
>>>>>>>> > >>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>> Re: relative imports. I think for a shared library we could
>>>>>>>> > >>>>>>>>>>> potentially relax this and allow them (and actually disallow absolute
>>>>>>>> > >>>>>>>>>>> imports in the pieces of code that are shared - again, by
>>>>>>>> > >>>>>>>>>>> pre-commit). As I recall, the only reason we forbade relative imports
>>>>>>>> > >>>>>>>>>>> is because of how we are (or maybe were) doing DAG parsing and the
>>>>>>>> > >>>>>>>>>>> failures resulting from it. So we decided to just not allow them, to
>>>>>>>> > >>>>>>>>>>> keep consistency. The way Dag parsing works is that when you use
>>>>>>>> > >>>>>>>>>>> importlib to read the Dag from a file, relative imports do not work,
>>>>>>>> > >>>>>>>>>>> as it does not know what they should be relative to. But if a relative
>>>>>>>> > >>>>>>>>>>> import is done from an imported package, it should be no problem, I
>>>>>>>> > >>>>>>>>>>> think - otherwise our Dags would not be able to import any library
>>>>>>>> > >>>>>>>>>>> that uses relative imports.
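A small, self-contained illustration of the point about relative imports, assuming a simplified file loader built on `importlib.util.spec_from_file_location` (not Airflow's actual Dag parsing code; `load_dag_file` and the `demo_*`/`*_dag` names are hypothetical). A file loaded on its own has no parent package, so its own relative import fails, while a package it imports can still use relative imports internally.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_dag_file(path: Path, module_name: str):
    """Load a single file as a standalone module (simplified sketch)."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def demo() -> tuple[str, int]:
    root = Path(tempfile.mkdtemp())
    # A library package that uses relative imports internally.
    lib = root / "demo_shared_lib"
    lib.mkdir()
    (lib / "__init__.py").write_text("from .helpers import VALUE\n")
    (lib / "helpers.py").write_text("VALUE = 42\n")
    # A "Dag file" with its own relative import: nothing to be relative to.
    (root / "bad_dag.py").write_text("from .helpers import VALUE\n")
    # A "Dag file" importing the package absolutely: the package's relative imports work.
    (root / "good_dag.py").write_text("from demo_shared_lib import VALUE\n")
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        load_dag_file(root / "bad_dag.py", "bad_dag")
        error = ""
    except ImportError as exc:
        error = str(exc)
    good = load_dag_file(root / "good_dag.py", "good_dag")
    return error, good.VALUE
```

On CPython the failing case raises `ImportError: attempted relative import with no known parent package`, while the second file loads fine.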
>>>>>>>> > >>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>> Of course, consistency might be the reason why we do not want to
>>>>>>>> > >>>>>>>>>>> introduce relative imports. I don't see it as an issue if it is
>>>>>>>> > >>>>>>>>>>> guarded by pre-commit, though.
>>>>>>>> > >>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>> J.
>>>>>>>> > >>>>>>>>>>>
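A pre-commit guard of the kind suggested above could be a small AST-based check that rejects absolute imports of the shared package inside shared code. This is a hypothetical sketch, not an existing Airflow hook: the package name `airflow_common` comes from the example in this thread, and it assumes pre-commit passes the changed file names as arguments.

```python
import ast
import sys

# Hypothetical top-level name of the shared code (taken from the example in this thread).
SHARED_PACKAGE = "airflow_common"


def absolute_shared_imports(source: str, filename: str = "<string>") -> list[str]:
    """Return "file:line" entries for absolute imports of SHARED_PACKAGE."""
    problems = []
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, ast.ImportFrom):
            # level == 0 is an absolute "from X import Y"; level > 0 is relative.
            names = [node.module or ""] if node.level == 0 else []
        elif isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        else:
            continue
        for name in names:
            if name == SHARED_PACKAGE or name.startswith(SHARED_PACKAGE + "."):
                problems.append(f"{filename}:{node.lineno}")
    return problems


def main(paths: list[str]) -> int:
    """Check each file passed on the command line; non-zero means failure."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for problem in absolute_shared_imports(handle.read(), path):
                print(f"{problem}: use a relative import inside shared code")
                failed = True
    return 1 if failed else 0


if __name__ == "__main__":
    status = main(sys.argv[1:])
    if status:
        sys.exit(status)
```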
>>>>>>>> > >>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>> On Thu, 3 Jul 2025, 12:11 Ash Berlin-Taylor <a...@apache.org>
>>>>>>>> > >>>>>>>>>>> wrote:
>>>>>>>> > >>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>> Oh yes, symlinks will work, with one big caveat: it means you
>>>>>>>> > >>>>>>>>>>>> can’t use absolute imports from one common module to another.
>>>>>>>> > >>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>> For example
>>>>>>>> > >>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>> https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41
>>>>>>>> > >>>>>>>>>>>> where we have
>>>>>>>> > >>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>> ```
>>>>>>>> > >>>>>>>>>>>> from airflow.utils.module_loading import import_string
>>>>>>>> > >>>>>>>>>>>> ```
>>>>>>>> > >>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>> If we want to move serve_logs into this common lib, which is then
>>>>>>>> > >>>>>>>>>>>> symlinked, we wouldn’t be able to have
>>>>>>>> > >>>>>>>>>>>> `from airflow_common.module_loading import import_string`.
>>>>>>>> > >>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>> I can think of two possible solutions here.
>>>>>>>> > >>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>> 1) Allow/require relative imports in this shared lib, i.e.
>>>>>>>> > >>>>>>>>>>>> `from .module_loading import import_string`.
>>>>>>>> > >>>>>>>>>>>> 2) Use `vendoring`[1] (from the pip maintainers), which will handle
>>>>>>>> > >>>>>>>>>>>> import rewriting for us.
>>>>>>>> > >>>>>>>>>>>>
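Why relative imports suit a symlinked shared lib can be seen with the standard library's `importlib.util.resolve_name`: a relative import resolves against whichever package the module ends up in, while an absolute import always names the same top-level package. The two package names below are hypothetical homes for one shared module (for example via a symlink or a vendored copy).

```python
from importlib.util import resolve_name

# Hypothetical places the same shared module could live in two distributions.
HOMES = ["airflow.sdk._shared", "airflow._shared"]


def resolved_targets(import_name: str, packages: list[str]) -> list[str]:
    """Return the absolute module name an import would refer to from each package."""
    return [resolve_name(import_name, package) for package in packages]


# Relative import: follows the module to wherever it is placed.
relative = resolved_targets(".module_loading", HOMES)
# Absolute import: always points at one fixed top-level package.
absolute = resolved_targets("airflow_common.module_loading", HOMES)

print(relative)  # ['airflow.sdk._shared.module_loading', 'airflow._shared.module_loading']
print(absolute)  # ['airflow_common.module_loading', 'airflow_common.module_loading']
```

Only the relative form keeps working without rewriting; the absolute form requires `airflow_common` to be importable as its own top-level package, which a symlinked or vendored copy does not provide.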
>>>>>>>> > >>>>>>>>>>>> I’d entirely forgotten that symlinks in repos were a thing, so I
>>>>>>>> > >>>>>>>>>>>> prepared a minimal POC/demo of what the vendoring approach could
>>>>>>>> > >>>>>>>>>>>> look like here:
>>>>>>>> > >>>>>>>>>>>> https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3
>>>>>>>> > >>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>> Now, personally I am more than happy with relative imports, but as
>>>>>>>> > >>>>>>>>>>>> a project we have generally avoided them, so I think that limits
>>>>>>>> > >>>>>>>>>>>> what we could do with a symlink-based approach.
>>>>>>>> > >>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>> -ash
>>>>>>>> > >>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>> [1] https://github.com/pradyunsg/vendoring
>>>>>>>> > >>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>> On 3 Jul 2025, at 10:30, Pavankumar Gopidesu <gopidesupa...@gmail.com>
>>>>>>>> > >>>>>>>>>>>>> wrote:
>>>>>>>> > >>>>>>>>>>>>> Thanks, Ash.
>>>>>>>> > >>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>> Yes, I agree option 2 would be my preference, provided we have all
>>>>>>>> > >>>>>>>>>>>>> the guardrails in place to protect against unwanted behaviour in
>>>>>>>> > >>>>>>>>>>>>> code sharing and to run the right tests across the packages.
>>>>>>>> > >>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>> Agree with others, option 2 would be
>>>>>>>> > >>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>> On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai <amoghdesai....@gmail.com>
>>>>>>>> > >>>>>>>>>>>>> wrote:
>>>>>>>> > >>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>> Thanks for starting this discussion, Ash.
>>>>>>>> > >>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>> I would prefer option 2 here, with proper tooling to handle the
>>>>>>>> > >>>>>>>>>>>>>> code duplication at *release* time.
>>>>>>>> > >>>>>>>>>>>>>> It is best to have a dist that has everything it needs within
>>>>>>>> > >>>>>>>>>>>>>> itself.
>>>>>>>> > >>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>> Option 1 could very quickly get out of hand, and if we decide to
>>>>>>>> > >>>>>>>>>>>>>> split the triggerer / dag processor / config etc. into separate
>>>>>>>> > >>>>>>>>>>>>>> packages, back compat is going to be a nightmare and will bite us
>>>>>>>> > >>>>>>>>>>>>>> harder than we anticipate.
>>>>>>>> > >>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>> Thanks & Regards,
>>>>>>>> > >>>>>>>>>>>>>> Amogh Desai
>>>>>>>> > >>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>> On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik <kaxiln...@gmail.com>
>>>>>>>> > >>>>>>>>>>>>>> wrote:
>>>>>>>> > >>>>>>>>>>>>>>> I prefer Option 2 as well, to avoid a matrix of dependencies.
>>>>>>>> > >>>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>>> On Thu, 3 Jul 2025 at 01:03, Jens Scheffler <j_scheff...@gmx.de.invalid>
>>>>>>>> > >>>>>>>>>>>>>>> wrote:
>>>>>>>> > >>>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>>>> I'd also rather go with option 2 - the reason is that it is
>>>>>>>> > >>>>>>>>>>>>>>>> pragmatic: we do not need to cut another package, and we end up
>>>>>>>> > >>>>>>>>>>>>>>>> with fewer packages and dependencies.
>>>>>>>> > >>>>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>>>> I remember that some time ago I was checking (together with Jarek,
>>>>>>>> > >>>>>>>>>>>>>>>> I am not sure anymore...) whether using symlinks would be
>>>>>>>> > >>>>>>>>>>>>>>>> possible: keep the source in one package but "symlink" it into
>>>>>>>> > >>>>>>>>>>>>>>>> another. If the files are then materialized at packaging/release
>>>>>>>> > >>>>>>>>>>>>>>>> time, we have one set of code.
>>>>>>>> > >>>>>>>>>>>>>>>> Otherwise, if that is not possible, the redundancy could still be
>>>>>>>> > >>>>>>>>>>>>>>>> handled by a pre-commit hook - and in Git the files are
>>>>>>>> > >>>>>>>>>>>>>>>> de-duplicated anyway based on their content hash, so this does not
>>>>>>>> > >>>>>>>>>>>>>>>> hurt.
>>>>>>>> > >>>>>>>>>>>>>>>>
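Jens's point about de-duplication follows from how Git names file contents: a blob's object id is a hash over a short header plus the bytes, so byte-identical copies map to one stored object (the working tree still contains both files). A minimal sketch, assuming the default SHA-1 object format; comparing blob ids is equivalent to comparing bytes and is used here only to make the point visible, for example in a hypothetical pre-commit sync check.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object id Git gives a file's content, assuming the default SHA-1 object format."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def copies_in_sync(canonical: bytes, copies: list[bytes]) -> bool:
    """True if every copy would be stored as the same Git object as the original."""
    expected = git_blob_id(canonical)
    return all(git_blob_id(copy) == expected for copy in copies)


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```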
>>>>>>>> > >>>>>>>>>>>>>>>> On 02.07.25 18:49, Shahar Epstein wrote:
>>>>>>>> > >>>>>>>>>>>>>>>>> I support option 2 with proper automation & CI - the reasoning
>>>>>>>> > >>>>>>>>>>>>>>>>> you've shown for it makes sense to me.
>>>>>>>> > >>>>>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>>>>> Shahar
>>>>>>>> > >>>>>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor <a...@apache.org>
>>>>>>>> > >>>>>>>>>>>>>>>>> wrote:
>>>>>>>> > >>>>>>>>>>>>>>>>>> Hello everyone,
>>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>>>>>> As we work on finishing off the code-level separation of Task SDK
>>>>>>>> > >>>>>>>>>>>>>>>>>> and Core (scheduler etc.), we have come across some situations
>>>>>>>> > >>>>>>>>>>>>>>>>>> where we would like to share code between these.
>>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>>>>>> However, it’s not as straightforward as “just put it in a common
>>>>>>>> > >>>>>>>>>>>>>>>>>> dist they both depend upon”, because one of the goals of the Task
>>>>>>>> > >>>>>>>>>>>>>>>>>> SDK separation was to have 100% complete version independence
>>>>>>>> > >>>>>>>>>>>>>>>>>> between the two, ideally even if they are built into the same
>>>>>>>> > >>>>>>>>>>>>>>>>>> image and venv. Most of the reason why this isn’t straightforward
>>>>>>>> > >>>>>>>>>>>>>>>>>> comes down to backwards compatibility - if we make a change to the
>>>>>>>> > >>>>>>>>>>>>>>>>>> common/shared distribution
>>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>>>>>> We’ve listed the options we have thought about in
>>>>>>>> > >>>>>>>>>>>>>>>>>> https://github.com/apache/airflow/issues/51545 (but that covers
>>>>>>>> > >>>>>>>>>>>>>>>>>> some more things that I don’t want to get into in this discussion,
>>>>>>>> > >>>>>>>>>>>>>>>>>> such as possibly separating operators and executors out of a
>>>>>>>> > >>>>>>>>>>>>>>>>>> single provider dist).
>>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>>>>>> To give a concrete example of some code I would like to share,
>>>>>>>> > >>>>>>>>>>>>>>>>>> there is the logging config:
>>>>>>>> > >>>>>>>>>>>>>>>>>> https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py
>>>>>>>> > >>>>>>>>>>>>>>>>>> Another thing we will want to share will be the AirflowConfigParser
>>>>>>>> > >>>>>>>>>>>>>>>>>> class from airflow.configuration (but notably: only the parser
>>>>>>>> > >>>>>>>>>>>>>>>>>> class, _not_ the default config values; again, let’s not dwell on
>>>>>>>> > >>>>>>>>>>>>>>>>>> the specifics of that).
>>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>>>>>> So, to bring the options listed in the issue here for discussion,
>>>>>>>> > >>>>>>>>>>>>>>>>>> broadly speaking there are two high-level approaches:
>>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>>>>>> 1. A single shared distribution
>>>>>>>> > >>>>>>>>>>>>>>>>>> 2. No shared package and copy/duplicate code
>>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>>>>>> The advantage of Approach 1 is that we only have the code in one
>>>>>>>> > >>>>>>>>>>>>>>>>>> place. However, for me, at least in this specific case of the
>>>>>>>> > >>>>>>>>>>>>>>>>>> logging config or the AirflowConfigParser class, the problem is
>>>>>>>> > >>>>>>>>>>>>>>>>>> that backwards compatibility is much, much harder.
>>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>>>>>> The main advantage of Approach 2 is that the code is released
>>>>>>>> > >>>>>>>>>>>>>>>>>> with/embedded in the dist (i.e. apache-airflow-task-sdk would
>>>>>>>> > >>>>>>>>>>>>>>>>>> contain the right version of the logging config and ConfigParser
>>>>>>>> > >>>>>>>>>>>>>>>>>> etc.). The downside is that either the code will need to be
>>>>>>>> > >>>>>>>>>>>>>>>>>> duplicated in the repo, or, better yet, it would live in a single
>>>>>>>> > >>>>>>>>>>>>>>>>>> place in the repo but some tooling (TBD) would automatically
>>>>>>>> > >>>>>>>>>>>>>>>>>> handle the duplication, either at commit time or, my preference,
>>>>>>>> > >>>>>>>>>>>>>>>>>> at release time.
>>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>>>>>> For this kind of shared “utility” code I am very strongly leaning
>>>>>>>> > >>>>>>>>>>>>>>>>>> towards option 2 with automation, as otherwise I think the
>>>>>>>> > >>>>>>>>>>>>>>>>>> backwards compatibility requirements would make it unworkable
>>>>>>>> > >>>>>>>>>>>>>>>>>> (very quickly over time the combinations we would have to test
>>>>>>>> > >>>>>>>>>>>>>>>>>> would just be unreasonable), and I don’t feel confident we can
>>>>>>>> > >>>>>>>>>>>>>>>>>> have things as stable as we need to really deliver the version
>>>>>>>> > >>>>>>>>>>>>>>>>>> separation/independence I want to deliver with AIP-72.
>>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>>>>>> So unless someone feels very strongly about this, I will come up
>>>>>>>> > >>>>>>>>>>>>>>>>>> with a draft PR for further discussion that will implement code
>>>>>>>> > >>>>>>>>>>>>>>>>>> sharing via “vendoring” it at build time. I have an idea of how I
>>>>>>>> > >>>>>>>>>>>>>>>>>> can achieve this so that we have a single version in the repo and
>>>>>>>> > >>>>>>>>>>>>>>>>>> it’ll work there, but in the shipped dist we vendor it in, so at
>>>>>>>> > >>>>>>>>>>>>>>>>>> runtime it lives at something like `airflow.sdk._vendor` etc.
>>>>>>>> > >>>>>>>>>>>>>>>>>>
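Option 2 relies on rewriting imports when shared code is copied into a distribution. The `vendoring` tool linked earlier in the thread does this properly; the sketch below is only a simplified, hypothetical illustration of the idea. It is regex-based, handles only `from airflow_common... import ...` statements, and uses a made-up target package `airflow.sdk._vendor.airflow_common`.

```python
import re

# Hypothetical names, for illustration only.
SHARED = "airflow_common"
VENDORED = "airflow.sdk._vendor.airflow_common"

# Matches "from airflow_common" or "from airflow_common.<sub>" at the start of a line.
_FROM_IMPORT = re.compile(
    rf"^([ \t]*)from[ \t]+{re.escape(SHARED)}(?=[ \t.])", re.MULTILINE
)


def rewrite_imports(source: str) -> str:
    """Point absolute `from` imports of the shared package at its vendored copy."""
    # A function replacement avoids any backslash handling in the template.
    return _FROM_IMPORT.sub(lambda m: f"{m.group(1)}from {VENDORED}", source)


before = "from airflow_common.module_loading import import_string\n"
print(rewrite_imports(before))
# from airflow.sdk._vendor.airflow_common.module_loading import import_string
```

A real tool also has to handle plain `import airflow_common` statements and other edge cases, which is one reason to prefer an existing tool over a hand-rolled regex.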
>>>>>>>> > >>>>>>>>>>>>>>>>>> In terms of repo layout, this likely means we would end up with:
>>>>>>>> > >>>>>>>>>>>>>>>>>> airflow-core/pyproject.toml
>>>>>>>> > >>>>>>>>>>>>>>>>>> airflow-core/src/
>>>>>>>> > >>>>>>>>>>>>>>>>>> airflow-core/tests/
>>>>>>>> > >>>>>>>>>>>>>>>>>> task-sdk/pyproject.toml
>>>>>>>> > >>>>>>>>>>>>>>>>>> task-sdk/src/
>>>>>>>> > >>>>>>>>>>>>>>>>>> task-sdk/tests/
>>>>>>>> > >>>>>>>>>>>>>>>>>> airflow-common/src/
>>>>>>>> > >>>>>>>>>>>>>>>>>> airflow-common/tests/
>>>>>>>> > >>>>>>>>>>>>>>>>>> # Possibly no airflow-common/pyproject.toml, as deps would be
>>>>>>>> > >>>>>>>>>>>>>>>>>> # included in the downstream projects. TBD.
>>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>>> > >>>>>>>>>>>>>>>>>> Thoughts and feedback welcomed.
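The release-time duplication described above could start as a build step that copies the single in-repo copy into each distribution. This is a hypothetical sketch assuming the repo layout listed above and a `_vendor` sub-package as the destination; the function name and arguments are made up, and any import rewriting would be a separate step. Note that `shutil.copytree` copies the contents of symlinked files by default (`symlinks=False`), so the same step would also materialize a symlink-based layout into real files.

```python
import shutil
from pathlib import Path


def vendor_shared_code(repo_root: Path, shared_pkg: str, dist_pkg_dir: Path) -> Path:
    """Copy airflow-common/src/<shared_pkg> into <dist_pkg_dir>/_vendor/<shared_pkg>."""
    src = repo_root / "airflow-common" / "src" / shared_pkg
    dest = dist_pkg_dir / "_vendor" / shared_pkg
    if dest.exists():
        shutil.rmtree(dest)  # start clean so files deleted upstream don't linger
    # copytree creates missing parent dirs; symlinks=False (the default)
    # copies the contents of symlinked files rather than the links themselves.
    shutil.copytree(src, dest, ignore=shutil.ignore_patterns("__pycache__", "*.pyc"))
    (dest.parent / "__init__.py").touch()  # make _vendor itself a package
    return dest
```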
>>>>>>>> > >>>>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> > >>>>>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>>>>>> > >>>>>>>>>>>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org