> @Jarek Potiuk <ja...@potiuk.com> a little confused on what you mean
> there, I am understanding the direction
> but could you elaborate a bit more please?

Let me elaborate:

As I understand it (maybe I am wrong?), the proposal is that we have a
"core-and-task-sdk" folder which is a shared distribution that is
vendored into both "airflow-core" and "airflow-task-sdk". It contains
the shared code that we want to include in both distributions (note that
we never release a "core-and-task-sdk" distribution on its own, because
it only contains code shared between the two distributions we do release).

That's fine and cool.

Imagine that this distribution contains "logging" (shared code for logging)
and "config" (shared code for configuration) - both needed in
"airflow-core" and "airflow-task-sdk". So far so good. But what happens if
we want to use logging in the same fashion in, say, "airflow-ctl" (and the
same applies to other distributions we might come up with)? Are we going
to vendor the whole "core-and-task-sdk" distribution into "airflow-ctl"?
It would be far better if we could vendor in just "logging" and not
"config".

And if we are going to have a mechanism to vendor in "a distribution",
there is nothing wrong with having the same mechanism vendor in multiple
distributions - so we can easily do it this way (I added "orm",
"serialization" and "fast_api" as examples of things we might want to
share - not sure if that is really something we want to do, but it helps
to illustrate my idea):

/
  airflow-ctl/
  task-sdk/...
  airflow-core/...
  ....
  shared/
    kubernetes/
      pyproject.toml
      src/
        airflow_shared_kubernetes/__init__.py
    logging/
      pyproject.toml
      src/
        airflow_shared_logging/__init__.py
    config/
      pyproject.toml
      src/
        airflow_shared_config/__init__.py
    orm/
      pyproject.toml
      src/
        airflow_shared_orm/__init__.py
    serialization/
      pyproject.toml
      src/
        airflow_shared_serialization/__init__.py
    fast_api/
      pyproject.toml
      src/
        airflow_shared_fast_api/__init__.py
    ...
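
(Purely a hypothetical sketch - not something that exists in Ash's branch - but
the dev-time fallback finder he described could be generalised to this
per-piece layout. The class name, the "airflow_shared_" prefix and the
shared/<piece>/src paths below are assumptions that simply follow the tree
above.)

```
# Hypothetical sketch: resolve top-level "airflow_shared_<piece>" imports
# straight from shared/<piece>/src in a repo checkout, so nothing needs to be
# vendored at dev time. Appended last to sys.meta_path, it only runs when the
# normal import machinery has not found the package.
import importlib.abc
import importlib.machinery
import pathlib
import sys


class SharedSourceFinder(importlib.abc.MetaPathFinder):
    PREFIX = "airflow_shared_"

    def __init__(self, shared_root: pathlib.Path):
        self.shared_root = shared_root

    def find_spec(self, fullname, path=None, target=None):
        # Only handle top-level packages; submodules are resolved by the
        # regular path machinery once the parent package has been located.
        if "." in fullname or not fullname.startswith(self.PREFIX):
            return None
        src = self.shared_root / fullname.removeprefix(self.PREFIX) / "src"
        if not src.is_dir():
            return None
        return importlib.machinery.PathFinder.find_spec(fullname, [str(src)])


# E.g. installed from a conftest.py or similar at the repo root (assumption).
sys.meta_path.append(SharedSourceFinder(pathlib.Path(__file__).parent / "shared"))
```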

This has multiple benefits (and I see no real drawbacks):

* the code can be really well modularised. Those "things" we share (and
this also connects to the __init__.py discussion) can be independent,
and (following Jens' comment) that lets us keep cyclomatic complexity low
https://en.wikipedia.org/wiki/Cyclomatic_complexity . It will be way easier
to implement logging in a way that does not import or use config. This
means, for example, that the configuration for logging will need to be
injected when logging is initialized - and that's exactly what we want: we
do not want logging code to use configuration code directly. They should be
independent from each other, and you should be free to vendor in either
logging or config without having to vendor in the other (see the sketch
after this list).

* it's much more logical. We split based on the functionality we want to
share - not on the "distributions" we want to produce. That allows us - in
the future - to make different decisions on how we split our distributions.
For example (I am not saying we have to or will do it, but we would have
the possibility), we could add more shared utilities in the same way and
decide that "scheduler", "api_server", "triggerer" or "dag processor" are
split into separate distributions - for example because we want to keep
the number of dependencies down. The "api_server" might use "fast_api",
"config", "logging", "orm" and "serialization", whereas the scheduler
should not need "fast_api". The "dag_processor" eventually might need
neither "orm" nor "fast_api" and only use the others.

This seems like a natural approach. If we have a mechanism to "share" the
code, it does not add complexity, but it allows us to isolate independent
functionality into "isolated" boxes and use them.

Also, "cyclomatic complexity" is a complicated term (badly chosen, as it
scares people away) with some math behind it, but it really boils down to
very simple rules of thumb (and yes, I am a big proponent of keeping
cyclomatic complexity low):

a) when you are building the "final" product (i.e. a distribution you want
to release) - make sure that you only "use" things, and that nothing else is
"using" you as a library.
b) when you are building a "shared" thing (a library) - make sure that the
library is only "used" by others and does not really "use" anything else.
For example, in the case I explained above, we achieve low cyclomatic
complexity when:

* airflow-core uses: logging, config, orm, serialization, fast_api
* none of "logging, config, orm, serialization, fast_api" use each
other - they are at the bottom of the "user -> used" tree
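
As a rough illustration only (the paths assume the hypothetical shared/
layout above, and no such check exists today), rule b) could be enforced by
a small script run as a pre-commit check:

```
# Sketch of a guard: no module under shared/<piece>/src may import another
# "airflow_shared_*" package, keeping the shared pieces at the bottom of the
# "user -> used" tree.
import ast
import pathlib
import sys

SHARED_ROOT = pathlib.Path("shared")
PREFIX = "airflow_shared_"


def imported_top_level(path: pathlib.Path) -> set[str]:
    """Top-level package names imported (absolutely) by a module."""
    tree = ast.parse(path.read_text(), filename=str(path))
    names: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names


problems = []
for py_file in SHARED_ROOT.rglob("*.py"):
    parts = py_file.relative_to(SHARED_ROOT).parts  # (<piece>, "src", <package>, ...)
    if len(parts) < 3 or parts[1] != "src":
        continue
    own_package = parts[2]
    for name in imported_top_level(py_file):
        if name.startswith(PREFIX) and name != own_package:
            problems.append(f"{py_file}: must not import {name}")

print("\n".join(problems) or "shared pieces are independent of each other")
sys.exit(1 if problems else 0)
```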

J.



On Tue, Jul 8, 2025 at 8:16 AM Amogh Desai <amoghdesai....@gmail.com> wrote:

> I like the folder structure proposed by Ash and have no objections with it.
>
> "core_and_task_sdk" sounds good to me and justifies what it should do
> pretty well.
>
> @Jarek Potiuk <ja...@potiuk.com> a little confused on what you mean
> there, I am understanding the direction
> but could you elaborate a bit more please?
>
> Naming is REALLY hard!
>
> Thanks & Regards,
> Amogh Desai
>
>
> On Tue, Jul 8, 2025 at 2:52 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> How about splitting it even more and having each shared "thing" named?
>> "logging", "config" and sharing them explicitly and separately with the
>> right "user" ?
>> That sounds way more modular and  we will be able to choose which of the
>> shared "utils" we use where.
>>
>> J.
>>
>>
>> On Mon, Jul 7, 2025 at 11:13 PM Jens Scheffler <j_scheff...@gmx.de.invalid
>> >
>> wrote:
>>
>> > I like "core_and_task_sdk the same like core-and-task-sdk - I have no
>> > problem and it is a path only.
>> >
>> > if we get to "dag-parser-scheduler-task-sdk-and-triggerer" which is a
>> > bit bulky we then should name it "all-not-api-server" :-D
>> >
>> > On 07.07.25 22:57, Ash Berlin-Taylor wrote:
>> > > In case I did a bad job explaining it, the “core and task sdk” is not
>> in
>> > the module name/import name, just in the file path.
>> > >
>> > > Anyone have other ideas?
>> > >
>> > >> On 7 Jul 2025, at 21:37, Buğra Öztürk <ozturkbugr...@gmail.com>
>> wrote:
>> > >>
>> > >> Thanks Ash! Looks cool! I like the structure. This will enable all
>> the
>> > >> combinations and structure looks easy to grasp. No strong stance on
>> the
>> > >> naming other than maybe it is a bit long with `and`, `core_ctl`
>> could be
>> > >> shorter, since no import path is defined like that, we can give any
>> name
>> > >> for sure.
>> > >>
>> > >> Best regards,
>> > >>
>> > >>> On Mon, 7 Jul 2025, 21:51 Jarek Potiuk, <ja...@potiuk.com> wrote:
>> > >>>
>> > >>> Looks good but I think we should find some better logical name for
>> > >>> core_and_sdk :)
>> > >>>
>> > >>> pon., 7 lip 2025, 21:44 użytkownik Jens Scheffler
>> > >>> <j_scheff...@gmx.de.invalid> napisał:
>> > >>>
>> > >>>> Cool! Especially the "shared" folder with the ability to have
>> > >>>> N-combinations w/o exploding project repo root!
>> > >>>>
>> > >>>> On 07.07.25 14:43, Ash Berlin-Taylor wrote:
>> > >>>>> Oh, and all of this will be explain in shared/README.md
>> > >>>>>
>> > >>>>>> On 7 Jul 2025, at 13:41, Ash Berlin-Taylor <a...@apache.org>
>> wrote:
>> > >>>>>>
>> > >>>>>> Okay, so it seems we have agreement on the approach here, so I’ll
>> > >>>> continue with this, and on the dev call it was mentioned that
>> > >>>> “airflow-common” wasn’t a great name, so here is my proposal for
>> the
>> > file
>> > >>>> structure;
>> > >>>>>> ```
>> > >>>>>> /
>> > >>>>>>   task-sdk/...
>> > >>>>>>   airflow-core/...
>> > >>>>>>   shared/
>> > >>>>>>     kuberenetes/
>> > >>>>>>       pyproject.toml
>> > >>>>>>       src/
>> > >>>>>>         airflow_kube/__init__.py
>> > >>>>>>     core-and-tasksdk/
>> > >>>>>>       pyproject.toml
>> > >>>>>>       src/
>> > >>>>>>         airflow_shared/__init__.py
>> > >>>>>> ```
>> > >>>>>>
>> > >>>>>> Things to note here: the “shared” folder has (the possibility) of
>> > >>>> having multiple different shared “libraries” in it, in this
>> example I
>> > am
>> > >>>> supposing a hypothetical shared kuberenetes folder a world in
>> which we
>> > >>>> split the KubePodOperator and the KubeExecutor in to two separate
>> > >>>> distributions (example only, not proposing we do that right now,
>> and
>> > that
>> > >>>> will be a separate discussion)
>> > >>>>>> The other things to note here:
>> > >>>>>>
>> > >>>>>>
>> > >>>>>> - the folder name in shared aims to be “self-documenting”, hence
>> the
>> > >>>> verbose “core-and-tasksdk” to say where the shared library is
>> > intended to
>> > >>>> be used.
>> > >>>>>> - the python module itself should almost always have an
>> `airflow_`
>> > (or
>> > >>>> maybe `_airflow_`?) prefix so that it does not conflict with
>> anything
>> > >>> else
>> > >>>> we might use. It won’t matter “in production” as those will be
>> > vendored
>> > >>> in
>> > >>>> to be imported as `airflow/_vendor/airflow_shared` etc, but
>> avoiding
>> > >>>> conflicts at dev time with the Finder approach is a good safety
>> > measure.
>> > >>>>>> I will start making a real PR for this proposal now, but I’m
>> open to
>> > >>>> feedback (either here, or in the PR when I open it)
>> > >>>>>> -ash
>> > >>>>>>
>> > >>>>>>> On 4 Jul 2025, at 16:55, Jarek Potiuk <ja...@potiuk.com> wrote:
>> > >>>>>>>
>> > >>>>>>> Yeah we have to try it and test - also building packages happens
>> > semi
>> > >>>>>>> frequently when you run `uv sync` (they use some kind of
>> heuristics
>> > >>> to
>> > >>>>>>> decide when) and you can force it with `--reinstall` or
>> > `--refresh`.
>> > >>>>>>> Package build also happens every time when you run "ci-image
>> build`
>> > >>>> now in
>> > >>>>>>> breeze so it seems like it will nicely integrate in our
>> workflows.
>> > >>>>>>>
>> > >>>>>>> Looks really cool Ash.
>> > >>>>>>>
>> > >>>>>>> On Fri, Jul 4, 2025 at 5:14 PM Ash Berlin-Taylor <
>> a...@apache.org>
>> > >>>> wrote:
>> > >>>>>>>> It’s not just release time, but any time we build a package
>> which
>> > >>>> happens
>> > >>>>>>>> on “every” CI run. The normal unit tests will use code from
>> > >>>>>>>> airflow-common/src/airflow_common; the kube tests which build
>> an
>> > >>>> image will
>> > >>>>>>>> build the dists and vendor in the code from that commit.
>> > >>>>>>>>
>> > >>>>>>>> There is only a single copy of the shared code committed to the
>> > >>> repo,
>> > >>>> so
>> > >>>>>>>> there is never anything to synchronise.
>> > >>>>>>>>
>> > >>>>>>>>> On 4 Jul 2025, at 15:53, Amogh Desai <
>> amoghdesai....@gmail.com>
>> > >>>> wrote:
>> > >>>>>>>>> Thanks Ash.
>> > >>>>>>>>>
>> > >>>>>>>>> This is really cool and helpful that you were able to test
>> both
>> > >>>> scenarios
>> > >>>>>>>>> -- repo checkout
>> > >>>>>>>>> and also installing from the vendored package and the
>> resolution
>> > >>>> worked
>> > >>>>>>>>> fine too.
>> > >>>>>>>>>
>> > >>>>>>>>> I like this idea compared the to relative import one for few
>> > >>> reasons:
>> > >>>>>>>>> - It feels like it will take some time to adjust to the new
>> > coding
>> > >>>>>>>> standard
>> > >>>>>>>>> that we will lay
>> > >>>>>>>>> if we impose relative imports in the shared dist
>> > >>>>>>>>> - We can continue using repo wise absolute import standards,
>> it
>> > is
>> > >>>> also
>> > >>>>>>>>> much easier for situations
>> > >>>>>>>>> when we do global search in IDE to find + replace, this could
>> > mean
>> > >>> a
>> > >>>>>>>> change
>> > >>>>>>>>> there
>> > >>>>>>>>> - The vendoring work is a proven and established paradigm
>> across
>> > >>>> projects
>> > >>>>>>>>> and would
>> > >>>>>>>>> out of box give us the build tooling we need also
>> > >>>>>>>>>
>> > >>>>>>>>> Nothing too against the relative import but with the evidence
>> > >>>> provided
>> > >>>>>>>>> above, vendored approach
>> > >>>>>>>>> seems to only do us good.
>> > >>>>>>>>>
>> > >>>>>>>>> Regarding synchronizing it, release time should be fine as
>> long
>> > as
>> > >>> we
>> > >>>>>>>> have
>> > >>>>>>>>> a good CI workflow to probably
>> > >>>>>>>>> catch such issues per PR if changes are made in shared dist?
>> > >>>> (precommit
>> > >>>>>>>>> would make it really slow i guess)
>> > >>>>>>>>>
>> > >>>>>>>>> If we can run our tests with vendored code we should be mostly
>> > >>>> covered.
>> > >>>>>>>>> Good effort all!
>> > >>>>>>>>>
>> > >>>>>>>>> Thanks & Regards,
>> > >>>>>>>>> Amogh Desai
>> > >>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>>> On Fri, Jul 4, 2025 at 7:23 PM Ash Berlin-Taylor <
>> > a...@apache.org>
>> > >>>>>>>> wrote:
>> > >>>>>>>>>> Okay, I think I’ve got something that works and I’m happy
>> with.
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>
>> >
>> https://github.com/astronomer/airflow/tree/shared-vendored-lib-tasksdk-and-core
>> > >>>>>>>>>> This produces the following from `uv build task-sdk`
>> > >>>>>>>>>> -
>> > >>>>>>>>>>
>> > >>>
>> >
>> https://github.com/user-attachments/files/21058976/apache_airflow_task_sdk-1.1.0.tar.gz
>> > >>>>>>>>>> -
>> > >>>>>>>>>>
>> > >>>
>> >
>> https://github.com/user-attachments/files/21058996/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip
>> > >>>>>>>>>> (`.whl.zip` as GH won't allow .whl upload, but will .zip)
>> > >>>>>>>>>>
>> > >>>>>>>>>> ```
>> > >>>>>>>>>> ❯ unzip -l
>> > >>> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip |
>> > >>>>>>>> grep
>> > >>>>>>>>>> _vendor
>> > >>>>>>>>>>      50  02-02-2020 00:00   airflow/sdk/_vendor/.gitignore
>> > >>>>>>>>>>    2082  02-02-2020 00:00   airflow/sdk/_vendor/__init__.py
>> > >>>>>>>>>>      28  02-02-2020 00:00
>> >  airflow/sdk/_vendor/airflow_common.pyi
>> > >>>>>>>>>>      18  02-02-2020 00:00   airflow/sdk/_vendor/vendor.txt
>> > >>>>>>>>>>     785  02-02-2020 00:00
>> > >>>>>>>>>> airflow/sdk/_vendor/airflow_common/__init__.py
>> > >>>>>>>>>>   10628  02-02-2020 00:00
>> > >>>>>>>>>> airflow/sdk/_vendor/airflow_common/timezone.py
>> > >>>>>>>>>> ```
>> > >>>>>>>>>>
>> > >>>>>>>>>> And similarly in the .tar.gz, so our “sdist” is complete too:
>> > >>>>>>>>>> ```
>> > >>>>>>>>>> ❯ tar -tzf dist/apache_airflow_task_sdk-1.1.0.tar.gz |grep
>> > _vendor
>> > >>>>>>>>>>
>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/.gitignore
>> > >>>>>>>>>>
>> > apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/__init__.py
>> > >>>>>>>>>>
>> > >>>>
>> > apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common.pyi
>> > >>>>>>>>>>
>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/vendor.txt
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>
>> >
>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/__init__.py
>> > >>>
>> >
>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/timezone.py
>> > >>>>>>>>>> ```
>> > >>>>>>>>>>
>> > >>>>>>>>>> The plugin works at build time by including/copying the libs
>> > >>>> specified
>> > >>>>>>>> in
>> > >>>>>>>>>> vendor.txt into place (and let `vendoring` take care of
>> import
>> > >>>>>>>> rewrites.)
>> > >>>>>>>>>> For the imports to continue to work at “dev” time/from a repo
>> > >>>> checkout,
>> > >>>>>>>> I
>> > >>>>>>>>>> have added a import finder to `sys.meta_path`, and since
>> it’s at
>> > >>> the
>> > >>>>>>>> end of
>> > >>>>>>>>>> the list it will only be used if the normal import can’t find
>> > >>>> things.
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>
>> >
>> https://github.com/astronomer/airflow/blob/996817782be6071b306a87af9f36fe1cf2d3aaa3/task-sdk/src/airflow/sdk/_vendor/__init__.py
>> > >>>>>>>>>> This doesn’t quite give us the same runtime effect “import
>> > >>>> rewriting”
>> > >>>>>>>>>> affect, as in this approach `airflow_common` is directly
>> loaded
>> > >>>> (i.e.
>> > >>>>>>>>>> airflow.sdk._vendor.airflow_common and airflow_common exist
>> in
>> > >>>>>>>>>> sys.modules), but it does work for everything that I was
>> able to
>> > >>>> test..
>> > >>>>>>>>>> I tested it with the diff at the end of this message. My test
>> > >>>> ipython
>> > >>>>>>>>>> shell:
>> > >>>>>>>>>>
>> > >>>>>>>>>> ```
>> > >>>>>>>>>> In [1]: from airflow.sdk._vendor.airflow_common.timezone
>> import
>> > >>> foo
>> > >>>>>>>>>> In [2]: foo
>> > >>>>>>>>>> Out[2]: 1
>> > >>>>>>>>>>
>> > >>>>>>>>>> In [3]: import airflow.sdk._vendor.airflow_common
>> > >>>>>>>>>>
>> > >>>>>>>>>> In [4]: import airflow.sdk._vendor.airflow_common.timezone
>> > >>>>>>>>>>
>> > >>>>>>>>>> In [5]: airflow.sdk._vendor.airflow_common.__file__
>> > >>>>>>>>>> Out[5]:
>> > >>>>>>>>>>
>> > >>>
>> >
>> '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/__init__.py'
>> > >>>>>>>>>> In [6]: airflow.sdk._vendor.airflow_common.timezone.__file__
>> > >>>>>>>>>> Out[6]:
>> > >>>>>>>>>>
>> > >>>
>> >
>> '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/timezone.py'
>> > >>>>>>>>>> ```
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>> And in an standalone environment with the SDK dist I built
>> (it
>> > >>>> needed
>> > >>>>>>>> the
>> > >>>>>>>>>> matching airflow-core right now, but that is nothing to do
>> with
>> > >>> this
>> > >>>>>>>>>> discussion):
>> > >>>>>>>>>>
>> > >>>>>>>>>> ```
>> > >>>>>>>>>> ❯ _AIRFLOW__AS_LIBRARY=1 uvx --python 3.12 --with
>> > >>>>>>>>>> dist/apache_airflow_core-3.1.0-py3-none-any.whl --with
>> > >>>>>>>>>> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl ipython
>> > >>>>>>>>>> Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ]
>> > >>>>>>>>>> Type 'copyright', 'credits' or 'license' for more information
>> > >>>>>>>>>> IPython 9.4.0 -- An enhanced Interactive Python. Type '?' for
>> > >>> help.
>> > >>>>>>>>>> Tip: You can use `%hist` to view history, see the options
>> with
>> > >>>>>>>> `%history?`
>> > >>>>>>>>>> In [1]: import airflow.sdk._vendor.airflow_common.timezone
>> > >>>>>>>>>>
>> > >>>>>>>>>> In [2]: airflow.sdk._vendor.airflow_common.timezone.__file__
>> > >>>>>>>>>> Out[2]:
>> > >>>>>>>>>>
>> > >>>
>> >
>> '/Users/ash/.cache/uv/archive-v0/WWq6r65aPto2eJOyPObEH/lib/python3.12/site-packages/airflow/sdk/_vendor/airflow_common/timezone.py’
>> > >>>>>>>>>> ``
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>> ```diff
>> > >>>>>>>>>> diff --git a/airflow-common/src/airflow_common/__init__.py
>> > >>>>>>>>>> b/airflow-common/src/airflow_common/__init__.py
>> > >>>>>>>>>> index 13a83393a9..927b7c6b61 100644
>> > >>>>>>>>>> --- a/airflow-common/src/airflow_common/__init__.py
>> > >>>>>>>>>> +++ b/airflow-common/src/airflow_common/__init__.py
>> > >>>>>>>>>> @@ -14,3 +14,5 @@
>> > >>>>>>>>>> # KIND, either express or implied.  See the License for the
>> > >>>>>>>>>> # specific language governing permissions and limitations
>> > >>>>>>>>>> # under the License.
>> > >>>>>>>>>> +
>> > >>>>>>>>>> +foo = 1
>> > >>>>>>>>>> diff --git a/airflow-common/src/airflow_common/timezone.py
>> > >>>>>>>>>> b/airflow-common/src/airflow_common/timezone.py
>> > >>>>>>>>>> index 340b924c66..58384ef20f 100644
>> > >>>>>>>>>> --- a/airflow-common/src/airflow_common/timezone.py
>> > >>>>>>>>>> +++ b/airflow-common/src/airflow_common/timezone.py
>> > >>>>>>>>>> @@ -36,6 +36,9 @@ _PENDULUM3 =
>> > >>>>>>>>>> version.parse(metadata.version("pendulum")).major == 3
>> > >>>>>>>>>> # - FixedTimezone(0, "UTC") in pendulum 2
>> > >>>>>>>>>> utc = pendulum.UTC
>> > >>>>>>>>>>
>> > >>>>>>>>>> +
>> > >>>>>>>>>> +from airflow_common import foo
>> > >>>>>>>>>> +
>> > >>>>>>>>>> TIMEZONE: Timezone
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>> ```
>> > >>>>>>>>>>
>> > >>>>>>>>>>>> On 3 Jul 2025, at 12:43, Jarek Potiuk <ja...@potiuk.com>
>> > wrote:
>> > >>>>>>>>>>> I think both approaches are doable:
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> 1) -> We can very easily prevent bad imports by pre-commit
>> when
>> > >>>>>>>> importing
>> > >>>>>>>>>>> from different distributions and make sure we are only doing
>> > >>>> relative
>> > >>>>>>>>>>> imports in the shared modules. We are doing plenty of this
>> > >>>> already. And
>> > >>>>>>>>>> yes
>> > >>>>>>>>>>> it would require relative links we currently do not allow.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> 2) -> has one disadvantage that someone at some point in
>> time
>> > >>> will
>> > >>>> have
>> > >>>>>>>>>> to
>> > >>>>>>>>>>> decide to synchronize this and if it happens just before
>> > release
>> > >>>> (I bet
>> > >>>>>>>>>>> this is going to happen) this will lead to solving problems
>> > that
>> > >>>> would
>> > >>>>>>>>>>> normally be solved during PR when you make a change (i.e.
>> > >>> symbolic
>> > >>>> link
>> > >>>>>>>>>> has
>> > >>>>>>>>>>> the advantage that whoever modifies shared code will be
>> > >>> immediately
>> > >>>>>>>>>>> notified in their PR - that they broke something because
>> either
>> > >>>> static
>> > >>>>>>>>>>> checks or mypy or tests fail.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Ash, do you have an idea of a process (who and when) does
>> the
>> > >>>>>>>>>>> synchronisation in case of vendoring? Maybe we could solve
>> it
>> > if
>> > >>>> it is
>> > >>>>>>>>>> done
>> > >>>>>>>>>>> more frequently and with some regularity? We could
>> potentially
>> > >>>> force
>> > >>>>>>>>>>> re-vendoring at PR time as well any time shared code changes
>> > (and
>> > >>>>>>>> prevent
>> > >>>>>>>>>>> it by pre-commit. And I can't think of some place (other
>> than
>> > >>>> releases)
>> > >>>>>>>>>> in
>> > >>>>>>>>>>> our development workflow and that seems to be a bit too
>> late as
>> > >>>> puts an
>> > >>>>>>>>>>> extra effort on fixing potential incompatibilities
>> introduced
>> > on
>> > >>>>>>>> release
>> > >>>>>>>>>>> manager and delays the release. WDYT?
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Re: relative links. I think for a shared library we could
>> > >>>> potentially
>> > >>>>>>>>>> relax
>> > >>>>>>>>>>> this and allow them (and actually disallow absolute links in
>> > the
>> > >>>> pieces
>> > >>>>>>>>>> of
>> > >>>>>>>>>>> code that are shared - again, by pre-commit). As I recall,
>> the
>> > >>> only
>> > >>>>>>>>>> reason
>> > >>>>>>>>>>> we forbade the relative links is because of how we are (or
>> > maybe
>> > >>>> were)
>> > >>>>>>>>>>> doing DAG parsing and failures resulting from it. So we
>> decided
>> > >>> to
>> > >>>> just
>> > >>>>>>>>>> not
>> > >>>>>>>>>>> allow it to keep consistency. The way how Dag parsing works
>> is
>> > >>> that
>> > >>>>>>>> when
>> > >>>>>>>>>>> you are using importlib to read the Dag from a file, the
>> > relative
>> > >>>>>>>> imports
>> > >>>>>>>>>>> do not work as it does not know what they should be relative
>> > to.
>> > >>>> But if
>> > >>>>>>>>>>> relative import is done from an imported package, it should
>> be
>> > no
>> > >>>>>>>>>> problem,
>> > >>>>>>>>>>> I think - otherwise our Dags would not be able to import any
>> > >>>> library
>> > >>>>>>>> that
>> > >>>>>>>>>>> uses relative imports.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Of course consistency might be the reason why we do not
>> want to
>> > >>>>>>>> introduce
>> > >>>>>>>>>>> relative imports. I don't see it as an issue if it is
>> guarded
>> > by
>> > >>>>>>>>>> pre-commit
>> > >>>>>>>>>>> though.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> J.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> J.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> czw., 3 lip 2025, 12:11 użytkownik Ash Berlin-Taylor <
>> > >>>> a...@apache.org>
>> > >>>>>>>>>>> napisał:
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>> Oh yes, symlinks will work, with one big caveat: It does
>> mean
>> > >>> you
>> > >>>>>>>> can’t
>> > >>>>>>>>>>>> use absolute imports in one common module to another.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> For example
>> > >>>>>>>>>>>>
>> > >>>
>> >
>> https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41
>> > >>>>>>>>>>>> where we have
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> ```
>> > >>>>>>>>>>>> from airflow.utils.module_loading import import_string
>> > >>>>>>>>>>>> ```
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> if we want to move serve_logs into this common lib that is
>> > then
>> > >>>>>>>>>> symlinked
>> > >>>>>>>>>>>> then we wouldn’t be able to have `from
>> > >>>> airflow_common.module_loading
>> > >>>>>>>>>> import
>> > >>>>>>>>>>>> import_string`.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> I can think of two possible solutions here.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> 1) is to allow/require relative imports in this shared
>> lib, so
>> > >>>> `from
>> > >>>>>>>>>>>> .module_loading import import_string`
>> > >>>>>>>>>>>> 2) is to use `vendoring`[1] (from the pip maintainers)
>> which
>> > >>> will
>> > >>>>>>>> handle
>> > >>>>>>>>>>>> import-rewriting for us.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> I’d entirely forgot that symlinks in repos was a thing, so
>> I
>> > >>>> prepared
>> > >>>>>>>> a
>> > >>>>>>>>>>>> minimal POC/demo of what vendoring approach could look like
>> > here
>> > >>>>>>>>>>>>
>> > >>>
>> >
>> https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3
>> > >>>>>>>>>>>> Now personally I am more than happy with relative imports,
>> but
>> > >>>>>>>> generally
>> > >>>>>>>>>>>> as a project we have avoided them, so I think that limits
>> what
>> > >>> we
>> > >>>>>>>> could
>> > >>>>>>>>>> do
>> > >>>>>>>>>>>> with a symlink based approach.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> -ash
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> [1] https://github.com/pradyunsg/vendoring
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>> On 3 Jul 2025, at 10:30, Pavankumar Gopidesu <
>> > >>>>>>>> gopidesupa...@gmail.com>
>> > >>>>>>>>>>>> wrote:
>> > >>>>>>>>>>>>> Thanks Ash
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> Yes agree option 2 would be preferred for me. Making sure
>> we
>> > >>>> have all
>> > >>>>>>>>>> the
>> > >>>>>>>>>>>>> gaurdriles to protect any unwanted behaviour in code
>> sharing
>> > >>> and
>> > >>>>>>>>>>>> executing
>> > >>>>>>>>>>>>> right of tests between the packages.
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> Agree with others, option 2 would be
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai <
>> > >>>>>>>> amoghdesai....@gmail.com>
>> > >>>>>>>>>>>>> wrote:
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> Thanks for starting this discussion, Ash.
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> I would prefer option 2 here with proper tooling to
>> handle
>> > the
>> > >>>> code
>> > >>>>>>>>>>>>>> duplication at *release* time.
>> > >>>>>>>>>>>>>> It is best to have a dist that has all it needs in
>> itself.
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> Option 1 could very quickly get out of hand and if we
>> decide
>> > >>> to
>> > >>>>>>>>>> separate
>> > >>>>>>>>>>>>>> triggerer /
>> > >>>>>>>>>>>>>> dag processor / config etc etc as separate packages, back
>> > >>>> compat is
>> > >>>>>>>>>>>> going
>> > >>>>>>>>>>>>>> to be a nightmare
>> > >>>>>>>>>>>>>> and will bite us harder than we anticipate.
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> Thanks & Regards,
>> > >>>>>>>>>>>>>> Amogh Desai
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik <
>> > >>> kaxiln...@gmail.com>
>> > >>>>>>>>>> wrote:
>> > >>>>>>>>>>>>>>> I prefer Option 2 as well to avoid matrix of
>> dependencies
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> On Thu, 3 Jul 2025 at 01:03, Jens Scheffler
>> > >>>>>>>>>> <j_scheff...@gmx.de.invalid
>> > >>>>>>>>>>>>>>> wrote:
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>> I'd also rather prefer option 2 - reason here is it is
>> > >>> rather
>> > >>>>>>>>>>>> pragmatic
>> > >>>>>>>>>>>>>>>> and we no not need to cut another package and have less
>> > >>>> package
>> > >>>>>>>>>> counts
>> > >>>>>>>>>>>>>>>> and dependencies.
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>> I remember some time ago I was checking (together with
>> > >>> Jarek,
>> > >>>> I am
>> > >>>>>>>>>> not
>> > >>>>>>>>>>>>>>>> sure anymore...) if the usage of symlinks would be
>> > possible.
>> > >>>> To
>> > >>>>>>>> keep
>> > >>>>>>>>>>>>>> the
>> > >>>>>>>>>>>>>>>> source in one package but "symlink" it into another. If
>> > then
>> > >>>> at
>> > >>>>>>>>>> point
>> > >>>>>>>>>>>>>> of
>> > >>>>>>>>>>>>>>>> packaging/release the files are materialized we have 1
>> set
>> > >>> of
>> > >>>>>>>> code.
>> > >>>>>>>>>>>>>>>> Otherwise if not possible still the redundancy could be
>> > >>>> solved by
>> > >>>>>>>> a
>> > >>>>>>>>>>>>>>>> pre-commit hook - and in Git the files are
>> de-duplicated
>> > >>>> anyway
>> > >>>>>>>>>> based
>> > >>>>>>>>>>>>>> on
>> > >>>>>>>>>>>>>>>> content hash, so this does not hurt.
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>> On 02.07.25 18:49, Shahar Epstein wrote:
>> > >>>>>>>>>>>>>>>>> I support option 2 with proper automation & CI - the
>> > >>>> reasonings
>> > >>>>>>>>>>>>>> you've
>> > >>>>>>>>>>>>>>>>> shown for that make sense to me.
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> Shahar
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor <
>> > >>>> a...@apache.org
>> > >>>>>>>>>>>>>>> wrote:
>> > >>>>>>>>>>>>>>>>>> Hello everyone,
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> As we work on finishing off the code-level
>> separation of
>> > >>>> Task
>> > >>>>>>>> SDK
>> > >>>>>>>>>>>>>> and
>> > >>>>>>>>>>>>>>>> Core
>> > >>>>>>>>>>>>>>>>>> (scheduler etc) we have come across some situations
>> > where
>> > >>> we
>> > >>>>>>>> would
>> > >>>>>>>>>>>>>>> like
>> > >>>>>>>>>>>>>>>> to
>> > >>>>>>>>>>>>>>>>>> share code between these.
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> However it’s not as straight forward of “just put it
>> in
>> > a
>> > >>>> common
>> > >>>>>>>>>>>>>> dist
>> > >>>>>>>>>>>>>>>> they
>> > >>>>>>>>>>>>>>>>>> both depend upon” because one of the goals of the
>> Task
>> > SDK
>> > >>>>>>>>>>>>>> separation
>> > >>>>>>>>>>>>>>>> was
>> > >>>>>>>>>>>>>>>>>> to have 100% complete version independence between
>> the
>> > >>> two,
>> > >>>>>>>>>> ideally
>> > >>>>>>>>>>>>>>>> even if
>> > >>>>>>>>>>>>>>>>>> they are built into the same image and venv. Most of
>> the
>> > >>>> reason
>> > >>>>>>>>>> why
>> > >>>>>>>>>>>>>>> this
>> > >>>>>>>>>>>>>>>>>> isn’t straight forward comes down to backwards
>> > >>>> compatibility -
>> > >>>>>>>> if
>> > >>>>>>>>>> we
>> > >>>>>>>>>>>>>>>> make
>> > >>>>>>>>>>>>>>>>>> an change to the common/shared distribution
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> We’ve listed the options we have thought about in
>> > >>>>>>>>>>>>>>>>>> https://github.com/apache/airflow/issues/51545 (but
>> > that
>> > >>>> covers
>> > >>>>>>>>>>>>>> some
>> > >>>>>>>>>>>>>>>> more
>> > >>>>>>>>>>>>>>>>>> things that I don’t want to get in to in this
>> discussion
>> > >>>> such as
>> > >>>>>>>>>>>>>>>> possibly
>> > >>>>>>>>>>>>>>>>>> separating operators and executors out of a single
>> > >>> provider
>> > >>>>>>>> dist.)
>> > >>>>>>>>>>>>>>>>>> To give a concrete example of some code I would like
>> to
>> > >>>> share
>> > >>>
>> >
>> https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py
>> > >>>>>>>>>>>>>>>>>> — logging config. Another thing we will want to share
>> > will
>> > >>>> be
>> > >>>>>>>> the
>> > >>>>>>>>>>>>>>>>>> AirflowConfigParser class from airflow.configuration
>> > (but
>> > >>>>>>>> notably:
>> > >>>>>>>>>>>>>>> only
>> > >>>>>>>>>>>>>>>> the
>> > >>>>>>>>>>>>>>>>>> parser class, _not_ the default config values, again,
>> > lets
>> > >>>> not
>> > >>>>>>>>>> dwell
>> > >>>>>>>>>>>>>>> on
>> > >>>>>>>>>>>>>>>> the
>> > >>>>>>>>>>>>>>>>>> specifics of that)
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> So to bring the options listed in the issue here for
>> > >>>> discussion,
>> > >>>>>>>>>>>>>>> broadly
>> > >>>>>>>>>>>>>>>>>> speaking there are two high-level approaches:
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> 1. A single shared distribution
>> > >>>>>>>>>>>>>>>>>> 2. No shared package and copy/duplicate code
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> The advantage of Approach 1 is that we only have the
>> > code
>> > >>>> in one
>> > >>>>>>>>>>>>>>> place.
>> > >>>>>>>>>>>>>>>>>> However for me, at least in this specific case of
>> > Logging
>> > >>>> config
>> > >>>>>>>>>> or
>> > >>>>>>>>>>>>>>>>>> AirflowConfigParser class is that backwards
>> > compatibility
>> > >>> is
>> > >>>>>>>> much
>> > >>>>>>>>>>>>>> much
>> > >>>>>>>>>>>>>>>>>> harder.
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> The main advantage of Approach 2 is the the code is
>> > >>> released
>> > >>>>>>>>>>>>>>>> with/embedded
>> > >>>>>>>>>>>>>>>>>> in the dist (i.e. apache-airflow-task-sdk would
>> contain
>> > >>> the
>> > >>>>>>>> right
>> > >>>>>>>>>>>>>>>> version
>> > >>>>>>>>>>>>>>>>>> of the logging config and ConfigParser etc). The
>> > downside
>> > >>> is
>> > >>>>>>>> that
>> > >>>>>>>>>>>>>>> either
>> > >>>>>>>>>>>>>>>>>> the code will need to be duplicated in the repo, or
>> > better
>> > >>>> yet
>> > >>>>>>>> it
>> > >>>>>>>>>>>>>>> would
>> > >>>>>>>>>>>>>>>>>> live in a single place in the repo, but some tooling
>> > (TBD)
>> > >>>> will
>> > >>>>>>>>>>>>>>>>>> automatically handle the duplication, either at
>> commit
>> > >>>> time, or
>> > >>>>>>>> my
>> > >>>>>>>>>>>>>>>>>> preference, at release time.
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> For this kind of shared “utility” code I am very
>> > strongly
>> > >>>>>>>> leaning
>> > >>>>>>>>>>>>>>>> towards
>> > >>>>>>>>>>>>>>>>>> option 2 with automation, as otherwise I think the
>> > >>> backwards
>> > >>>>>>>>>>>>>>>> compatibility
>> > >>>>>>>>>>>>>>>>>> requirements would make it unworkable (very quickly
>> over
>> > >>>> time
>> > >>>>>>>> the
>> > >>>>>>>>>>>>>>>>>> combinations we would have to test would just be
>> > >>>> unreasonable)
>> > >>>>>>>>>> and I
>> > >>>>>>>>>>>>>>>> don’t
>> > >>>>>>>>>>>>>>>>>> feel confident we can have things as stable as we
>> need
>> > to
>> > >>>> really
>> > >>>>>>>>>>>>>>> deliver
>> > >>>>>>>>>>>>>>>>>> the version separation/independency I want to
>> delivery
>> > >>> with
>> > >>>>>>>>>> AIP-72.
>> > >>>>>>>>>>>>>>>>>> So unless someone feels very strongly about this, I
>> will
>> > >>>> come up
>> > >>>>>>>>>>>>>> with
>> > >>>>>>>>>>>>>>> a
>> > >>>>>>>>>>>>>>>>>> draft PR for further discussion that will implement
>> code
>> > >>>> sharing
>> > >>>>>>>>>> via
>> > >>>>>>>>>>>>>>>>>> “vendoring” it at build time. I have an idea of how I
>> > can
>> > >>>>>>>> achieve
>> > >>>>>>>>>>>>>> this
>> > >>>>>>>>>>>>>>>> so
>> > >>>>>>>>>>>>>>>>>> we have a single version in the repo and it’ll work
>> > there,
>> > >>>> but
>> > >>>>>>>> at
>> > >>>>>>>>>>>>>>>> runtime
>> > >>>>>>>>>>>>>>>>>> we vendor it in to the shipped dist so it lives at
>> > >>> something
>> > >>>>>>>> like
>> > >>>>>>>>>>>>>>>>>> `airflow.sdk._vendor` etc.
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> In terms of repo layout, this likely means we would
>> end
>> > up
>> > >>>> with:
>> > >>>>>>>>>>>>>>>>>> airflow-core/pyproject.toml
>> > >>>>>>>>>>>>>>>>>> airflow-core/src/
>> > >>>>>>>>>>>>>>>>>> airflow-core/tests/
>> > >>>>>>>>>>>>>>>>>> task-sdk/pyproject.toml
>> > >>>>>>>>>>>>>>>>>> task-sdk/src/
>> > >>>>>>>>>>>>>>>>>> task-sdk/tests/
>> > >>>>>>>>>>>>>>>>>> airflow-common/src
>> > >>>>>>>>>>>>>>>>>> airflow-common/tests/
>> > >>>>>>>>>>>>>>>>>> # Possibly no airflow-common/pyproject.toml, as deps
>> > would
>> > >>>> be
>> > >>>>>>>>>>>>>> included
>> > >>>>>>>>>>>>>>>> in
>> > >>>>>>>>>>>>>>>>>> the downstream projects. TBD.
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> Thoughts and feedback welcomed.
>> > >>>>
>> ---------------------------------------------------------------------
>> > >>>>>>>>>>>>>>>> To unsubscribe, e-mail:
>> > dev-unsubscr...@airflow.apache.org
>> > >>>>>>>>>>>>>>>> For additional commands, e-mail:
>> > >>> dev-h...@airflow.apache.org
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>
>> > >>>
>> ---------------------------------------------------------------------
>> > >>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> > >>>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>> > >>>>>>>>
>> > >>>>>>>>
>> > >>>>>>
>> > ---------------------------------------------------------------------
>> > >>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> > >>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>> > >>>>>>
>> > >>>>>
>> ---------------------------------------------------------------------
>> > >>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> > >>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>> > >>>>>
>> > >>>>
>> ---------------------------------------------------------------------
>> > >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> > >>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>> > >>>>
>> > >>>>
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> > > For additional commands, e-mail: dev-h...@airflow.apache.org
>> > >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> > For additional commands, e-mail: dev-h...@airflow.apache.org
>> >
>> >
>>
>
