Probably, you make a valid point.

Maybe this is an implementation detail, so we could figure it out as we
start on a POC and factor in these things
as we move along?

But from an initial guess, I would think that execution time related items
(if we manage to enumerate them) would be something
that would be better off in that "core_and_task_sdk" bundle.

Thanks & Regards,
Amogh Desai


On Tue, Jul 8, 2025 at 3:46 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> > Not that I am against your idea and we can surely expand as we need but
> we would not need to expand the
> "core_and_task_sdk" if we put only the relevant items into it.
>
> So if we move logging and config out, my question is what is really
> relevant to "stay" in "core_and_task_sdk" ? And what we know will not be
> needed by other distributions in the future ?
>
> What would be the content of "core_and_task_sdk" ? Can we enumerate what
> should go there now ?
>
> My bet is that if we enumerate them, it will turn out that they are good
> candidates to make separate "shared features" that can be logically named
> and modularised, and that there is no real need to have shared
> "core_and_task_sdk" which sounds like "bag of everything else".
>
>
>
> On Tue, Jul 8, 2025 at 11:48 AM Amogh Desai <amoghdesai....@gmail.com>
> wrote:
>
>> Yeah, I think what you are showcasing here is a step ahead of the
>> initial proposal from Ash.
>>
>> From the original proposal, the `core_and_task_sdk` *can* have the
>> things relevant to just those two
>> distros. Logging, Config are modules that might be needed by airflow-ctl
>> for example, so ideally, those
>> would not be good candidates to be put in there, ideally speaking.
>>
>> The example of Kubernetes Utils sounds to be a good example, it will be
>> used by KubeExecutor (lets say this is a
>> module called "executors") and by KPO (providers), and the
>> "shared/kubernetes" would probably be a good
>> candidate for that.
>>
>> Not that I am against your idea and we can surely expand as we need but
>> we would not need to expand the
>> "core_and_task_sdk" if we put only the relevant items into it.
>>
>> Thanks & Regards,
>> Amogh Desai
>>
>>
>> On Tue, Jul 8, 2025 at 12:28 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> > @Jarek Potiuk <ja...@potiuk.com> a little confused on what you mean
>>> there, I am understanding the direction
>>> but could you elaborate a bit more please?
>>>
>>> Let me elaborate:
>>>
>>> As I understand (maybe I am wrong?),  the proposal is that we have a
>>> "core-and-task-sdk" folder which is a shared distribution that is
>>> vendored-in into both "airflow-core" and "airflow-task-sdk". This
>>> contains some shared code that we want to include in both distributions
>>> (note that we never ever release "core-and-task-sdk"
>>> distribution because it only contains code that is shared between the two
>>> distributions we release.
>>>
>>> That's fine and cool.
>>>
>>> Imagine that this distribution contains "logging" (shared code
>>> for logging) and "config" (shared code for configuration). - both needed in
>>> "airflow-core" and "airflow-task-sdk". So far so good. But what happen if
>>> we want to use logging in the same fashion in say "airflow-ctl" (that also
>>> applies for other distributions that we might come up with) ? Are we going
>>> to vendor in the whole "core-and-task-sdk" distribution in "airflow-ctl" ?
>>> It would be far better if we just vendor in "logging" and do not vendor-in
>>> "config".
>>>
>>> And if we are going to have a mechanism to vendor-in "a distribution" -
>>> there is nothing wrong with having the same way to vendor-in multiple
>>> distributions - so we can easily do it this way (i added "orm" ,
>>> "serialization". and "fast_api" as an example thing that we might want to
>>> share - not sure if that is really something we want to do but it will
>>> allow to illustrate my idea better)
>>>
>>> /
>>>   airflow-ctl/
>>>   task-sdk/...
>>>   airflow-core/...
>>>   ....
>>>   shared/
>>>     kubernetes/
>>>       pyproject.toml
>>>       src/
>>>         airflow_shared_kubernetes/__init__.py
>>>     logging/
>>>       pyproject.toml
>>>       src/
>>>         airflow_shared_logging/__init__.py
>>>     config/
>>>       pyproject.toml
>>>       src/
>>>         airflow_shared_config/__init__.py
>>>     orm/
>>>       pyproject.toml
>>>       src/
>>>         airflow_shared_orm/__init__.py
>>>      serialization/
>>>       pyproject.toml
>>>       src/
>>>         airflow_shared_serialization/__init__.py
>>>      fast_api/
>>>       pyproject.toml
>>>       src/
>>>         airflow_shared_fast_api/__init__.py
>>>     ...
>>>
>>> This has multiple benefits (and I see no real drawbacks):
>>>
>>> * the code can be really well modularised.  Those "things" we share (and
>>> this also connects to the __init__.py discussion) - can be independent -
>>> and (it follow Jens comment) it allows to keep low cyclomatic complexity
>>> https://en.wikipedia.org/wiki/Cyclomatic_complexity . It will be way
>>> easier to implement logging in the way that it does not import or use
>>> config. This means for example that configuration for logging will need to
>>> be injected when logging is initialized - and that's exactly what we want,
>>> we do not want logging code to use configuration code directly - they
>>> should be independent from each other and you should be free to vendor-in
>>> either logging or config independently if you also vendored-in the other.
>>>
>>> * it's much more logical. We split based on functionality we want to
>>> share - not about the "distributions" we want to produce. That allows us -
>>> in the future - to make different decisions on how we split our
>>> distributions. For example (i do not tell we have to do it, or that we will
>>> do it but we will have such a possibility) - we can add more shared
>>> utilities we find useful in the same way and decide that "scheduler",
>>> "api_server" or "scheduler" or "triggerer" or "dag processor" are split to
>>> separate distributions - because for example we want to keep a number of
>>> dependencies down. And for example "api_server" might use "fast_api",
>>> "config", "logging", "orm" and "fast_api" and "serialization" , where the
>>> scheduler should not need "fast_api". The "dag_processor" eventually might
>>> not need "orm" nor "fast_api" and only use the other
>>>
>>> This seems like a natural approach. If we have a mechanism to "share"
>>> the code, it does not add complexity, but allows us to isolate independent
>>> functionality into "isolated" boxes and use them
>>>
>>> Also for cyclomatic complexity that is a complex word (badly chosen as
>>> it scares people away) and has some math behind, but it really boils down
>>> to very simple "rules of thumb" (and yes I am a big proponent of having low
>>> cyclomatic complexity).
>>>
>>> a) when you are building the "final" product (i.e. distribution you want
>>> to release) - make sure that you only "use" things - that nothing else is
>>> "using" you as a library.
>>> b) when you are building a "shared" thing (a library) - make sure that
>>> library is only "used" by others but it does not really "use" anything
>>> else. For example in the case I explained above - we can achieve
>>> low-cyclomatic complexity when:
>>>
>>> * airflow-core uses: logging, config, orm, serialization, fast_api
>>> * none of the "logging, config, orm, serialization, fast_api" use each
>>> other - they are at the bottom of the "user -> used" tree
>>>
>>> J.
>>>
>>>
>>>
>>> On Tue, Jul 8, 2025 at 8:16 AM Amogh Desai <amoghdesai....@gmail.com>
>>> wrote:
>>>
>>>> I like the folder structure proposed by Ash and have no objections with
>>>> it.
>>>>
>>>> "core_and_task_sdk" sounds good to me and justifies what it should do
>>>> pretty well.
>>>>
>>>> @Jarek Potiuk <ja...@potiuk.com> a little confused on what you mean
>>>> there, I am understanding the direction
>>>> but could you elaborate a bit more please?
>>>>
>>>> Naming is REALLY hard!
>>>>
>>>> Thanks & Regards,
>>>> Amogh Desai
>>>>
>>>>
>>>> On Tue, Jul 8, 2025 at 2:52 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>
>>>>> How about splitting it even more and having each shared "thing" named?
>>>>> "logging", "config" and sharing them explicitly and separately with the
>>>>> right "user" ?
>>>>> That sounds way more modular and  we will be able to choose which of
>>>>> the
>>>>> shared "utils" we use where.
>>>>>
>>>>> J.
>>>>>
>>>>>
>>>>> On Mon, Jul 7, 2025 at 11:13 PM Jens Scheffler
>>>>> <j_scheff...@gmx.de.invalid>
>>>>> wrote:
>>>>>
>>>>> > I like "core_and_task_sdk the same like core-and-task-sdk - I have no
>>>>> > problem and it is a path only.
>>>>> >
>>>>> > if we get to "dag-parser-scheduler-task-sdk-and-triggerer" which is a
>>>>> > bit bulky we then should name it "all-not-api-server" :-D
>>>>> >
>>>>> > On 07.07.25 22:57, Ash Berlin-Taylor wrote:
>>>>> > > In case I did a bad job explaining it, the “core and task sdk” is
>>>>> not in
>>>>> > the module name/import name, just in the file path.
>>>>> > >
>>>>> > > Anyone have other ideas?
>>>>> > >
>>>>> > >> On 7 Jul 2025, at 21:37, Buğra Öztürk <ozturkbugr...@gmail.com>
>>>>> wrote:
>>>>> > >>
>>>>> > >> Thanks Ash! Looks cool! I like the structure. This will enable
>>>>> all the
>>>>> > >> combinations and structure looks easy to grasp. No strong stance
>>>>> on the
>>>>> > >> naming other than maybe it is a bit long with `and`, `core_ctl`
>>>>> could be
>>>>> > >> shorter, since no import path is defined like that, we can give
>>>>> any name
>>>>> > >> for sure.
>>>>> > >>
>>>>> > >> Best regards,
>>>>> > >>
>>>>> > >>> On Mon, 7 Jul 2025, 21:51 Jarek Potiuk, <ja...@potiuk.com>
>>>>> wrote:
>>>>> > >>>
>>>>> > >>> Looks good but I think we should find some better logical name
>>>>> for
>>>>> > >>> core_and_sdk :)
>>>>> > >>>
>>>>> > >>> pon., 7 lip 2025, 21:44 użytkownik Jens Scheffler
>>>>> > >>> <j_scheff...@gmx.de.invalid> napisał:
>>>>> > >>>
>>>>> > >>>> Cool! Especially the "shared" folder with the ability to have
>>>>> > >>>> N-combinations w/o exploding project repo root!
>>>>> > >>>>
>>>>> > >>>> On 07.07.25 14:43, Ash Berlin-Taylor wrote:
>>>>> > >>>>> Oh, and all of this will be explain in shared/README.md
>>>>> > >>>>>
>>>>> > >>>>>> On 7 Jul 2025, at 13:41, Ash Berlin-Taylor <a...@apache.org>
>>>>> wrote:
>>>>> > >>>>>>
>>>>> > >>>>>> Okay, so it seems we have agreement on the approach here, so
>>>>> I’ll
>>>>> > >>>> continue with this, and on the dev call it was mentioned that
>>>>> > >>>> “airflow-common” wasn’t a great name, so here is my proposal
>>>>> for the
>>>>> > file
>>>>> > >>>> structure;
>>>>> > >>>>>> ```
>>>>> > >>>>>> /
>>>>> > >>>>>>   task-sdk/...
>>>>> > >>>>>>   airflow-core/...
>>>>> > >>>>>>   shared/
>>>>> > >>>>>>     kuberenetes/
>>>>> > >>>>>>       pyproject.toml
>>>>> > >>>>>>       src/
>>>>> > >>>>>>         airflow_kube/__init__.py
>>>>> > >>>>>>     core-and-tasksdk/
>>>>> > >>>>>>       pyproject.toml
>>>>> > >>>>>>       src/
>>>>> > >>>>>>         airflow_shared/__init__.py
>>>>> > >>>>>> ```
>>>>> > >>>>>>
>>>>> > >>>>>> Things to note here: the “shared” folder has (the
>>>>> possibility) of
>>>>> > >>>> having multiple different shared “libraries” in it, in this
>>>>> example I
>>>>> > am
>>>>> > >>>> supposing a hypothetical shared kuberenetes folder a world in
>>>>> which we
>>>>> > >>>> split the KubePodOperator and the KubeExecutor in to two
>>>>> separate
>>>>> > >>>> distributions (example only, not proposing we do that right
>>>>> now, and
>>>>> > that
>>>>> > >>>> will be a separate discussion)
>>>>> > >>>>>> The other things to note here:
>>>>> > >>>>>>
>>>>> > >>>>>>
>>>>> > >>>>>> - the folder name in shared aims to be “self-documenting”,
>>>>> hence the
>>>>> > >>>> verbose “core-and-tasksdk” to say where the shared library is
>>>>> > intended to
>>>>> > >>>> be used.
>>>>> > >>>>>> - the python module itself should almost always have an
>>>>> `airflow_`
>>>>> > (or
>>>>> > >>>> maybe `_airflow_`?) prefix so that it does not conflict with
>>>>> anything
>>>>> > >>> else
>>>>> > >>>> we might use. It won’t matter “in production” as those will be
>>>>> > vendored
>>>>> > >>> in
>>>>> > >>>> to be imported as `airflow/_vendor/airflow_shared` etc, but
>>>>> avoiding
>>>>> > >>>> conflicts at dev time with the Finder approach is a good safety
>>>>> > measure.
>>>>> > >>>>>> I will start making a real PR for this proposal now, but I’m
>>>>> open to
>>>>> > >>>> feedback (either here, or in the PR when I open it)
>>>>> > >>>>>> -ash
>>>>> > >>>>>>
>>>>> > >>>>>>> On 4 Jul 2025, at 16:55, Jarek Potiuk <ja...@potiuk.com>
>>>>> wrote:
>>>>> > >>>>>>>
>>>>> > >>>>>>> Yeah we have to try it and test - also building packages
>>>>> happens
>>>>> > semi
>>>>> > >>>>>>> frequently when you run `uv sync` (they use some kind of
>>>>> heuristics
>>>>> > >>> to
>>>>> > >>>>>>> decide when) and you can force it with `--reinstall` or
>>>>> > `--refresh`.
>>>>> > >>>>>>> Package build also happens every time when you run "ci-image
>>>>> build`
>>>>> > >>>> now in
>>>>> > >>>>>>> breeze so it seems like it will nicely integrate in our
>>>>> workflows.
>>>>> > >>>>>>>
>>>>> > >>>>>>> Looks really cool Ash.
>>>>> > >>>>>>>
>>>>> > >>>>>>> On Fri, Jul 4, 2025 at 5:14 PM Ash Berlin-Taylor <
>>>>> a...@apache.org>
>>>>> > >>>> wrote:
>>>>> > >>>>>>>> It’s not just release time, but any time we build a package
>>>>> which
>>>>> > >>>> happens
>>>>> > >>>>>>>> on “every” CI run. The normal unit tests will use code from
>>>>> > >>>>>>>> airflow-common/src/airflow_common; the kube tests which
>>>>> build an
>>>>> > >>>> image will
>>>>> > >>>>>>>> build the dists and vendor in the code from that commit.
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> There is only a single copy of the shared code committed to
>>>>> the
>>>>> > >>> repo,
>>>>> > >>>> so
>>>>> > >>>>>>>> there is never anything to synchronise.
>>>>> > >>>>>>>>
>>>>> > >>>>>>>>> On 4 Jul 2025, at 15:53, Amogh Desai <
>>>>> amoghdesai....@gmail.com>
>>>>> > >>>> wrote:
>>>>> > >>>>>>>>> Thanks Ash.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> This is really cool and helpful that you were able to test
>>>>> both
>>>>> > >>>> scenarios
>>>>> > >>>>>>>>> -- repo checkout
>>>>> > >>>>>>>>> and also installing from the vendored package and the
>>>>> resolution
>>>>> > >>>> worked
>>>>> > >>>>>>>>> fine too.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> I like this idea compared the to relative import one for
>>>>> few
>>>>> > >>> reasons:
>>>>> > >>>>>>>>> - It feels like it will take some time to adjust to the new
>>>>> > coding
>>>>> > >>>>>>>> standard
>>>>> > >>>>>>>>> that we will lay
>>>>> > >>>>>>>>> if we impose relative imports in the shared dist
>>>>> > >>>>>>>>> - We can continue using repo wise absolute import
>>>>> standards, it
>>>>> > is
>>>>> > >>>> also
>>>>> > >>>>>>>>> much easier for situations
>>>>> > >>>>>>>>> when we do global search in IDE to find + replace, this
>>>>> could
>>>>> > mean
>>>>> > >>> a
>>>>> > >>>>>>>> change
>>>>> > >>>>>>>>> there
>>>>> > >>>>>>>>> - The vendoring work is a proven and established paradigm
>>>>> across
>>>>> > >>>> projects
>>>>> > >>>>>>>>> and would
>>>>> > >>>>>>>>> out of box give us the build tooling we need also
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Nothing too against the relative import but with the
>>>>> evidence
>>>>> > >>>> provided
>>>>> > >>>>>>>>> above, vendored approach
>>>>> > >>>>>>>>> seems to only do us good.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Regarding synchronizing it, release time should be fine as
>>>>> long
>>>>> > as
>>>>> > >>> we
>>>>> > >>>>>>>> have
>>>>> > >>>>>>>>> a good CI workflow to probably
>>>>> > >>>>>>>>> catch such issues per PR if changes are made in shared
>>>>> dist?
>>>>> > >>>> (precommit
>>>>> > >>>>>>>>> would make it really slow i guess)
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> If we can run our tests with vendored code we should be
>>>>> mostly
>>>>> > >>>> covered.
>>>>> > >>>>>>>>> Good effort all!
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Thanks & Regards,
>>>>> > >>>>>>>>> Amogh Desai
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>>> On Fri, Jul 4, 2025 at 7:23 PM Ash Berlin-Taylor <
>>>>> > a...@apache.org>
>>>>> > >>>>>>>> wrote:
>>>>> > >>>>>>>>>> Okay, I think I’ve got something that works and I’m happy
>>>>> with.
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>>
>>>>> > >>>
>>>>> >
>>>>> https://github.com/astronomer/airflow/tree/shared-vendored-lib-tasksdk-and-core
>>>>> > >>>>>>>>>> This produces the following from `uv build task-sdk`
>>>>> > >>>>>>>>>> -
>>>>> > >>>>>>>>>>
>>>>> > >>>
>>>>> >
>>>>> https://github.com/user-attachments/files/21058976/apache_airflow_task_sdk-1.1.0.tar.gz
>>>>> > >>>>>>>>>> -
>>>>> > >>>>>>>>>>
>>>>> > >>>
>>>>> >
>>>>> https://github.com/user-attachments/files/21058996/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip
>>>>> > >>>>>>>>>> (`.whl.zip` as GH won't allow .whl upload, but will .zip)
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>> ```
>>>>> > >>>>>>>>>> ❯ unzip -l
>>>>> > >>> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip |
>>>>> > >>>>>>>> grep
>>>>> > >>>>>>>>>> _vendor
>>>>> > >>>>>>>>>>      50  02-02-2020 00:00   airflow/sdk/_vendor/.gitignore
>>>>> > >>>>>>>>>>    2082  02-02-2020 00:00
>>>>>  airflow/sdk/_vendor/__init__.py
>>>>> > >>>>>>>>>>      28  02-02-2020 00:00
>>>>> >  airflow/sdk/_vendor/airflow_common.pyi
>>>>> > >>>>>>>>>>      18  02-02-2020 00:00   airflow/sdk/_vendor/vendor.txt
>>>>> > >>>>>>>>>>     785  02-02-2020 00:00
>>>>> > >>>>>>>>>> airflow/sdk/_vendor/airflow_common/__init__.py
>>>>> > >>>>>>>>>>   10628  02-02-2020 00:00
>>>>> > >>>>>>>>>> airflow/sdk/_vendor/airflow_common/timezone.py
>>>>> > >>>>>>>>>> ```
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>> And similarly in the .tar.gz, so our “sdist” is complete
>>>>> too:
>>>>> > >>>>>>>>>> ```
>>>>> > >>>>>>>>>> ❯ tar -tzf dist/apache_airflow_task_sdk-1.1.0.tar.gz |grep
>>>>> > _vendor
>>>>> > >>>>>>>>>>
>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/.gitignore
>>>>> > >>>>>>>>>>
>>>>> > apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/__init__.py
>>>>> > >>>>>>>>>>
>>>>> > >>>>
>>>>> >
>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common.pyi
>>>>> > >>>>>>>>>>
>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/vendor.txt
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>>
>>>>> > >>>
>>>>> >
>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/__init__.py
>>>>> > >>>
>>>>> >
>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/timezone.py
>>>>> > >>>>>>>>>> ```
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>> The plugin works at build time by including/copying the
>>>>> libs
>>>>> > >>>> specified
>>>>> > >>>>>>>> in
>>>>> > >>>>>>>>>> vendor.txt into place (and let `vendoring` take care of
>>>>> import
>>>>> > >>>>>>>> rewrites.)
>>>>> > >>>>>>>>>> For the imports to continue to work at “dev” time/from a
>>>>> repo
>>>>> > >>>> checkout,
>>>>> > >>>>>>>> I
>>>>> > >>>>>>>>>> have added a import finder to `sys.meta_path`, and since
>>>>> it’s at
>>>>> > >>> the
>>>>> > >>>>>>>> end of
>>>>> > >>>>>>>>>> the list it will only be used if the normal import can’t
>>>>> find
>>>>> > >>>> things.
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>>
>>>>> > >>>
>>>>> >
>>>>> https://github.com/astronomer/airflow/blob/996817782be6071b306a87af9f36fe1cf2d3aaa3/task-sdk/src/airflow/sdk/_vendor/__init__.py
>>>>> > >>>>>>>>>> This doesn’t quite give us the same runtime effect “import
>>>>> > >>>> rewriting”
>>>>> > >>>>>>>>>> affect, as in this approach `airflow_common` is directly
>>>>> loaded
>>>>> > >>>> (i.e.
>>>>> > >>>>>>>>>> airflow.sdk._vendor.airflow_common and airflow_common
>>>>> exist in
>>>>> > >>>>>>>>>> sys.modules), but it does work for everything that I was
>>>>> able to
>>>>> > >>>> test..
>>>>> > >>>>>>>>>> I tested it with the diff at the end of this message. My
>>>>> test
>>>>> > >>>> ipython
>>>>> > >>>>>>>>>> shell:
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>> ```
>>>>> > >>>>>>>>>> In [1]: from airflow.sdk._vendor.airflow_common.timezone
>>>>> import
>>>>> > >>> foo
>>>>> > >>>>>>>>>> In [2]: foo
>>>>> > >>>>>>>>>> Out[2]: 1
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>> In [3]: import airflow.sdk._vendor.airflow_common
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>> In [4]: import airflow.sdk._vendor.airflow_common.timezone
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>> In [5]: airflow.sdk._vendor.airflow_common.__file__
>>>>> > >>>>>>>>>> Out[5]:
>>>>> > >>>>>>>>>>
>>>>> > >>>
>>>>> >
>>>>> '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/__init__.py'
>>>>> > >>>>>>>>>> In [6]:
>>>>> airflow.sdk._vendor.airflow_common.timezone.__file__
>>>>> > >>>>>>>>>> Out[6]:
>>>>> > >>>>>>>>>>
>>>>> > >>>
>>>>> >
>>>>> '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/timezone.py'
>>>>> > >>>>>>>>>> ```
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>> And in an standalone environment with the SDK dist I
>>>>> built (it
>>>>> > >>>> needed
>>>>> > >>>>>>>> the
>>>>> > >>>>>>>>>> matching airflow-core right now, but that is nothing to
>>>>> do with
>>>>> > >>> this
>>>>> > >>>>>>>>>> discussion):
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>> ```
>>>>> > >>>>>>>>>> ❯ _AIRFLOW__AS_LIBRARY=1 uvx --python 3.12 --with
>>>>> > >>>>>>>>>> dist/apache_airflow_core-3.1.0-py3-none-any.whl --with
>>>>> > >>>>>>>>>> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl
>>>>> ipython
>>>>> > >>>>>>>>>> Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8
>>>>> ]
>>>>> > >>>>>>>>>> Type 'copyright', 'credits' or 'license' for more
>>>>> information
>>>>> > >>>>>>>>>> IPython 9.4.0 -- An enhanced Interactive Python. Type '?'
>>>>> for
>>>>> > >>> help.
>>>>> > >>>>>>>>>> Tip: You can use `%hist` to view history, see the options
>>>>> with
>>>>> > >>>>>>>> `%history?`
>>>>> > >>>>>>>>>> In [1]: import airflow.sdk._vendor.airflow_common.timezone
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>> In [2]:
>>>>> airflow.sdk._vendor.airflow_common.timezone.__file__
>>>>> > >>>>>>>>>> Out[2]:
>>>>> > >>>>>>>>>>
>>>>> > >>>
>>>>> >
>>>>> '/Users/ash/.cache/uv/archive-v0/WWq6r65aPto2eJOyPObEH/lib/python3.12/site-packages/airflow/sdk/_vendor/airflow_common/timezone.py’
>>>>> > >>>>>>>>>> ``
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>> ```diff
>>>>> > >>>>>>>>>> diff --git a/airflow-common/src/airflow_common/__init__.py
>>>>> > >>>>>>>>>> b/airflow-common/src/airflow_common/__init__.py
>>>>> > >>>>>>>>>> index 13a83393a9..927b7c6b61 100644
>>>>> > >>>>>>>>>> --- a/airflow-common/src/airflow_common/__init__.py
>>>>> > >>>>>>>>>> +++ b/airflow-common/src/airflow_common/__init__.py
>>>>> > >>>>>>>>>> @@ -14,3 +14,5 @@
>>>>> > >>>>>>>>>> # KIND, either express or implied.  See the License for
>>>>> the
>>>>> > >>>>>>>>>> # specific language governing permissions and limitations
>>>>> > >>>>>>>>>> # under the License.
>>>>> > >>>>>>>>>> +
>>>>> > >>>>>>>>>> +foo = 1
>>>>> > >>>>>>>>>> diff --git a/airflow-common/src/airflow_common/timezone.py
>>>>> > >>>>>>>>>> b/airflow-common/src/airflow_common/timezone.py
>>>>> > >>>>>>>>>> index 340b924c66..58384ef20f 100644
>>>>> > >>>>>>>>>> --- a/airflow-common/src/airflow_common/timezone.py
>>>>> > >>>>>>>>>> +++ b/airflow-common/src/airflow_common/timezone.py
>>>>> > >>>>>>>>>> @@ -36,6 +36,9 @@ _PENDULUM3 =
>>>>> > >>>>>>>>>> version.parse(metadata.version("pendulum")).major == 3
>>>>> > >>>>>>>>>> # - FixedTimezone(0, "UTC") in pendulum 2
>>>>> > >>>>>>>>>> utc = pendulum.UTC
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>> +
>>>>> > >>>>>>>>>> +from airflow_common import foo
>>>>> > >>>>>>>>>> +
>>>>> > >>>>>>>>>> TIMEZONE: Timezone
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>> ```
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>>>> On 3 Jul 2025, at 12:43, Jarek Potiuk <ja...@potiuk.com
>>>>> >
>>>>> > wrote:
>>>>> > >>>>>>>>>>> I think both approaches are doable:
>>>>> > >>>>>>>>>>>
>>>>> > >>>>>>>>>>> 1) -> We can very easily prevent bad imports by
>>>>> pre-commit when
>>>>> > >>>>>>>> importing
>>>>> > >>>>>>>>>>> from different distributions and make sure we are only
>>>>> doing
>>>>> > >>>> relative
>>>>> > >>>>>>>>>>> imports in the shared modules. We are doing plenty of
>>>>> this
>>>>> > >>>> already. And
>>>>> > >>>>>>>>>> yes
>>>>> > >>>>>>>>>>> it would require relative links we currently do not
>>>>> allow.
>>>>> > >>>>>>>>>>>
>>>>> > >>>>>>>>>>> 2) -> has one disadvantage that someone at some point in
>>>>> time
>>>>> > >>> will
>>>>> > >>>> have
>>>>> > >>>>>>>>>> to
>>>>> > >>>>>>>>>>> decide to synchronize this and if it happens just before
>>>>> > release
>>>>> > >>>> (I bet
>>>>> > >>>>>>>>>>> this is going to happen) this will lead to solving
>>>>> problems
>>>>> > that
>>>>> > >>>> would
>>>>> > >>>>>>>>>>> normally be solved during PR when you make a change (i.e.
>>>>> > >>> symbolic
>>>>> > >>>> link
>>>>> > >>>>>>>>>> has
>>>>> > >>>>>>>>>>> the advantage that whoever modifies shared code will be
>>>>> > >>> immediately
>>>>> > >>>>>>>>>>> notified in their PR - that they broke something because
>>>>> either
>>>>> > >>>> static
>>>>> > >>>>>>>>>>> checks or mypy or tests fail.
>>>>> > >>>>>>>>>>>
>>>>> > >>>>>>>>>>> Ash, do you have an idea of a process (who and when)
>>>>> does the
>>>>> > >>>>>>>>>>> synchronisation in case of vendoring? Maybe we could
>>>>> solve it
>>>>> > if
>>>>> > >>>> it is
>>>>> > >>>>>>>>>> done
>>>>> > >>>>>>>>>>> more frequently and with some regularity? We could
>>>>> potentially
>>>>> > >>>> force
>>>>> > >>>>>>>>>>> re-vendoring at PR time as well any time shared code
>>>>> changes
>>>>> > (and
>>>>> > >>>>>>>> prevent
>>>>> > >>>>>>>>>>> it by pre-commit. And I can't think of some place (other
>>>>> than
>>>>> > >>>> releases)
>>>>> > >>>>>>>>>> in
>>>>> > >>>>>>>>>>> our development workflow and that seems to be a bit too
>>>>> late as
>>>>> > >>>> puts an
>>>>> > >>>>>>>>>>> extra effort on fixing potential incompatibilities
>>>>> introduced
>>>>> > on
>>>>> > >>>>>>>> release
>>>>> > >>>>>>>>>>> manager and delays the release. WDYT?
>>>>> > >>>>>>>>>>>
>>>>> > >>>>>>>>>>> Re: relative links. I think for a shared library we could
>>>>> > >>>> potentially
>>>>> > >>>>>>>>>> relax
>>>>> > >>>>>>>>>>> this and allow them (and actually disallow absolute
>>>>> links in
>>>>> > the
>>>>> > >>>> pieces
>>>>> > >>>>>>>>>> of
>>>>> > >>>>>>>>>>> code that are shared - again, by pre-commit). As I
>>>>> recall, the
>>>>> > >>> only
>>>>> > >>>>>>>>>> reason
>>>>> > >>>>>>>>>>> we forbade the relative links is because of how we are
>>>>> (or
>>>>> > maybe
>>>>> > >>>> were)
>>>>> > >>>>>>>>>>> doing DAG parsing and failures resulting from it. So we
>>>>> decided
>>>>> > >>> to
>>>>> > >>>> just
>>>>> > >>>>>>>>>> not
>>>>> > >>>>>>>>>>> allow it to keep consistency. The way how Dag parsing
>>>>> works is
>>>>> > >>> that
>>>>> > >>>>>>>> when
>>>>> > >>>>>>>>>>> you are using importlib to read the Dag from a file, the
>>>>> > relative
>>>>> > >>>>>>>> imports
>>>>> > >>>>>>>>>>> do not work as it does not know what they should be
>>>>> relative
>>>>> > to.
>>>>> > >>>> But if
>>>>> > >>>>>>>>>>> relative import is done from an imported package, it
>>>>> should be
>>>>> > no
>>>>> > >>>>>>>>>> problem,
>>>>> > >>>>>>>>>>> I think - otherwise our Dags would not be able to import
>>>>> any
>>>>> > >>>> library
>>>>> > >>>>>>>> that
>>>>> > >>>>>>>>>>> uses relative imports.
>>>>> > >>>>>>>>>>>
>>>>> > >>>>>>>>>>> Of course consistency might be the reason why we do not
>>>>> want to
>>>>> > >>>>>>>> introduce
>>>>> > >>>>>>>>>>> relative imports. I don't see it as an issue if it is
>>>>> guarded
>>>>> > by
>>>>> > >>>>>>>>>> pre-commit
>>>>> > >>>>>>>>>>> though.
>>>>> > >>>>>>>>>>>
>>>>> > >>>>>>>>>>> J.
>>>>> > >>>>>>>>>>>
>>>>> > >>>>>>>>>>>
>>>>> > >>>>>>>>>>> J.
>>>>> > >>>>>>>>>>>
>>>>> > >>>>>>>>>>>
>>>>> > >>>>>>>>>>> czw., 3 lip 2025, 12:11 użytkownik Ash Berlin-Taylor <
>>>>> > >>>> a...@apache.org>
>>>>> > >>>>>>>>>>> napisał:
>>>>> > >>>>>>>>>>>
>>>>> > >>>>>>>>>>>> Oh yes, symlinks will work, with one big caveat: It
>>>>> does mean
>>>>> > >>> you
>>>>> > >>>>>>>> can’t
>>>>> > >>>>>>>>>>>> use absolute imports in one common module to another.
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>> For example
>>>>> > >>>>>>>>>>>>
>>>>> > >>>
>>>>> >
>>>>> https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41
>>>>> > >>>>>>>>>>>> where we have
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>> ```
>>>>> > >>>>>>>>>>>> from airflow.utils.module_loading import import_string
>>>>> > >>>>>>>>>>>> ```
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>> if we want to move serve_logs into this common lib that
>>>>> is
>>>>> > then
>>>>> > >>>>>>>>>> symlinked
>>>>> > >>>>>>>>>>>> then we wouldn’t be able to have `from
>>>>> > >>>> airflow_common.module_loading
>>>>> > >>>>>>>>>> import
>>>>> > >>>>>>>>>>>> import_string`.
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>> I can think of two possible solutions here.
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>> 1) is to allow/require relative imports in this shared
>>>>> lib, so
>>>>> > >>>> `from
>>>>> > >>>>>>>>>>>> .module_loading import import_string`
>>>>> > >>>>>>>>>>>> 2) is to use `vendoring`[1] (from the pip maintainers)
>>>>> which
>>>>> > >>> will
>>>>> > >>>>>>>> handle
>>>>> > >>>>>>>>>>>> import-rewriting for us.
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>> I’d entirely forgot that symlinks in repos was a thing,
>>>>> so I
>>>>> > >>>> prepared
>>>>> > >>>>>>>> a
>>>>> > >>>>>>>>>>>> minimal POC/demo of what vendoring approach could look
>>>>> like
>>>>> > here
>>>>> > >>>>>>>>>>>>
>>>>> > >>>
>>>>> >
>>>>> https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3
>>>>> > >>>>>>>>>>>> Now personally I am more than happy with relative
>>>>> imports, but
>>>>> > >>>>>>>> generally
>>>>> > >>>>>>>>>>>> as a project we have avoided them, so I think that
>>>>> limits what
>>>>> > >>> we
>>>>> > >>>>>>>> could
>>>>> > >>>>>>>>>> do
>>>>> > >>>>>>>>>>>> with a symlink based approach.
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>> -ash
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>> [1] https://github.com/pradyunsg/vendoring
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>> On 3 Jul 2025, at 10:30, Pavankumar Gopidesu <
>>>>> > >>>>>>>> gopidesupa...@gmail.com>
>>>>> > >>>>>>>>>>>> wrote:
>>>>> > >>>>>>>>>>>>> Thanks Ash
>>>>> > >>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>> Yes agree option 2 would be preferred for me. Making
>>>>> sure we
>>>>> > >>>> have all
>>>>> > >>>>>>>>>> the
>>>>> > >>>>>>>>>>>>> gaurdriles to protect any unwanted behaviour in code
>>>>> sharing
>>>>> > >>> and
>>>>> > >>>>>>>>>>>> executing
>>>>> > >>>>>>>>>>>>> right of tests between the packages.
>>>>> > >>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>> Agree with others, option 2 would be
>>>>> > >>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>> On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai <
>>>>> > >>>>>>>> amoghdesai....@gmail.com>
>>>>> > >>>>>>>>>>>>> wrote:
>>>>> > >>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>> Thanks for starting this discussion, Ash.
>>>>> > >>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>> I would prefer option 2 here with proper tooling to
>>>>> handle
>>>>> > the
>>>>> > >>>> code
>>>>> > >>>>>>>>>>>>>> duplication at *release* time.
>>>>> > >>>>>>>>>>>>>> It is best to have a dist that has all it needs in
>>>>> itself.
>>>>> > >>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>> Option 1 could very quickly get out of hand and if we
>>>>> decide
>>>>> > >>> to
>>>>> > >>>>>>>>>> separate
>>>>> > >>>>>>>>>>>>>> triggerer /
>>>>> > >>>>>>>>>>>>>> dag processor / config etc etc as separate packages,
>>>>> back
>>>>> > >>>> compat is
>>>>> > >>>>>>>>>>>> going
>>>>> > >>>>>>>>>>>>>> to be a nightmare
>>>>> > >>>>>>>>>>>>>> and will bite us harder than we anticipate.
>>>>> > >>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>> Thanks & Regards,
>>>>> > >>>>>>>>>>>>>> Amogh Desai
>>>>> > >>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>> On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik <
>>>>> > >>> kaxiln...@gmail.com>
>>>>> > >>>>>>>>>> wrote:
>>>>> > >>>>>>>>>>>>>>> I prefer Option 2 as well to avoid matrix of
>>>>> dependencies
>>>>> > >>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>> On Thu, 3 Jul 2025 at 01:03, Jens Scheffler
>>>>> > >>>>>>>>>> <j_scheff...@gmx.de.invalid
>>>>> > >>>>>>>>>>>>>>> wrote:
>>>>> > >>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>> I'd also rather prefer option 2 - reason here is it
>>>>> is
>>>>> > >>> rather
>>>>> > >>>>>>>>>>>> pragmatic
>>>>> > >>>>>>>>>>>>>>>> and we no not need to cut another package and have
>>>>> less
>>>>> > >>>> package
>>>>> > >>>>>>>>>> counts
>>>>> > >>>>>>>>>>>>>>>> and dependencies.
>>>>> > >>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>> I remember some time ago I was checking (together
>>>>> with
>>>>> > >>> Jarek,
>>>>> > >>>> I am
>>>>> > >>>>>>>>>> not
>>>>> > >>>>>>>>>>>>>>>> sure anymore...) if the usage of symlinks would be
>>>>> > possible.
>>>>> > >>>> To
>>>>> > >>>>>>>> keep
>>>>> > >>>>>>>>>>>>>> the
>>>>> > >>>>>>>>>>>>>>>> source in one package but "symlink" it into
>>>>> another. If
>>>>> > then
>>>>> > >>>> at
>>>>> > >>>>>>>>>> point
>>>>> > >>>>>>>>>>>>>> of
>>>>> > >>>>>>>>>>>>>>>> packaging/release the files are materialized we
>>>>> have 1 set
>>>>> > >>> of
>>>>> > >>>>>>>> code.
>>>>> > >>>>>>>>>>>>>>>> Otherwise if not possible still the redundancy
>>>>> could be
>>>>> > >>>> solved by
>>>>> > >>>>>>>> a
>>>>> > >>>>>>>>>>>>>>>> pre-commit hook - and in Git the files are
>>>>> de-duplicated
>>>>> > >>>> anyway
>>>>> > >>>>>>>>>> based
>>>>> > >>>>>>>>>>>>>> on
>>>>> > >>>>>>>>>>>>>>>> content hash, so this does not hurt.
>>>>> > >>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>> On 02.07.25 18:49, Shahar Epstein wrote:
>>>>> > >>>>>>>>>>>>>>>>> I support option 2 with proper automation & CI -
>>>>> the
>>>>> > >>>> reasonings
>>>>> > >>>>>>>>>>>>>> you've
>>>>> > >>>>>>>>>>>>>>>>> shown for that make sense to me.
>>>>> > >>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>> Shahar
>>>>> > >>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor <
>>>>> > >>>> a...@apache.org
>>>>> > >>>>>>>>>>>>>>> wrote:
>>>>> > >>>>>>>>>>>>>>>>>> Hello everyone,
>>>>> > >>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>> As we work on finishing off the code-level
>>>>> separation of
>>>>> > >>>> Task
>>>>> > >>>>>>>> SDK
>>>>> > >>>>>>>>>>>>>> and
>>>>> > >>>>>>>>>>>>>>>> Core
>>>>> > >>>>>>>>>>>>>>>>>> (scheduler etc) we have come across some
>>>>> situations
>>>>> > where
>>>>> > >>> we
>>>>> > >>>>>>>> would
>>>>> > >>>>>>>>>>>>>>> like
>>>>> > >>>>>>>>>>>>>>>> to
>>>>> > >>>>>>>>>>>>>>>>>> share code between these.
>>>>> > >>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>> However it’s not as straight forward of “just put
>>>>> it in
>>>>> > a
>>>>> > >>>> common
>>>>> > >>>>>>>>>>>>>> dist
>>>>> > >>>>>>>>>>>>>>>> they
>>>>> > >>>>>>>>>>>>>>>>>> both depend upon” because one of the goals of the
>>>>> Task
>>>>> > SDK
>>>>> > >>>>>>>>>>>>>> separation
>>>>> > >>>>>>>>>>>>>>>> was
>>>>> > >>>>>>>>>>>>>>>>>> to have 100% complete version independence
>>>>> between the
>>>>> > >>> two,
>>>>> > >>>>>>>>>> ideally
>>>>> > >>>>>>>>>>>>>>>> even if
>>>>> > >>>>>>>>>>>>>>>>>> they are built into the same image and venv. Most
>>>>> of the
>>>>> > >>>> reason
>>>>> > >>>>>>>>>> why
>>>>> > >>>>>>>>>>>>>>> this
>>>>> > >>>>>>>>>>>>>>>>>> isn’t straight forward comes down to backwards
>>>>> > >>>> compatibility -
>>>>> > >>>>>>>> if
>>>>> > >>>>>>>>>> we
>>>>> > >>>>>>>>>>>>>>>> make
>>>>> > >>>>>>>>>>>>>>>>>> an change to the common/shared distribution
>>>>> > >>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>> We’ve listed the options we have thought about in
>>>>> > >>>>>>>>>>>>>>>>>> https://github.com/apache/airflow/issues/51545
>>>>> (but
>>>>> > that
>>>>> > >>>> covers
>>>>> > >>>>>>>>>>>>>> some
>>>>> > >>>>>>>>>>>>>>>> more
>>>>> > >>>>>>>>>>>>>>>>>> things that I don’t want to get in to in this
>>>>> discussion
>>>>> > >>>> such as
>>>>> > >>>>>>>>>>>>>>>> possibly
>>>>> > >>>>>>>>>>>>>>>>>> separating operators and executors out of a single
>>>>> > >>> provider
>>>>> > >>>>>>>> dist.)
>>>>> > >>>>>>>>>>>>>>>>>> To give a concrete example of some code I would
>>>>> like to
>>>>> > >>>> share
>>>>> > >>>
>>>>> >
>>>>> https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py
>>>>> > >>>>>>>>>>>>>>>>>> — logging config. Another thing we will want to
>>>>> share
>>>>> > will
>>>>> > >>>> be
>>>>> > >>>>>>>> the
>>>>> > >>>>>>>>>>>>>>>>>> AirflowConfigParser class from
>>>>> airflow.configuration
>>>>> > (but
>>>>> > >>>>>>>> notably:
>>>>> > >>>>>>>>>>>>>>> only
>>>>> > >>>>>>>>>>>>>>>> the
>>>>> > >>>>>>>>>>>>>>>>>> parser class, _not_ the default config values,
>>>>> again,
>>>>> > lets
>>>>> > >>>> not
>>>>> > >>>>>>>>>> dwell
>>>>> > >>>>>>>>>>>>>>> on
>>>>> > >>>>>>>>>>>>>>>> the
>>>>> > >>>>>>>>>>>>>>>>>> specifics of that)
>>>>> > >>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>> So to bring the options listed in the issue here
>>>>> for
>>>>> > >>>> discussion,
>>>>> > >>>>>>>>>>>>>>> broadly
>>>>> > >>>>>>>>>>>>>>>>>> speaking there are two high-level approaches:
>>>>> > >>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>> 1. A single shared distribution
>>>>> > >>>>>>>>>>>>>>>>>> 2. No shared package and copy/duplicate code
>>>>> > >>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>> The advantage of Approach 1 is that we only have
>>>>> the
>>>>> > code
>>>>> > >>>> in one
>>>>> > >>>>>>>>>>>>>>> place.
>>>>> > >>>>>>>>>>>>>>>>>> However for me, at least in this specific case of
>>>>> > Logging
>>>>> > >>>> config
>>>>> > >>>>>>>>>> or
>>>>> > >>>>>>>>>>>>>>>>>> AirflowConfigParser class is that backwards
>>>>> > compatibility
>>>>> > >>> is
>>>>> > >>>>>>>> much
>>>>> > >>>>>>>>>>>>>> much
>>>>> > >>>>>>>>>>>>>>>>>> harder.
>>>>> > >>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>> The main advantage of Approach 2 is the the code
>>>>> is
>>>>> > >>> released
>>>>> > >>>>>>>>>>>>>>>> with/embedded
>>>>> > >>>>>>>>>>>>>>>>>> in the dist (i.e. apache-airflow-task-sdk would
>>>>> contain
>>>>> > >>> the
>>>>> > >>>>>>>> right
>>>>> > >>>>>>>>>>>>>>>> version
>>>>> > >>>>>>>>>>>>>>>>>> of the logging config and ConfigParser etc). The
>>>>> > downside
>>>>> > >>> is
>>>>> > >>>>>>>> that
>>>>> > >>>>>>>>>>>>>>> either
>>>>> > >>>>>>>>>>>>>>>>>> the code will need to be duplicated in the repo,
>>>>> or
>>>>> > better
>>>>> > >>>> yet
>>>>> > >>>>>>>> it
>>>>> > >>>>>>>>>>>>>>> would
>>>>> > >>>>>>>>>>>>>>>>>> live in a single place in the repo, but some
>>>>> tooling
>>>>> > (TBD)
>>>>> > >>>> will
>>>>> > >>>>>>>>>>>>>>>>>> automatically handle the duplication, either at
>>>>> commit
>>>>> > >>>> time, or
>>>>> > >>>>>>>> my
>>>>> > >>>>>>>>>>>>>>>>>> preference, at release time.
>>>>> > >>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>> For this kind of shared “utility” code I am very
>>>>> > strongly
>>>>> > >>>>>>>> leaning
>>>>> > >>>>>>>>>>>>>>>> towards
>>>>> > >>>>>>>>>>>>>>>>>> option 2 with automation, as otherwise I think the
>>>>> > >>> backwards
>>>>> > >>>>>>>>>>>>>>>> compatibility
>>>>> > >>>>>>>>>>>>>>>>>> requirements would make it unworkable (very
>>>>> quickly over
>>>>> > >>>> time
>>>>> > >>>>>>>> the
>>>>> > >>>>>>>>>>>>>>>>>> combinations we would have to test would just be
>>>>> > >>>> unreasonable)
>>>>> > >>>>>>>>>> and I
>>>>> > >>>>>>>>>>>>>>>> don’t
>>>>> > >>>>>>>>>>>>>>>>>> feel confident we can have things as stable as we
>>>>> need
>>>>> > to
>>>>> > >>>> really
>>>>> > >>>>>>>>>>>>>>> deliver
>>>>> > >>>>>>>>>>>>>>>>>> the version separation/independency I want to
>>>>> delivery
>>>>> > >>> with
>>>>> > >>>>>>>>>> AIP-72.
>>>>> > >>>>>>>>>>>>>>>>>> So unless someone feels very strongly about this,
>>>>> I will
>>>>> > >>>> come up
>>>>> > >>>>>>>>>>>>>> with
>>>>> > >>>>>>>>>>>>>>> a
>>>>> > >>>>>>>>>>>>>>>>>> draft PR for further discussion that will
>>>>> implement code
>>>>> > >>>> sharing
>>>>> > >>>>>>>>>> via
>>>>> > >>>>>>>>>>>>>>>>>> “vendoring” it at build time. I have an idea of
>>>>> how I
>>>>> > can
>>>>> > >>>>>>>> achieve
>>>>> > >>>>>>>>>>>>>> this
>>>>> > >>>>>>>>>>>>>>>> so
>>>>> > >>>>>>>>>>>>>>>>>> we have a single version in the repo and it’ll
>>>>> work
>>>>> > there,
>>>>> > >>>> but
>>>>> > >>>>>>>> at
>>>>> > >>>>>>>>>>>>>>>> runtime
>>>>> > >>>>>>>>>>>>>>>>>> we vendor it in to the shipped dist so it lives at
>>>>> > >>> something
>>>>> > >>>>>>>> like
>>>>> > >>>>>>>>>>>>>>>>>> `airflow.sdk._vendor` etc.
>>>>> > >>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>> In terms of repo layout, this likely means we
>>>>> would end
>>>>> > up
>>>>> > >>>> with:
>>>>> > >>>>>>>>>>>>>>>>>> airflow-core/pyproject.toml
>>>>> > >>>>>>>>>>>>>>>>>> airflow-core/src/
>>>>> > >>>>>>>>>>>>>>>>>> airflow-core/tests/
>>>>> > >>>>>>>>>>>>>>>>>> task-sdk/pyproject.toml
>>>>> > >>>>>>>>>>>>>>>>>> task-sdk/src/
>>>>> > >>>>>>>>>>>>>>>>>> task-sdk/tests/
>>>>> > >>>>>>>>>>>>>>>>>> airflow-common/src
>>>>> > >>>>>>>>>>>>>>>>>> airflow-common/tests/
>>>>> > >>>>>>>>>>>>>>>>>> # Possibly no airflow-common/pyproject.toml, as
>>>>> deps
>>>>> > would
>>>>> > >>>> be
>>>>> > >>>>>>>>>>>>>> included
>>>>> > >>>>>>>>>>>>>>>> in
>>>>> > >>>>>>>>>>>>>>>>>> the downstream projects. TBD.
>>>>> > >>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>> Thoughts and feedback welcomed.
>>>>> > >>>>
>>>>> ---------------------------------------------------------------------
>>>>> > >>>>>>>>>>>>>>>> To unsubscribe, e-mail:
>>>>> > dev-unsubscr...@airflow.apache.org
>>>>> > >>>>>>>>>>>>>>>> For additional commands, e-mail:
>>>>> > >>> dev-h...@airflow.apache.org
>>>>> > >>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>
>>>>> > >>>
>>>>> ---------------------------------------------------------------------
>>>>> > >>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>>> > >>>>>>>> For additional commands, e-mail:
>>>>> dev-h...@airflow.apache.org
>>>>> > >>>>>>>>
>>>>> > >>>>>>>>
>>>>> > >>>>>>
>>>>> > ---------------------------------------------------------------------
>>>>> > >>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>>> > >>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>>>> > >>>>>>
>>>>> > >>>>>
>>>>> ---------------------------------------------------------------------
>>>>> > >>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>>> > >>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>>>> > >>>>>
>>>>> > >>>>
>>>>> ---------------------------------------------------------------------
>>>>> > >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>>> > >>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >
>>>>> > >
>>>>> ---------------------------------------------------------------------
>>>>> > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>>> > > For additional commands, e-mail: dev-h...@airflow.apache.org
>>>>> > >
>>>>> >
>>>>> > ---------------------------------------------------------------------
>>>>> > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>>> > For additional commands, e-mail: dev-h...@airflow.apache.org
>>>>> >
>>>>> >
>>>>>
>>>>

Reply via email to