How about splitting it even more and having each shared "thing" named? "logging", "config" and sharing them explicitly and separately with the right "user" ? That sounds way more modular and we will be able to choose which of the shared "utils" we use where.
J. On Mon, Jul 7, 2025 at 11:13 PM Jens Scheffler <j_scheff...@gmx.de.invalid> wrote: > I like "core_and_task_sdk the same like core-and-task-sdk - I have no > problem and it is a path only. > > if we get to "dag-parser-scheduler-task-sdk-and-triggerer" which is a > bit bulky we then should name it "all-not-api-server" :-D > > On 07.07.25 22:57, Ash Berlin-Taylor wrote: > > In case I did a bad job explaining it, the “core and task sdk” is not in > the module name/import name, just in the file path. > > > > Anyone have other ideas? > > > >> On 7 Jul 2025, at 21:37, Buğra Öztürk <ozturkbugr...@gmail.com> wrote: > >> > >> Thanks Ash! Looks cool! I like the structure. This will enable all the > >> combinations and structure looks easy to grasp. No strong stance on the > >> naming other than maybe it is a bit long with `and`, `core_ctl` could be > >> shorter, since no import path is defined like that, we can give any name > >> for sure. > >> > >> Best regards, > >> > >>> On Mon, 7 Jul 2025, 21:51 Jarek Potiuk, <ja...@potiuk.com> wrote: > >>> > >>> Looks good but I think we should find some better logical name for > >>> core_and_sdk :) > >>> > >>> pon., 7 lip 2025, 21:44 użytkownik Jens Scheffler > >>> <j_scheff...@gmx.de.invalid> napisał: > >>> > >>>> Cool! Especially the "shared" folder with the ability to have > >>>> N-combinations w/o exploding project repo root! > >>>> > >>>> On 07.07.25 14:43, Ash Berlin-Taylor wrote: > >>>>> Oh, and all of this will be explain in shared/README.md > >>>>> > >>>>>> On 7 Jul 2025, at 13:41, Ash Berlin-Taylor <a...@apache.org> wrote: > >>>>>> > >>>>>> Okay, so it seems we have agreement on the approach here, so I’ll > >>>> continue with this, and on the dev call it was mentioned that > >>>> “airflow-common” wasn’t a great name, so here is my proposal for the > file > >>>> structure; > >>>>>> ``` > >>>>>> / > >>>>>> task-sdk/... > >>>>>> airflow-core/... > >>>>>> shared/ > >>>>>> kuberenetes/ > >>>>>> pyproject.toml > >>>>>> src/ > >>>>>> airflow_kube/__init__.py > >>>>>> core-and-tasksdk/ > >>>>>> pyproject.toml > >>>>>> src/ > >>>>>> airflow_shared/__init__.py > >>>>>> ``` > >>>>>> > >>>>>> Things to note here: the “shared” folder has (the possibility) of > >>>> having multiple different shared “libraries” in it, in this example I > am > >>>> supposing a hypothetical shared kuberenetes folder a world in which we > >>>> split the KubePodOperator and the KubeExecutor in to two separate > >>>> distributions (example only, not proposing we do that right now, and > that > >>>> will be a separate discussion) > >>>>>> The other things to note here: > >>>>>> > >>>>>> > >>>>>> - the folder name in shared aims to be “self-documenting”, hence the > >>>> verbose “core-and-tasksdk” to say where the shared library is > intended to > >>>> be used. > >>>>>> - the python module itself should almost always have an `airflow_` > (or > >>>> maybe `_airflow_`?) prefix so that it does not conflict with anything > >>> else > >>>> we might use. It won’t matter “in production” as those will be > vendored > >>> in > >>>> to be imported as `airflow/_vendor/airflow_shared` etc, but avoiding > >>>> conflicts at dev time with the Finder approach is a good safety > measure. > >>>>>> I will start making a real PR for this proposal now, but I’m open to > >>>> feedback (either here, or in the PR when I open it) > >>>>>> -ash > >>>>>> > >>>>>>> On 4 Jul 2025, at 16:55, Jarek Potiuk <ja...@potiuk.com> wrote: > >>>>>>> > >>>>>>> Yeah we have to try it and test - also building packages happens > semi > >>>>>>> frequently when you run `uv sync` (they use some kind of heuristics > >>> to > >>>>>>> decide when) and you can force it with `--reinstall` or > `--refresh`. > >>>>>>> Package build also happens every time when you run "ci-image build` > >>>> now in > >>>>>>> breeze so it seems like it will nicely integrate in our workflows. > >>>>>>> > >>>>>>> Looks really cool Ash. > >>>>>>> > >>>>>>> On Fri, Jul 4, 2025 at 5:14 PM Ash Berlin-Taylor <a...@apache.org> > >>>> wrote: > >>>>>>>> It’s not just release time, but any time we build a package which > >>>> happens > >>>>>>>> on “every” CI run. The normal unit tests will use code from > >>>>>>>> airflow-common/src/airflow_common; the kube tests which build an > >>>> image will > >>>>>>>> build the dists and vendor in the code from that commit. > >>>>>>>> > >>>>>>>> There is only a single copy of the shared code committed to the > >>> repo, > >>>> so > >>>>>>>> there is never anything to synchronise. > >>>>>>>> > >>>>>>>>> On 4 Jul 2025, at 15:53, Amogh Desai <amoghdesai....@gmail.com> > >>>> wrote: > >>>>>>>>> Thanks Ash. > >>>>>>>>> > >>>>>>>>> This is really cool and helpful that you were able to test both > >>>> scenarios > >>>>>>>>> -- repo checkout > >>>>>>>>> and also installing from the vendored package and the resolution > >>>> worked > >>>>>>>>> fine too. > >>>>>>>>> > >>>>>>>>> I like this idea compared the to relative import one for few > >>> reasons: > >>>>>>>>> - It feels like it will take some time to adjust to the new > coding > >>>>>>>> standard > >>>>>>>>> that we will lay > >>>>>>>>> if we impose relative imports in the shared dist > >>>>>>>>> - We can continue using repo wise absolute import standards, it > is > >>>> also > >>>>>>>>> much easier for situations > >>>>>>>>> when we do global search in IDE to find + replace, this could > mean > >>> a > >>>>>>>> change > >>>>>>>>> there > >>>>>>>>> - The vendoring work is a proven and established paradigm across > >>>> projects > >>>>>>>>> and would > >>>>>>>>> out of box give us the build tooling we need also > >>>>>>>>> > >>>>>>>>> Nothing too against the relative import but with the evidence > >>>> provided > >>>>>>>>> above, vendored approach > >>>>>>>>> seems to only do us good. > >>>>>>>>> > >>>>>>>>> Regarding synchronizing it, release time should be fine as long > as > >>> we > >>>>>>>> have > >>>>>>>>> a good CI workflow to probably > >>>>>>>>> catch such issues per PR if changes are made in shared dist? > >>>> (precommit > >>>>>>>>> would make it really slow i guess) > >>>>>>>>> > >>>>>>>>> If we can run our tests with vendored code we should be mostly > >>>> covered. > >>>>>>>>> Good effort all! > >>>>>>>>> > >>>>>>>>> Thanks & Regards, > >>>>>>>>> Amogh Desai > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> On Fri, Jul 4, 2025 at 7:23 PM Ash Berlin-Taylor < > a...@apache.org> > >>>>>>>> wrote: > >>>>>>>>>> Okay, I think I’ve got something that works and I’m happy with. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>> > https://github.com/astronomer/airflow/tree/shared-vendored-lib-tasksdk-and-core > >>>>>>>>>> This produces the following from `uv build task-sdk` > >>>>>>>>>> - > >>>>>>>>>> > >>> > https://github.com/user-attachments/files/21058976/apache_airflow_task_sdk-1.1.0.tar.gz > >>>>>>>>>> - > >>>>>>>>>> > >>> > https://github.com/user-attachments/files/21058996/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip > >>>>>>>>>> (`.whl.zip` as GH won't allow .whl upload, but will .zip) > >>>>>>>>>> > >>>>>>>>>> ``` > >>>>>>>>>> ❯ unzip -l > >>> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip | > >>>>>>>> grep > >>>>>>>>>> _vendor > >>>>>>>>>> 50 02-02-2020 00:00 airflow/sdk/_vendor/.gitignore > >>>>>>>>>> 2082 02-02-2020 00:00 airflow/sdk/_vendor/__init__.py > >>>>>>>>>> 28 02-02-2020 00:00 > airflow/sdk/_vendor/airflow_common.pyi > >>>>>>>>>> 18 02-02-2020 00:00 airflow/sdk/_vendor/vendor.txt > >>>>>>>>>> 785 02-02-2020 00:00 > >>>>>>>>>> airflow/sdk/_vendor/airflow_common/__init__.py > >>>>>>>>>> 10628 02-02-2020 00:00 > >>>>>>>>>> airflow/sdk/_vendor/airflow_common/timezone.py > >>>>>>>>>> ``` > >>>>>>>>>> > >>>>>>>>>> And similarly in the .tar.gz, so our “sdist” is complete too: > >>>>>>>>>> ``` > >>>>>>>>>> ❯ tar -tzf dist/apache_airflow_task_sdk-1.1.0.tar.gz |grep > _vendor > >>>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/.gitignore > >>>>>>>>>> > apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/__init__.py > >>>>>>>>>> > >>>> > apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common.pyi > >>>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/vendor.txt > >>>>>>>>>> > >>>>>>>>>> > >>> > apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/__init__.py > >>> > apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/timezone.py > >>>>>>>>>> ``` > >>>>>>>>>> > >>>>>>>>>> The plugin works at build time by including/copying the libs > >>>> specified > >>>>>>>> in > >>>>>>>>>> vendor.txt into place (and let `vendoring` take care of import > >>>>>>>> rewrites.) > >>>>>>>>>> For the imports to continue to work at “dev” time/from a repo > >>>> checkout, > >>>>>>>> I > >>>>>>>>>> have added a import finder to `sys.meta_path`, and since it’s at > >>> the > >>>>>>>> end of > >>>>>>>>>> the list it will only be used if the normal import can’t find > >>>> things. > >>>>>>>>>> > >>>>>>>>>> > >>> > https://github.com/astronomer/airflow/blob/996817782be6071b306a87af9f36fe1cf2d3aaa3/task-sdk/src/airflow/sdk/_vendor/__init__.py > >>>>>>>>>> This doesn’t quite give us the same runtime effect “import > >>>> rewriting” > >>>>>>>>>> affect, as in this approach `airflow_common` is directly loaded > >>>> (i.e. > >>>>>>>>>> airflow.sdk._vendor.airflow_common and airflow_common exist in > >>>>>>>>>> sys.modules), but it does work for everything that I was able to > >>>> test.. > >>>>>>>>>> I tested it with the diff at the end of this message. My test > >>>> ipython > >>>>>>>>>> shell: > >>>>>>>>>> > >>>>>>>>>> ``` > >>>>>>>>>> In [1]: from airflow.sdk._vendor.airflow_common.timezone import > >>> foo > >>>>>>>>>> In [2]: foo > >>>>>>>>>> Out[2]: 1 > >>>>>>>>>> > >>>>>>>>>> In [3]: import airflow.sdk._vendor.airflow_common > >>>>>>>>>> > >>>>>>>>>> In [4]: import airflow.sdk._vendor.airflow_common.timezone > >>>>>>>>>> > >>>>>>>>>> In [5]: airflow.sdk._vendor.airflow_common.__file__ > >>>>>>>>>> Out[5]: > >>>>>>>>>> > >>> > '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/__init__.py' > >>>>>>>>>> In [6]: airflow.sdk._vendor.airflow_common.timezone.__file__ > >>>>>>>>>> Out[6]: > >>>>>>>>>> > >>> > '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/timezone.py' > >>>>>>>>>> ``` > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> And in an standalone environment with the SDK dist I built (it > >>>> needed > >>>>>>>> the > >>>>>>>>>> matching airflow-core right now, but that is nothing to do with > >>> this > >>>>>>>>>> discussion): > >>>>>>>>>> > >>>>>>>>>> ``` > >>>>>>>>>> ❯ _AIRFLOW__AS_LIBRARY=1 uvx --python 3.12 --with > >>>>>>>>>> dist/apache_airflow_core-3.1.0-py3-none-any.whl --with > >>>>>>>>>> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl ipython > >>>>>>>>>> Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ] > >>>>>>>>>> Type 'copyright', 'credits' or 'license' for more information > >>>>>>>>>> IPython 9.4.0 -- An enhanced Interactive Python. Type '?' for > >>> help. > >>>>>>>>>> Tip: You can use `%hist` to view history, see the options with > >>>>>>>> `%history?` > >>>>>>>>>> In [1]: import airflow.sdk._vendor.airflow_common.timezone > >>>>>>>>>> > >>>>>>>>>> In [2]: airflow.sdk._vendor.airflow_common.timezone.__file__ > >>>>>>>>>> Out[2]: > >>>>>>>>>> > >>> > '/Users/ash/.cache/uv/archive-v0/WWq6r65aPto2eJOyPObEH/lib/python3.12/site-packages/airflow/sdk/_vendor/airflow_common/timezone.py’ > >>>>>>>>>> `` > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> ```diff > >>>>>>>>>> diff --git a/airflow-common/src/airflow_common/__init__.py > >>>>>>>>>> b/airflow-common/src/airflow_common/__init__.py > >>>>>>>>>> index 13a83393a9..927b7c6b61 100644 > >>>>>>>>>> --- a/airflow-common/src/airflow_common/__init__.py > >>>>>>>>>> +++ b/airflow-common/src/airflow_common/__init__.py > >>>>>>>>>> @@ -14,3 +14,5 @@ > >>>>>>>>>> # KIND, either express or implied. See the License for the > >>>>>>>>>> # specific language governing permissions and limitations > >>>>>>>>>> # under the License. > >>>>>>>>>> + > >>>>>>>>>> +foo = 1 > >>>>>>>>>> diff --git a/airflow-common/src/airflow_common/timezone.py > >>>>>>>>>> b/airflow-common/src/airflow_common/timezone.py > >>>>>>>>>> index 340b924c66..58384ef20f 100644 > >>>>>>>>>> --- a/airflow-common/src/airflow_common/timezone.py > >>>>>>>>>> +++ b/airflow-common/src/airflow_common/timezone.py > >>>>>>>>>> @@ -36,6 +36,9 @@ _PENDULUM3 = > >>>>>>>>>> version.parse(metadata.version("pendulum")).major == 3 > >>>>>>>>>> # - FixedTimezone(0, "UTC") in pendulum 2 > >>>>>>>>>> utc = pendulum.UTC > >>>>>>>>>> > >>>>>>>>>> + > >>>>>>>>>> +from airflow_common import foo > >>>>>>>>>> + > >>>>>>>>>> TIMEZONE: Timezone > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> ``` > >>>>>>>>>> > >>>>>>>>>>>> On 3 Jul 2025, at 12:43, Jarek Potiuk <ja...@potiuk.com> > wrote: > >>>>>>>>>>> I think both approaches are doable: > >>>>>>>>>>> > >>>>>>>>>>> 1) -> We can very easily prevent bad imports by pre-commit when > >>>>>>>> importing > >>>>>>>>>>> from different distributions and make sure we are only doing > >>>> relative > >>>>>>>>>>> imports in the shared modules. We are doing plenty of this > >>>> already. And > >>>>>>>>>> yes > >>>>>>>>>>> it would require relative links we currently do not allow. > >>>>>>>>>>> > >>>>>>>>>>> 2) -> has one disadvantage that someone at some point in time > >>> will > >>>> have > >>>>>>>>>> to > >>>>>>>>>>> decide to synchronize this and if it happens just before > release > >>>> (I bet > >>>>>>>>>>> this is going to happen) this will lead to solving problems > that > >>>> would > >>>>>>>>>>> normally be solved during PR when you make a change (i.e. > >>> symbolic > >>>> link > >>>>>>>>>> has > >>>>>>>>>>> the advantage that whoever modifies shared code will be > >>> immediately > >>>>>>>>>>> notified in their PR - that they broke something because either > >>>> static > >>>>>>>>>>> checks or mypy or tests fail. > >>>>>>>>>>> > >>>>>>>>>>> Ash, do you have an idea of a process (who and when) does the > >>>>>>>>>>> synchronisation in case of vendoring? Maybe we could solve it > if > >>>> it is > >>>>>>>>>> done > >>>>>>>>>>> more frequently and with some regularity? We could potentially > >>>> force > >>>>>>>>>>> re-vendoring at PR time as well any time shared code changes > (and > >>>>>>>> prevent > >>>>>>>>>>> it by pre-commit. And I can't think of some place (other than > >>>> releases) > >>>>>>>>>> in > >>>>>>>>>>> our development workflow and that seems to be a bit too late as > >>>> puts an > >>>>>>>>>>> extra effort on fixing potential incompatibilities introduced > on > >>>>>>>> release > >>>>>>>>>>> manager and delays the release. WDYT? > >>>>>>>>>>> > >>>>>>>>>>> Re: relative links. I think for a shared library we could > >>>> potentially > >>>>>>>>>> relax > >>>>>>>>>>> this and allow them (and actually disallow absolute links in > the > >>>> pieces > >>>>>>>>>> of > >>>>>>>>>>> code that are shared - again, by pre-commit). As I recall, the > >>> only > >>>>>>>>>> reason > >>>>>>>>>>> we forbade the relative links is because of how we are (or > maybe > >>>> were) > >>>>>>>>>>> doing DAG parsing and failures resulting from it. So we decided > >>> to > >>>> just > >>>>>>>>>> not > >>>>>>>>>>> allow it to keep consistency. The way how Dag parsing works is > >>> that > >>>>>>>> when > >>>>>>>>>>> you are using importlib to read the Dag from a file, the > relative > >>>>>>>> imports > >>>>>>>>>>> do not work as it does not know what they should be relative > to. > >>>> But if > >>>>>>>>>>> relative import is done from an imported package, it should be > no > >>>>>>>>>> problem, > >>>>>>>>>>> I think - otherwise our Dags would not be able to import any > >>>> library > >>>>>>>> that > >>>>>>>>>>> uses relative imports. > >>>>>>>>>>> > >>>>>>>>>>> Of course consistency might be the reason why we do not want to > >>>>>>>> introduce > >>>>>>>>>>> relative imports. I don't see it as an issue if it is guarded > by > >>>>>>>>>> pre-commit > >>>>>>>>>>> though. > >>>>>>>>>>> > >>>>>>>>>>> J. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> J. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> czw., 3 lip 2025, 12:11 użytkownik Ash Berlin-Taylor < > >>>> a...@apache.org> > >>>>>>>>>>> napisał: > >>>>>>>>>>> > >>>>>>>>>>>> Oh yes, symlinks will work, with one big caveat: It does mean > >>> you > >>>>>>>> can’t > >>>>>>>>>>>> use absolute imports in one common module to another. > >>>>>>>>>>>> > >>>>>>>>>>>> For example > >>>>>>>>>>>> > >>> > https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41 > >>>>>>>>>>>> where we have > >>>>>>>>>>>> > >>>>>>>>>>>> ``` > >>>>>>>>>>>> from airflow.utils.module_loading import import_string > >>>>>>>>>>>> ``` > >>>>>>>>>>>> > >>>>>>>>>>>> if we want to move serve_logs into this common lib that is > then > >>>>>>>>>> symlinked > >>>>>>>>>>>> then we wouldn’t be able to have `from > >>>> airflow_common.module_loading > >>>>>>>>>> import > >>>>>>>>>>>> import_string`. > >>>>>>>>>>>> > >>>>>>>>>>>> I can think of two possible solutions here. > >>>>>>>>>>>> > >>>>>>>>>>>> 1) is to allow/require relative imports in this shared lib, so > >>>> `from > >>>>>>>>>>>> .module_loading import import_string` > >>>>>>>>>>>> 2) is to use `vendoring`[1] (from the pip maintainers) which > >>> will > >>>>>>>> handle > >>>>>>>>>>>> import-rewriting for us. > >>>>>>>>>>>> > >>>>>>>>>>>> I’d entirely forgot that symlinks in repos was a thing, so I > >>>> prepared > >>>>>>>> a > >>>>>>>>>>>> minimal POC/demo of what vendoring approach could look like > here > >>>>>>>>>>>> > >>> > https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3 > >>>>>>>>>>>> Now personally I am more than happy with relative imports, but > >>>>>>>> generally > >>>>>>>>>>>> as a project we have avoided them, so I think that limits what > >>> we > >>>>>>>> could > >>>>>>>>>> do > >>>>>>>>>>>> with a symlink based approach. > >>>>>>>>>>>> > >>>>>>>>>>>> -ash > >>>>>>>>>>>> > >>>>>>>>>>>> [1] https://github.com/pradyunsg/vendoring > >>>>>>>>>>>> > >>>>>>>>>>>>> On 3 Jul 2025, at 10:30, Pavankumar Gopidesu < > >>>>>>>> gopidesupa...@gmail.com> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>>> Thanks Ash > >>>>>>>>>>>>> > >>>>>>>>>>>>> Yes agree option 2 would be preferred for me. Making sure we > >>>> have all > >>>>>>>>>> the > >>>>>>>>>>>>> gaurdriles to protect any unwanted behaviour in code sharing > >>> and > >>>>>>>>>>>> executing > >>>>>>>>>>>>> right of tests between the packages. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Agree with others, option 2 would be > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai < > >>>>>>>> amoghdesai....@gmail.com> > >>>>>>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks for starting this discussion, Ash. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I would prefer option 2 here with proper tooling to handle > the > >>>> code > >>>>>>>>>>>>>> duplication at *release* time. > >>>>>>>>>>>>>> It is best to have a dist that has all it needs in itself. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Option 1 could very quickly get out of hand and if we decide > >>> to > >>>>>>>>>> separate > >>>>>>>>>>>>>> triggerer / > >>>>>>>>>>>>>> dag processor / config etc etc as separate packages, back > >>>> compat is > >>>>>>>>>>>> going > >>>>>>>>>>>>>> to be a nightmare > >>>>>>>>>>>>>> and will bite us harder than we anticipate. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks & Regards, > >>>>>>>>>>>>>> Amogh Desai > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik < > >>> kaxiln...@gmail.com> > >>>>>>>>>> wrote: > >>>>>>>>>>>>>>> I prefer Option 2 as well to avoid matrix of dependencies > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Thu, 3 Jul 2025 at 01:03, Jens Scheffler > >>>>>>>>>> <j_scheff...@gmx.de.invalid > >>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I'd also rather prefer option 2 - reason here is it is > >>> rather > >>>>>>>>>>>> pragmatic > >>>>>>>>>>>>>>>> and we no not need to cut another package and have less > >>>> package > >>>>>>>>>> counts > >>>>>>>>>>>>>>>> and dependencies. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I remember some time ago I was checking (together with > >>> Jarek, > >>>> I am > >>>>>>>>>> not > >>>>>>>>>>>>>>>> sure anymore...) if the usage of symlinks would be > possible. > >>>> To > >>>>>>>> keep > >>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>> source in one package but "symlink" it into another. If > then > >>>> at > >>>>>>>>>> point > >>>>>>>>>>>>>> of > >>>>>>>>>>>>>>>> packaging/release the files are materialized we have 1 set > >>> of > >>>>>>>> code. > >>>>>>>>>>>>>>>> Otherwise if not possible still the redundancy could be > >>>> solved by > >>>>>>>> a > >>>>>>>>>>>>>>>> pre-commit hook - and in Git the files are de-duplicated > >>>> anyway > >>>>>>>>>> based > >>>>>>>>>>>>>> on > >>>>>>>>>>>>>>>> content hash, so this does not hurt. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On 02.07.25 18:49, Shahar Epstein wrote: > >>>>>>>>>>>>>>>>> I support option 2 with proper automation & CI - the > >>>> reasonings > >>>>>>>>>>>>>> you've > >>>>>>>>>>>>>>>>> shown for that make sense to me. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Shahar > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor < > >>>> a...@apache.org > >>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>> Hello everyone, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> As we work on finishing off the code-level separation of > >>>> Task > >>>>>>>> SDK > >>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>> Core > >>>>>>>>>>>>>>>>>> (scheduler etc) we have come across some situations > where > >>> we > >>>>>>>> would > >>>>>>>>>>>>>>> like > >>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>> share code between these. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> However it’s not as straight forward of “just put it in > a > >>>> common > >>>>>>>>>>>>>> dist > >>>>>>>>>>>>>>>> they > >>>>>>>>>>>>>>>>>> both depend upon” because one of the goals of the Task > SDK > >>>>>>>>>>>>>> separation > >>>>>>>>>>>>>>>> was > >>>>>>>>>>>>>>>>>> to have 100% complete version independence between the > >>> two, > >>>>>>>>>> ideally > >>>>>>>>>>>>>>>> even if > >>>>>>>>>>>>>>>>>> they are built into the same image and venv. Most of the > >>>> reason > >>>>>>>>>> why > >>>>>>>>>>>>>>> this > >>>>>>>>>>>>>>>>>> isn’t straight forward comes down to backwards > >>>> compatibility - > >>>>>>>> if > >>>>>>>>>> we > >>>>>>>>>>>>>>>> make > >>>>>>>>>>>>>>>>>> an change to the common/shared distribution > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> We’ve listed the options we have thought about in > >>>>>>>>>>>>>>>>>> https://github.com/apache/airflow/issues/51545 (but > that > >>>> covers > >>>>>>>>>>>>>> some > >>>>>>>>>>>>>>>> more > >>>>>>>>>>>>>>>>>> things that I don’t want to get in to in this discussion > >>>> such as > >>>>>>>>>>>>>>>> possibly > >>>>>>>>>>>>>>>>>> separating operators and executors out of a single > >>> provider > >>>>>>>> dist.) > >>>>>>>>>>>>>>>>>> To give a concrete example of some code I would like to > >>>> share > >>> > https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py > >>>>>>>>>>>>>>>>>> — logging config. Another thing we will want to share > will > >>>> be > >>>>>>>> the > >>>>>>>>>>>>>>>>>> AirflowConfigParser class from airflow.configuration > (but > >>>>>>>> notably: > >>>>>>>>>>>>>>> only > >>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>> parser class, _not_ the default config values, again, > lets > >>>> not > >>>>>>>>>> dwell > >>>>>>>>>>>>>>> on > >>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>> specifics of that) > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> So to bring the options listed in the issue here for > >>>> discussion, > >>>>>>>>>>>>>>> broadly > >>>>>>>>>>>>>>>>>> speaking there are two high-level approaches: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> 1. A single shared distribution > >>>>>>>>>>>>>>>>>> 2. No shared package and copy/duplicate code > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> The advantage of Approach 1 is that we only have the > code > >>>> in one > >>>>>>>>>>>>>>> place. > >>>>>>>>>>>>>>>>>> However for me, at least in this specific case of > Logging > >>>> config > >>>>>>>>>> or > >>>>>>>>>>>>>>>>>> AirflowConfigParser class is that backwards > compatibility > >>> is > >>>>>>>> much > >>>>>>>>>>>>>> much > >>>>>>>>>>>>>>>>>> harder. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> The main advantage of Approach 2 is the the code is > >>> released > >>>>>>>>>>>>>>>> with/embedded > >>>>>>>>>>>>>>>>>> in the dist (i.e. apache-airflow-task-sdk would contain > >>> the > >>>>>>>> right > >>>>>>>>>>>>>>>> version > >>>>>>>>>>>>>>>>>> of the logging config and ConfigParser etc). The > downside > >>> is > >>>>>>>> that > >>>>>>>>>>>>>>> either > >>>>>>>>>>>>>>>>>> the code will need to be duplicated in the repo, or > better > >>>> yet > >>>>>>>> it > >>>>>>>>>>>>>>> would > >>>>>>>>>>>>>>>>>> live in a single place in the repo, but some tooling > (TBD) > >>>> will > >>>>>>>>>>>>>>>>>> automatically handle the duplication, either at commit > >>>> time, or > >>>>>>>> my > >>>>>>>>>>>>>>>>>> preference, at release time. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> For this kind of shared “utility” code I am very > strongly > >>>>>>>> leaning > >>>>>>>>>>>>>>>> towards > >>>>>>>>>>>>>>>>>> option 2 with automation, as otherwise I think the > >>> backwards > >>>>>>>>>>>>>>>> compatibility > >>>>>>>>>>>>>>>>>> requirements would make it unworkable (very quickly over > >>>> time > >>>>>>>> the > >>>>>>>>>>>>>>>>>> combinations we would have to test would just be > >>>> unreasonable) > >>>>>>>>>> and I > >>>>>>>>>>>>>>>> don’t > >>>>>>>>>>>>>>>>>> feel confident we can have things as stable as we need > to > >>>> really > >>>>>>>>>>>>>>> deliver > >>>>>>>>>>>>>>>>>> the version separation/independency I want to delivery > >>> with > >>>>>>>>>> AIP-72. > >>>>>>>>>>>>>>>>>> So unless someone feels very strongly about this, I will > >>>> come up > >>>>>>>>>>>>>> with > >>>>>>>>>>>>>>> a > >>>>>>>>>>>>>>>>>> draft PR for further discussion that will implement code > >>>> sharing > >>>>>>>>>> via > >>>>>>>>>>>>>>>>>> “vendoring” it at build time. I have an idea of how I > can > >>>>>>>> achieve > >>>>>>>>>>>>>> this > >>>>>>>>>>>>>>>> so > >>>>>>>>>>>>>>>>>> we have a single version in the repo and it’ll work > there, > >>>> but > >>>>>>>> at > >>>>>>>>>>>>>>>> runtime > >>>>>>>>>>>>>>>>>> we vendor it in to the shipped dist so it lives at > >>> something > >>>>>>>> like > >>>>>>>>>>>>>>>>>> `airflow.sdk._vendor` etc. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> In terms of repo layout, this likely means we would end > up > >>>> with: > >>>>>>>>>>>>>>>>>> airflow-core/pyproject.toml > >>>>>>>>>>>>>>>>>> airflow-core/src/ > >>>>>>>>>>>>>>>>>> airflow-core/tests/ > >>>>>>>>>>>>>>>>>> task-sdk/pyproject.toml > >>>>>>>>>>>>>>>>>> task-sdk/src/ > >>>>>>>>>>>>>>>>>> task-sdk/tests/ > >>>>>>>>>>>>>>>>>> airflow-common/src > >>>>>>>>>>>>>>>>>> airflow-common/tests/ > >>>>>>>>>>>>>>>>>> # Possibly no airflow-common/pyproject.toml, as deps > would > >>>> be > >>>>>>>>>>>>>> included > >>>>>>>>>>>>>>>> in > >>>>>>>>>>>>>>>>>> the downstream projects. TBD. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Thoughts and feedback welcomed. > >>>> --------------------------------------------------------------------- > >>>>>>>>>>>>>>>> To unsubscribe, e-mail: > dev-unsubscr...@airflow.apache.org > >>>>>>>>>>>>>>>> For additional commands, e-mail: > >>> dev-h...@airflow.apache.org > >>>>>>>>>>>>>>>> > >>>>>>>> > >>> --------------------------------------------------------------------- > >>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > >>>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org > >>>>>>>> > >>>>>>>> > >>>>>> > --------------------------------------------------------------------- > >>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > >>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org > >>>>>> > >>>>> --------------------------------------------------------------------- > >>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > >>>>> For additional commands, e-mail: dev-h...@airflow.apache.org > >>>>> > >>>> --------------------------------------------------------------------- > >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > >>>> For additional commands, e-mail: dev-h...@airflow.apache.org > >>>> > >>>> > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > For additional commands, e-mail: dev-h...@airflow.apache.org > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > For additional commands, e-mail: dev-h...@airflow.apache.org > >