Looks good, but I think we should find a better logical name for
core_and_sdk :)

On Mon, 7 Jul 2025, 21:44 Jens Scheffler
<j_scheff...@gmx.de.invalid> wrote:

> Cool! Especially the "shared" folder with the ability to have
> N-combinations w/o exploding project repo root!
>
> On 07.07.25 14:43, Ash Berlin-Taylor wrote:
> > Oh, and all of this will be explained in shared/README.md
> >
> >> On 7 Jul 2025, at 13:41, Ash Berlin-Taylor <a...@apache.org> wrote:
> >>
> >> Okay, it seems we have agreement on the approach here, so I’ll continue
> >> with it. On the dev call it was mentioned that “airflow-common” wasn’t a
> >> great name, so here is my proposal for the file structure:
> >>
> >> ```
> >> /
> >>   task-sdk/...
> >>   airflow-core/...
> >>   shared/
> >>     kubernetes/
> >>       pyproject.toml
> >>       src/
> >>         airflow_kube/__init__.py
> >>     core-and-tasksdk/
> >>       pyproject.toml
> >>       src/
> >>         airflow_shared/__init__.py
> >> ```
> >>
> >> Things to note here: the “shared” folder can hold multiple different shared
> >> “libraries”. In this example I am supposing a hypothetical shared kubernetes
> >> library, for a world in which we split the KubePodOperator and the
> >> KubeExecutor into two separate distributions (example only - I am not
> >> proposing we do that right now, and that will be a separate discussion).
> >>
> >> The other things to note here:
> >>
> >>
> >> - the folder name in shared aims to be “self-documenting”, hence the
> verbose “core-and-tasksdk” to say where the shared library is intended to
> be used.
> >> - the Python module itself should almost always have an `airflow_` (or
> >> maybe `_airflow_`?) prefix so that it does not conflict with anything else
> >> we might use. It won’t matter “in production”, as those modules will be
> >> vendored in and imported as `airflow/_vendor/airflow_shared` etc., but
> >> avoiding conflicts at dev time with the Finder approach is a good safety
> >> measure.
> >>
> >> I will start making a real PR for this proposal now, but I’m open to
> feedback (either here, or in the PR when I open it)
> >>
> >> -ash
> >>
> >>> On 4 Jul 2025, at 16:55, Jarek Potiuk <ja...@potiuk.com> wrote:
> >>>
> >>> Yeah, we have to try it and test - also, building packages happens
> >>> semi-frequently when you run `uv sync` (uv uses some kind of heuristics to
> >>> decide when), and you can force it with `--reinstall` or `--refresh`. A
> >>> package build also happens every time you run `breeze ci-image build` now,
> >>> so it seems like it will integrate nicely into our workflows.
> >>>
> >>> Looks really cool Ash.
> >>>
> >>> On Fri, Jul 4, 2025 at 5:14 PM Ash Berlin-Taylor <a...@apache.org>
> wrote:
> >>>
> >>>> It’s not just release time, but any time we build a package, which
> >>>> happens on “every” CI run. The normal unit tests will use code from
> >>>> airflow-common/src/airflow_common; the kube tests, which build an image,
> >>>> will build the dists and vendor in the code from that commit.
> >>>>
> >>>> There is only a single copy of the shared code committed to the repo,
> so
> >>>> there is never anything to synchronise.
> >>>>
> >>>>> On 4 Jul 2025, at 15:53, Amogh Desai <amoghdesai....@gmail.com>
> wrote:
> >>>>>
> >>>>> Thanks Ash.
> >>>>>
> >>>>> It is really cool and helpful that you were able to test both scenarios
> >>>>> -- repo checkout, and installing from the vendored package -- and that
> >>>>> the resolution worked fine too.
> >>>>>
> >>>>> I like this idea compared to the relative-import one for a few reasons:
> >>>>> - It feels like it will take some time to adjust to the new coding
> >>>>> standard we would have to lay down if we impose relative imports in the
> >>>>> shared dist
> >>>>> - We can continue using our repo-wide absolute-import standard; it is also
> >>>>> much easier when we do a global search in the IDE to find + replace, which
> >>>>> relative imports could change
> >>>>> - Vendoring is a proven and established paradigm across projects and would
> >>>>> give us the build tooling we need out of the box
> >>>>>
> >>>>> Nothing much against the relative imports, but with the evidence provided
> >>>>> above, the vendored approach seems to only do us good.
> >>>>>
> >>>>> Regarding synchronizing it, release time should be fine as long as we
> >>>>> have a good CI workflow to catch such issues per PR when changes are made
> >>>>> in the shared dist? (Doing it in pre-commit would make it really slow, I
> >>>>> guess.)
> >>>>>
> >>>>> If we can run our tests with vendored code we should be mostly
> covered.
> >>>>>
> >>>>> Good effort all!
> >>>>>
> >>>>> Thanks & Regards,
> >>>>> Amogh Desai
> >>>>>
> >>>>>
> >>>>>> On Fri, Jul 4, 2025 at 7:23 PM Ash Berlin-Taylor <a...@apache.org>
> >>>> wrote:
> >>>>>> Okay, I think I’ve got something that works and I’m happy with.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>
> https://github.com/astronomer/airflow/tree/shared-vendored-lib-tasksdk-and-core
> >>>>>> This produces the following from `uv build task-sdk`
> >>>>>> -
> >>>>>>
> >>>>
> https://github.com/user-attachments/files/21058976/apache_airflow_task_sdk-1.1.0.tar.gz
> >>>>>> -
> >>>>>>
> >>>>
> https://github.com/user-attachments/files/21058996/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip
> >>>>>> (`.whl.zip` as GH won't allow .whl upload, but will .zip)
> >>>>>>
> >>>>>> ```
> >>>>>> ❯ unzip -l dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip |
> >>>> grep
> >>>>>> _vendor
> >>>>>>      50  02-02-2020 00:00   airflow/sdk/_vendor/.gitignore
> >>>>>>    2082  02-02-2020 00:00   airflow/sdk/_vendor/__init__.py
> >>>>>>      28  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common.pyi
> >>>>>>      18  02-02-2020 00:00   airflow/sdk/_vendor/vendor.txt
> >>>>>>     785  02-02-2020 00:00
> >>>>>> airflow/sdk/_vendor/airflow_common/__init__.py
> >>>>>>   10628  02-02-2020 00:00
> >>>>>> airflow/sdk/_vendor/airflow_common/timezone.py
> >>>>>> ```
> >>>>>>
> >>>>>> And similarly in the .tar.gz, so our “sdist” is complete too:
> >>>>>> ```
> >>>>>> ❯ tar -tzf dist/apache_airflow_task_sdk-1.1.0.tar.gz |grep _vendor
> >>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/.gitignore
> >>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/__init__.py
> >>>>>>
> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common.pyi
> >>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/vendor.txt
> >>>>>>
> >>>>>>
> >>>>
> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/__init__.py
> >>>>>>
> >>>>
> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/timezone.py
> >>>>>> ```
> >>>>>>
> >>>>>> The plugin works at build time by including/copying the libs specified
> >>>>>> in vendor.txt into place (and letting `vendoring` take care of the import
> >>>>>> rewrites).
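> >>>>>>
> >>>>>> As an illustration only (a simplified sketch rather than the real plugin
> >>>>>> code: it assumes a hatchling custom build hook, uses made-up paths, and
> >>>>>> skips the import-rewriting step), the copying part could look roughly
> >>>>>> like this:
> >>>>>>
> >>>>>> ```
> >>>>>> # hatch_build.py - sketch of a build hook that copies the shared libraries
> >>>>>> # listed in vendor.txt into the _vendor directory before building the dist.
> >>>>>> import shutil
> >>>>>> from pathlib import Path
> >>>>>>
> >>>>>> from hatchling.builders.hooks.plugin.interface import BuildHookInterface
> >>>>>>
> >>>>>>
> >>>>>> class VendorSharedCode(BuildHookInterface):
> >>>>>>     PLUGIN_NAME = "custom"
> >>>>>>
> >>>>>>     def initialize(self, version, build_data):
> >>>>>>         root = Path(self.root)
> >>>>>>         vendor_dir = root / "src" / "airflow" / "sdk" / "_vendor"
> >>>>>>
> >>>>>>         for line in (vendor_dir / "vendor.txt").read_text().splitlines():
> >>>>>>             name = line.strip()
> >>>>>>             if not name or name.startswith("#"):
> >>>>>>                 continue
> >>>>>>             # e.g. "airflow_common" living in ../airflow-common/src/airflow_common
> >>>>>>             src = root.parent / "airflow-common" / "src" / name
> >>>>>>             shutil.copytree(src, vendor_dir / name, dirs_exist_ok=True)
> >>>>>> ```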
> >>>>>> For the imports to continue to work at “dev” time / from a repo
> >>>>>> checkout, I have added an import finder to `sys.meta_path`; since it’s at
> >>>>>> the end of the list, it will only be used if the normal import machinery
> >>>>>> can’t find things.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>
> https://github.com/astronomer/airflow/blob/996817782be6071b306a87af9f36fe1cf2d3aaa3/task-sdk/src/airflow/sdk/_vendor/__init__.py
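> >>>>>>
> >>>>>> The idea of that finder, as a minimal sketch (not the exact code in that
> >>>>>> file), is roughly:
> >>>>>>
> >>>>>> ```
> >>>>>> import importlib
> >>>>>> import sys
> >>>>>> from importlib.abc import Loader, MetaPathFinder
> >>>>>> from importlib.machinery import ModuleSpec
> >>>>>>
> >>>>>> _PREFIX = "airflow.sdk._vendor."
> >>>>>>
> >>>>>>
> >>>>>> class _AliasLoader(Loader):
> >>>>>>     """Import the real top-level module and reuse it under the vendored name."""
> >>>>>>
> >>>>>>     def __init__(self, target):
> >>>>>>         self._target = target
> >>>>>>
> >>>>>>     def create_module(self, spec):
> >>>>>>         module = importlib.import_module(self._target)
> >>>>>>         # Both names now point at the same module object in sys.modules.
> >>>>>>         sys.modules[spec.name] = module
> >>>>>>         return module
> >>>>>>
> >>>>>>     def exec_module(self, module):
> >>>>>>         pass  # already executed by the real import above
> >>>>>>
> >>>>>>
> >>>>>> class _DevVendorFinder(MetaPathFinder):
> >>>>>>     """Resolve airflow.sdk._vendor.X to the in-repo X at dev time."""
> >>>>>>
> >>>>>>     def find_spec(self, fullname, path=None, target=None):
> >>>>>>         if not fullname.startswith(_PREFIX):
> >>>>>>             return None
> >>>>>>         return ModuleSpec(fullname, _AliasLoader(fullname[len(_PREFIX):]))
> >>>>>>
> >>>>>>
> >>>>>> # Appended, not prepended, so it only kicks in when nothing has been vendored.
> >>>>>> sys.meta_path.append(_DevVendorFinder())
> >>>>>> ```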
> >>>>>> This doesn’t quite give us the same “import rewriting” effect at
> >>>>>> runtime, as in this approach `airflow_common` is loaded directly (i.e.
> >>>>>> both airflow.sdk._vendor.airflow_common and airflow_common exist in
> >>>>>> sys.modules), but it does work for everything that I was able to test.
> >>>>>>
> >>>>>> I tested it with the diff at the end of this message. My test
> ipython
> >>>>>> shell:
> >>>>>>
> >>>>>> ```
> >>>>>> In [1]: from airflow.sdk._vendor.airflow_common.timezone import foo
> >>>>>>
> >>>>>> In [2]: foo
> >>>>>> Out[2]: 1
> >>>>>>
> >>>>>> In [3]: import airflow.sdk._vendor.airflow_common
> >>>>>>
> >>>>>> In [4]: import airflow.sdk._vendor.airflow_common.timezone
> >>>>>>
> >>>>>> In [5]: airflow.sdk._vendor.airflow_common.__file__
> >>>>>> Out[5]:
> >>>>>>
> >>>>
> '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/__init__.py'
> >>>>>> In [6]: airflow.sdk._vendor.airflow_common.timezone.__file__
> >>>>>> Out[6]:
> >>>>>>
> >>>>
> '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/timezone.py'
> >>>>>> ```
> >>>>>>
> >>>>>>
> >>>>>> And in a standalone environment with the SDK dist I built (it needed
> >>>>>> the matching airflow-core right now, but that has nothing to do with this
> >>>>>> discussion):
> >>>>>>
> >>>>>> ```
> >>>>>> ❯ _AIRFLOW__AS_LIBRARY=1 uvx --python 3.12 --with
> >>>>>> dist/apache_airflow_core-3.1.0-py3-none-any.whl --with
> >>>>>> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl ipython
> >>>>>> Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ]
> >>>>>> Type 'copyright', 'credits' or 'license' for more information
> >>>>>> IPython 9.4.0 -- An enhanced Interactive Python. Type '?' for help.
> >>>>>> Tip: You can use `%hist` to view history, see the options with
> >>>> `%history?`
> >>>>>> In [1]: import airflow.sdk._vendor.airflow_common.timezone
> >>>>>>
> >>>>>> In [2]: airflow.sdk._vendor.airflow_common.timezone.__file__
> >>>>>> Out[2]:
> >>>>>>
> >>>>
> '/Users/ash/.cache/uv/archive-v0/WWq6r65aPto2eJOyPObEH/lib/python3.12/site-packages/airflow/sdk/_vendor/airflow_common/timezone.py’
> >>>>>> ```
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> ```diff
> >>>>>> diff --git a/airflow-common/src/airflow_common/__init__.py
> >>>>>> b/airflow-common/src/airflow_common/__init__.py
> >>>>>> index 13a83393a9..927b7c6b61 100644
> >>>>>> --- a/airflow-common/src/airflow_common/__init__.py
> >>>>>> +++ b/airflow-common/src/airflow_common/__init__.py
> >>>>>> @@ -14,3 +14,5 @@
> >>>>>> # KIND, either express or implied.  See the License for the
> >>>>>> # specific language governing permissions and limitations
> >>>>>> # under the License.
> >>>>>> +
> >>>>>> +foo = 1
> >>>>>> diff --git a/airflow-common/src/airflow_common/timezone.py
> >>>>>> b/airflow-common/src/airflow_common/timezone.py
> >>>>>> index 340b924c66..58384ef20f 100644
> >>>>>> --- a/airflow-common/src/airflow_common/timezone.py
> >>>>>> +++ b/airflow-common/src/airflow_common/timezone.py
> >>>>>> @@ -36,6 +36,9 @@ _PENDULUM3 =
> >>>>>> version.parse(metadata.version("pendulum")).major == 3
> >>>>>> # - FixedTimezone(0, "UTC") in pendulum 2
> >>>>>> utc = pendulum.UTC
> >>>>>>
> >>>>>> +
> >>>>>> +from airflow_common import foo
> >>>>>> +
> >>>>>> TIMEZONE: Timezone
> >>>>>>
> >>>>>>
> >>>>>> ```
> >>>>>>
> >>>>>>>> On 3 Jul 2025, at 12:43, Jarek Potiuk <ja...@potiuk.com> wrote:
> >>>>>>> I think both approaches are doable:
> >>>>>>>
> >>>>>>> 1) -> We can very easily prevent bad imports with pre-commit when
> >>>>>>> importing across distributions, and make sure we are only doing relative
> >>>>>>> imports in the shared modules. We already do plenty of this. And yes, it
> >>>>>>> would require relative imports, which we currently do not allow.
> >>>>>>>
> >>>>>>> 2) -> has one disadvantage: someone, at some point in time, will have
> >>>>>>> to decide to synchronize this, and if that happens just before a release
> >>>>>>> (I bet this is going to happen) it will lead to solving problems that
> >>>>>>> would normally be solved during the PR that makes the change (i.e. the
> >>>>>>> symbolic link has the advantage that whoever modifies shared code is
> >>>>>>> immediately notified in their PR that they broke something, because
> >>>>>>> static checks, mypy or tests fail).
> >>>>>>>
> >>>>>>> Ash, do you have an idea of a process (who and when) for doing the
> >>>>>>> synchronisation in the vendoring case? Maybe we could solve it if it is
> >>>>>>> done more frequently and with some regularity? We could potentially force
> >>>>>>> re-vendoring at PR time whenever shared code changes (and enforce it with
> >>>>>>> pre-commit). Other than releases, I can't think of a good place for it in
> >>>>>>> our development workflow, and release time seems a bit too late, as it
> >>>>>>> puts the extra effort of fixing potential incompatibilities on the
> >>>>>>> release manager and delays the release. WDYT?
> >>>>>>>
> >>>>>>> Re: relative imports. I think for a shared library we could potentially
> >>>>>>> relax this and allow them (and actually disallow absolute imports in the
> >>>>>>> pieces of code that are shared - again, enforced by pre-commit). As I
> >>>>>>> recall, the only reason we forbade relative imports is how we are (or
> >>>>>>> maybe were) doing Dag parsing and the failures resulting from it, so we
> >>>>>>> decided to just not allow them, to keep consistency. The way Dag parsing
> >>>>>>> works is that when you use importlib to read the Dag from a file,
> >>>>>>> relative imports do not work, because Python does not know what they
> >>>>>>> should be relative to. But if a relative import is done from an imported
> >>>>>>> package, it should be no problem, I think - otherwise our Dags would not
> >>>>>>> be able to import any library that uses relative imports.
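> >>>>>>>
> >>>>>>> To illustrate that with a small self-contained example (file and module
> >>>>>>> names are made up here; this is not the actual Dag processor code):
> >>>>>>>
> >>>>>>> ```
> >>>>>>> import importlib.util
> >>>>>>> import pathlib
> >>>>>>> import tempfile
> >>>>>>> import textwrap
> >>>>>>>
> >>>>>>> # A stand-in for a Dag file that itself contains a relative import.
> >>>>>>> dag_file = pathlib.Path(tempfile.mkdtemp()) / "my_dag.py"
> >>>>>>> dag_file.write_text(textwrap.dedent("""
> >>>>>>>     from .helpers import something   # relative import in the Dag file itself
> >>>>>>> """))
> >>>>>>>
> >>>>>>> # Load it the way Dag files get loaded: from a path, under a synthetic
> >>>>>>> # top-level module name, so there is no parent package for "." to
> >>>>>>> # resolve against.
> >>>>>>> spec = importlib.util.spec_from_file_location("synthetic_dag_module", dag_file)
> >>>>>>> module = importlib.util.module_from_spec(spec)
> >>>>>>> try:
> >>>>>>>     spec.loader.exec_module(module)
> >>>>>>> except ImportError as exc:
> >>>>>>>     print(exc)  # attempted relative import with no known parent package
> >>>>>>> ```
> >>>>>>>
> >>>>>>> A package that the Dag merely imports is loaded as a normal package, so
> >>>>>>> relative imports inside it resolve fine.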
> >>>>>>>
> >>>>>>> Of course consistency might be the reason why we do not want to
> >>>> introduce
> >>>>>>> relative imports. I don't see it as an issue if it is guarded by
> >>>>>> pre-commit
> >>>>>>> though.
> >>>>>>>
> >>>>>>> J.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, 3 Jul 2025, 12:11 Ash Berlin-Taylor <a...@apache.org> wrote:
> >>>>>>>
> >>>>>>>> Oh yes, symlinks will work, with one big caveat: it does mean you
> >>>>>>>> can’t use absolute imports from one common module to another.
> >>>>>>>>
> >>>>>>>> For example
> >>>>>>>>
> >>>>
> https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41
> >>>>>>>> where we have
> >>>>>>>>
> >>>>>>>> ```
> >>>>>>>> from airflow.utils.module_loading import import_string
> >>>>>>>> ```
> >>>>>>>>
> >>>>>>>> if we want to move serve_logs into this common lib, which is then
> >>>>>>>> symlinked, we wouldn’t be able to have
> >>>>>>>> `from airflow_common.module_loading import import_string`.
> >>>>>>>>
> >>>>>>>> I can think of two possible solutions here.
> >>>>>>>>
> >>>>>>>> 1) is to allow/require relative imports in this shared lib, so
> `from
> >>>>>>>> .module_loading import import_string`
> >>>>>>>> 2) is to use `vendoring`[1] (from the pip maintainers) which will
> >>>> handle
> >>>>>>>> import-rewriting for us.
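> >>>>>>>>
> >>>>>>>> As a toy illustration of what 2) amounts to (this is not the `vendoring`
> >>>>>>>> tool itself, just the shape of the rewrite, using the names from above):
> >>>>>>>>
> >>>>>>>> ```
> >>>>>>>> import re
> >>>>>>>>
> >>>>>>>> # A line as it appears in the shared library source in the repo:
> >>>>>>>> source = "from airflow_common.module_loading import import_string\n"
> >>>>>>>>
> >>>>>>>> # After the package is copied into the dist, imports of the shared
> >>>>>>>> # package are rewritten to point at the vendored location instead:
> >>>>>>>> rewritten = re.sub(r"\bairflow_common\b", "airflow.sdk._vendor.airflow_common", source)
> >>>>>>>> print(rewritten)
> >>>>>>>> # from airflow.sdk._vendor.airflow_common.module_loading import import_string
> >>>>>>>> ```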
> >>>>>>>>
> >>>>>>>> I’d entirely forgotten that symlinks in repos were a thing, so I
> >>>>>>>> prepared a minimal POC/demo of what the vendoring approach could look
> >>>>>>>> like here:
> >>>>>>>>
> >>>>
> https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3
> >>>>>>>> Now personally I am more than happy with relative imports, but
> >>>> generally
> >>>>>>>> as a project we have avoided them, so I think that limits what we
> >>>> could
> >>>>>> do
> >>>>>>>> with a symlink based approach.
> >>>>>>>>
> >>>>>>>> -ash
> >>>>>>>>
> >>>>>>>> [1] https://github.com/pradyunsg/vendoring
> >>>>>>>>
> >>>>>>>>> On 3 Jul 2025, at 10:30, Pavankumar Gopidesu <
> >>>> gopidesupa...@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>> Thanks Ash
> >>>>>>>>>
> >>>>>>>>> Yes, I agree - option 2 would be preferred for me, making sure we
> >>>>>>>>> have all the guardrails to protect against any unwanted behaviour in
> >>>>>>>>> code sharing and that we execute the right tests between the packages.
> >>>>>>>>>
> >>>>>>>>> Agree with the others, option 2 would be my preference.
> >>>>>>>>>
> >>>>>>>>> On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai <
> >>>> amoghdesai....@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Thanks for starting this discussion, Ash.
> >>>>>>>>>>
> >>>>>>>>>> I would prefer option 2 here with proper tooling to handle the
> code
> >>>>>>>>>> duplication at *release* time.
> >>>>>>>>>> It is best to have a dist that has all it needs in itself.
> >>>>>>>>>>
> >>>>>>>>>> Option 1 could very quickly get out of hand: if we decide to
> >>>>>>>>>> separate the triggerer / dag processor / config etc. into separate
> >>>>>>>>>> packages, backwards compatibility is going to be a nightmare and will
> >>>>>>>>>> bite us harder than we anticipate.
> >>>>>>>>>>
> >>>>>>>>>> Thanks & Regards,
> >>>>>>>>>> Amogh Desai
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik <kaxiln...@gmail.com>
> >>>>>> wrote:
> >>>>>>>>>>> I prefer Option 2 as well, to avoid a matrix of dependencies.
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, 3 Jul 2025 at 01:03, Jens Scheffler
> >>>>>> <j_scheff...@gmx.de.invalid
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> I'd also rather prefer option 2 - the reason is that it is
> >>>>>>>>>>>> pragmatic: we do not need to cut another package, and we keep the
> >>>>>>>>>>>> package count and the dependencies down.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I remember some time ago I was checking (together with Jarek, I am
> >>>>>>>>>>>> not sure anymore...) whether the use of symlinks would be possible:
> >>>>>>>>>>>> keep the source in one package but "symlink" it into another. If the
> >>>>>>>>>>>> files are then materialized at packaging/release time, we have one
> >>>>>>>>>>>> set of code. Otherwise, if that is not possible, the redundancy could
> >>>>>>>>>>>> still be solved by a pre-commit hook - and in Git the files are
> >>>>>>>>>>>> de-duplicated anyway based on content hash, so this does not hurt.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 02.07.25 18:49, Shahar Epstein wrote:
> >>>>>>>>>>>>> I support option 2 with proper automation & CI - the reasoning
> >>>>>>>>>>>>> you've shown for it makes sense to me.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Shahar
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor <
> a...@apache.org
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>>> Hello everyone,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> As we work on finishing off the code-level separation of
> Task
> >>>> SDK
> >>>>>>>>>> and
> >>>>>>>>>>>> Core
> >>>>>>>>>>>>>> (scheduler etc) we have come across some situations where we
> >>>> would
> >>>>>>>>>>> like
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>> share code between these.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> However it’s not as straightforward as “just put it in a
> common
> >>>>>>>>>> dist
> >>>>>>>>>>>> they
> >>>>>>>>>>>>>> both depend upon” because one of the goals of the Task SDK
> >>>>>>>>>> separation
> >>>>>>>>>>>> was
> >>>>>>>>>>>>>> to have 100% complete version independence between the two,
> >>>>>> ideally
> >>>>>>>>>>>> even if
> >>>>>>>>>>>>>> they are built into the same image and venv. Most of the
> reason
> >>>>>> why
> >>>>>>>>>>> this
> >>>>>>>>>>>>>> isn’t straightforward comes down to backwards
> compatibility -
> >>>> if
> >>>>>> we
> >>>>>>>>>>>> make
> >>>>>>>>>>>>>> a change to the common/shared distribution, every released
> >>>>>>>>>>>>>> version of the dists that depend on it still has to keep working.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> We’ve listed the options we have thought about in
> >>>>>>>>>>>>>> https://github.com/apache/airflow/issues/51545 (but that
> covers
> >>>>>>>>>> some
> >>>>>>>>>>>> more
> >>>>>>>>>>>>>> things that I don’t want to get in to in this discussion
> such as
> >>>>>>>>>>>> possibly
> >>>>>>>>>>>>>> separating operators and executors out of a single provider
> >>>> dist.)
> >>>>>>>>>>>>>> To give a concrete example of some code I would like to
> share
> >>>>>>>>>>>>>>
> >>>>
> https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py
> >>>>>>>>>>>>>> — logging config. Another thing we will want to share will
> be
> >>>> the
> >>>>>>>>>>>>>> AirflowConfigParser class from airflow.configuration (but
> >>>> notably:
> >>>>>>>>>>> only
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>> parser class, _not_ the default config values, again, lets
> not
> >>>>>> dwell
> >>>>>>>>>>> on
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>> specifics of that)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> So to bring the options listed in the issue here for
> discussion,
> >>>>>>>>>>> broadly
> >>>>>>>>>>>>>> speaking there are two high-level approaches:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 1. A single shared distribution
> >>>>>>>>>>>>>> 2. No shared package and copy/duplicate code
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The advantage of Approach 1 is that we only have the code
> in one
> >>>>>>>>>>> place.
> >>>>>>>>>>>>>> However for me, at least in this specific case of Logging
> config
> >>>>>> or
> >>>>>>>>>>>>>> the AirflowConfigParser class, the problem is that backwards compatibility is
> >>>> much
> >>>>>>>>>> much
> >>>>>>>>>>>>>> harder.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The main advantage of Approach 2 is that the code is released
> >>>>>>>>>>>> with/embedded
> >>>>>>>>>>>>>> in the dist (i.e. apache-airflow-task-sdk would contain the
> >>>> right
> >>>>>>>>>>>> version
> >>>>>>>>>>>>>> of the logging config and ConfigParser etc). The downside is
> >>>> that
> >>>>>>>>>>> either
> >>>>>>>>>>>>>> the code will need to be duplicated in the repo, or better
> yet
> >>>> it
> >>>>>>>>>>> would
> >>>>>>>>>>>>>> live in a single place in the repo, but some tooling (TBD)
> will
> >>>>>>>>>>>>>> automatically handle the duplication, either at commit
> time, or
> >>>> my
> >>>>>>>>>>>>>> preference, at release time.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For this kind of shared “utility” code I am very strongly
> >>>> leaning
> >>>>>>>>>>>> towards
> >>>>>>>>>>>>>> option 2 with automation, as otherwise I think the backwards
> >>>>>>>>>>>> compatibility
> >>>>>>>>>>>>>> requirements would make it unworkable (very quickly over
> time
> >>>> the
> >>>>>>>>>>>>>> combinations we would have to test would just be
> unreasonable)
> >>>>>> and I
> >>>>>>>>>>>> don’t
> >>>>>>>>>>>>>> feel confident we can have things as stable as we need to
> really
> >>>>>>>>>>> deliver
> >>>>>>>>>>>>>> the version separation/independence I want to deliver with
> >>>>>> AIP-72.
> >>>>>>>>>>>>>> So unless someone feels very strongly about this, I will
> come up
> >>>>>>>>>> with
> >>>>>>>>>>> a
> >>>>>>>>>>>>>> draft PR for further discussion that will implement code
> sharing
> >>>>>> via
> >>>>>>>>>>>>>> “vendoring” it at build time. I have an idea of how I can
> >>>> achieve
> >>>>>>>>>> this
> >>>>>>>>>>>> so
> >>>>>>>>>>>>>> we have a single version in the repo and it’ll work there,
> but
> >>>> at
> >>>>>>>>>>>> runtime
> >>>>>>>>>>>>>> we vendor it in to the shipped dist so it lives at something
> >>>> like
> >>>>>>>>>>>>>> `airflow.sdk._vendor` etc.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> In terms of repo layout, this likely means we would end up
> with:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> airflow-core/pyproject.toml
> >>>>>>>>>>>>>> airflow-core/src/
> >>>>>>>>>>>>>> airflow-core/tests/
> >>>>>>>>>>>>>> task-sdk/pyproject.toml
> >>>>>>>>>>>>>> task-sdk/src/
> >>>>>>>>>>>>>> task-sdk/tests/
> >>>>>>>>>>>>>> airflow-common/src
> >>>>>>>>>>>>>> airflow-common/tests/
> >>>>>>>>>>>>>> # Possibly no airflow-common/pyproject.toml, as deps would
> be
> >>>>>>>>>> included
> >>>>>>>>>>>> in
> >>>>>>>>>>>>>> the downstream projects. TBD.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thoughts and feedback welcomed.
> >>>>>>>>>>>>
> >>>>>>
