We still will have to keep common-{compat/io/sql/...?} providers with
nested structure, as
they aren't "compat" provider, but "common-compat" provider - without the
nested name it
does not make sense IMO.

śr., 8 sty 2025 o 18:21 Jarek Potiuk <ja...@potiuk.com> napisał(a):

> Can you give an example of what might break why having
> `providers/aapche-beam/src/airflow/providers/apache/beam`?
>
> Nothing will break. It's just:
>
> * the code will have to be a little more complex as it will have to do some
> conditional writes of "-" "/"
> * there will be inconsistency in the depth of folders - outside it will be
> 1, inside it will be 2 (as it is in your example)/
> * it will be a bit more convention/ complex to limit related providers (say
> microsoft) - with the current scheme "providers/microsoft" is the directory
> containing all microsoft providers. If we change it to "-", you have to
> find all sub-directories following "microsoft-*" convention.
>
> I am not super-strong on it - we could do either, it's just my preference
> to use folders for grouping related things (as folders were designed for).
>
> J.
>
> On Wed, Jan 8, 2025 at 5:03 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>
> > > And we already have a number of mappings and conventions to handle
> that.
> > > For example provider I'd mapping to dirs (apache.beam -> apache/beam),
> > and
> > > 'apache-airflow-providers-apache-beam' as package na e  and
> > > airflow/providers/apache/beam as packages inside the distribution.
> Those
> > > will remain as they are - we cannot change them without breaking
> > > compatibility.
> >
> > Can you give an example of what might break why having
> > `providers/aapche-beam/src/airflow/providers/apache/beam`?
> >
> > -a
> >
> > > On 7 Jan 2025, at 18:33, Jarek Potiuk <ja...@potiuk.com> wrote:
> > >
> > > I think it will be better to keep it.
> > >
> > > The reason we have varying levels were to group things together -
> mainly
> > > Apache related providers, but also Microsoft.
> > >
> > > And we already have a number of mappings and conventions to handle
> that.
> > > For example provider I'd mapping to dirs (apache.beam -> apache/beam),
> > and
> > > 'apache-airflow-providers-apache-beam' as package na e  and
> > > airflow/providers/apache/beam as packages inside the distribution.
> Those
> > > will remain as they are - we cannot change them without breaking
> > > compatibility.
> > >
> > > So if we change it to a flat structure we will have some
> inconsistencies
> > -
> > > in some cases it will be single folder in others (packages) those will
> be
> > > two folders.
> > >
> > > I think it will be more harm than good if we get rid of the 'folder'
> > > structures - some of the code in breeze will have to treat those
> > > differently as well. Nothing extraordinary and very complex but more
> > > complex-ish than it should be - already on top of handling potentially
> > > nested folders
> > >
> > > So my preference would be to stay with apache/beam - it's just more
> > > consistently handling the case where provider packages can be one-level
> > > nested
> > >
> > > J
> > >
> > >
> > > wt., 7 sty 2025, 19:00 użytkownik Vincent Beck <vincb...@apache.org>
> > > napisał:
> > >
> > >> Good question. I always found it confusing to have some providers at
> > >> different level. Examples:
> > >> - "airbyte" in "providers" directory (I would qualify it as "regular"
> > >> provider)
> > >> - "hive" in "providers/apache"
> > >> - "amazon" in "providers" but which contains only one sub directory
> > "aws"
> > >>
> > >> I would be in favor of using "-" instead of "/" so that all providers
> > are
> > >> at the same level.
> > >>
> > >>
> > >> On 2025/01/07 16:38:10 Ash Berlin-Taylor wrote:
> > >>> +1 one to this on general terms, it will hopefully reduce a lot of
> the
> > >> boilerplate we need.
> > >>>
> > >>> As for the amazon/aws example specifically that does bring up a
> > >> question, should we have `/` or `-`.. to give some examples:
> > >>>
> > >>> cncf kubernetes: ./providers/cncf/kubernetes or
> > >> ./providers/cncf-kubernetes
> > >>> Apache hive: ./providers/apache/hive or ./providers/apache-hive
> > >>> AWS: ./providers/amazon/aws or ./providers/amazon-aws
> > >>>
> > >>> There is no requirement from python etc on one form or the other (as
> > >> it’s just a folder, not part of the module name), so it’s what ever
> > makes
> > >> most sense to us.
> > >>>
> > >>> Jarek and Dennis (and others): what are your preferences on these
> > styles?
> > >>>
> > >>> -ash
> > >>>
> > >>>> On 6 Jan 2025, at 22:51, Jarek Potiuk <ja...@potiuk.com> wrote:
> > >>>>
> > >>>> Oh. . And one other benefit of it will be that we will be able to
> get
> > >> rid
> > >>>> of about 40% of the "Providers Manager" code. Currently, in
> Providers
> > >>>> manager we have a lot of "ifs" that make it possible to use
> providers
> > >> in
> > >>>> breeze and local environment from the sources. In "production"
> > >> installation
> > >>>> we are using "get_provider_info"  entry points to discover providers
> > >> and
> > >>>> discover if provider is installed, but when you use current
> providers
> > >>>> installed in Breeze to inside "airflow", we rely on `provider.yaml`
> to
> > >> be
> > >>>> present in the "airflow.providers.PROVIDER_ID" path - so we
> > effectively
> > >>>> have two paths of discovering information about the providers
> > >> installed.
> > >>>>
> > >>>> After all providers are migrated to the new structure, all providers
> > >> are
> > >>>> separate "distributions" - and when you run `uv sync`  (which will
> > >> install
> > >>>> all providers thanks to workspace feature) or `pip install -e
> > >>>> ./providers/aws` (which you will have to do manually to work on the
> > >>>> provider - if you use `pip` rather than uv) - then we will not have
> to
> > >> use
> > >>>> the separate path to read provider.yaml, because the right
> entrypoint
> > >> for
> > >>>> the provider will be installed as well - so we will be able to get
> rid
> > >> of
> > >>>> quite some code that is currently only used in airflow development
> > >>>> environment.
> > >>>>
> > >>>> J.
> > >>>>
> > >>>>
> > >>>> On Mon, Jan 6, 2025 at 11:41 PM Jarek Potiuk <ja...@potiuk.com>
> > wrote:
> > >>>>
> > >>>>> Those are very good questions :)
> > >>>>>
> > >>>>> On Mon, Jan 6, 2025 at 10:54 PM Ferruzzi, Dennis
> > >>>>> <ferru...@amazon.com.invalid> wrote:
> > >>>>>
> > >>>>>> To clarify that I understand your diagram correctly, let's say you
> > >> clone
> > >>>>>> the Airflow repo to ~/workspace/airflow/.  Does this mean that the
> > >> AWS Glue
> > >>>>>> Hook which used to live at
> > >>>>>> ~/workspace/airflow/providers/amazon/aws/hooks/glue.py (as a
> random
> > >>>>>> example) will be located at
> > >>>>>>
> > >>
> >
> ~/workspace/airflow/providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py?
> > >>>>>> That feels unnecessarily repetitive to me, maybe it makes sense
> but
> > >> I'm
> > >>>>>> missing the context?
> > >>>>>>
> > >>>>>
> > >>>>> Yes - it means that there is this repetitiveness but for a good
> > >> reason -
> > >>>>> those two "amazon/aws" serve different purpose:
> > >>>>>
> > >>>>> * The first "providers/amazon/aws" is just where the whole provider
> > >>>>> "complete project" is stored - it's basically a directory where
> "aws
> > >>>>> provider" is stored, a convenient folder to locate it in, that
> makes
> > >> it
> > >>>>> separate from other providers
> > >>>>> * The second "src/airflow/providers/amazon/aws" - is the python
> > >>>>> package where the source files is stored - this is how (inside the
> > >>>>> sub-folder) you tell the actual python "import" to look for the
> > >> sources.
> > >>>>>
> > >>>>> .What really matters is that (eventually)
> > >>>>> `~/workspace/airflow/providers/amazon/aws/` can be treated as a
> > >> completely
> > >>>>> separate python project - a source of a "standalone" provider
> python
> > >>>>> project.
> > >>>>> There is a "pyproject.toml" file at the root of it and if you do
> this
> > >> (for
> > >>>>> example):
> > >>>>>
> > >>>>> cd providers/amazon/aws/
> > >>>>> uv sync
> > >>>>>
> > >>>>> And with it you will be able to work on AWS provider exclusively
> as a
> > >>>>> separate project (this is not yet complete with the move - tests
> are
> > >> not
> > >>>>> entirely possible to run today - but it will be possible as next
> step
> > >> - I
> > >>>>> explained it in
> > >>>>>
> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916
> > >>>>>
> > >>>>> This has a number of benefits, but one of them is that you will be
> > >> able to
> > >>>>> build provider by just running `build` command of your favourite
> > >>>>> PEP-standard compliant frontend:
> > >>>>>
> > >>>>> cd providers/amazon/aws/
> > >>>>> `uv build` (or `hatch build` or `poetry build` or `flit build`
> )....
> > >>>>>
> > >>>>> This will create  the provider package inside the `dist" folder. I
> > >> just
> > >>>>> did it in my PR with `uv` in the first "airbyte` project:
> > >>>>>
> > >>>>> root@d74b3136d62f:/opt/airflow/providers/airbyte# uv build
> > >>>>> Building source distribution...
> > >>>>> Building wheel from source distribution...
> > >>>>> Successfully built
> dist/apache_airflow_providers_airbyte-5.0.0.tar.gz
> > >>>>> Successfully built
> > >>>>> dist/apache_airflow_providers_airbyte-5.0.0-py3-none-any.whl
> > >>>>>
> > >>>>> That's it. That also allows cases like installing provider packages
> > >> using
> > >>>>> git URLs - which I used earlier today to test if the incoming PR of
> > >>>>> pygments is actually solving the problem we had yesteday
> > >>>>> https://github.com/apache/airflow/pull/45416  (basically we just
> > >> make our
> > >>>>> provider packages "standard" python packages that all the tools
> > >> support.
> > >>>>> Anyone who would like to install a commit, hash or branch version
> of
> > >> the
> > >>>>> "airbyte" package from main version of Airflow repo will be able to
> > >> do:
> > >>>>>
> > >>>>> pip install "apache-airflow-providers-airbyte @ git+
> > >>>>> https://github.com/apache/airflow.git/providers/airbyte@COMMIT_ID";
> > >>>>>
> > >>>>> Currently in order to create the package we need to manually
> extract
> > >> the
> > >>>>> "amazon" subtree, copy it elsewhere, prepare dynamically some files
> > >>>>> (pyproject.toml, README.rst and few others) and only then we  build
> > >> the
> > >>>>> package. All this - copying file structure, creating new files,
> > >> running the
> > >>>>> build command after and finally deleting the copied files is now -
> > >>>>> dynamically and under-the-hood created by "breeze
> release-management
> > >>>>> prepare-provider-packages" command. With this change, the directory
> > >>>>> structure in `git` repo of ours is totally standard and allows us
> > (and
> > >>>>> anyone else) to build the package directly from it.
> > >>>>>
> > >>>>>
> > >>>>> And what is the plan for system tests?   As part of this
> > >> reorganization,
> > >>>>>> could they be moved into providers/{PROVIDER_ID}/tests/system?
> That
> > >> seems
> > >>>>>> more intuitive to me than their current location in
> > >>>>>> providers/tests/system/{PROVIDER_ID}/example_foo.py.
> > >>>>>>
> > >>>>>>
> > >>>>> Oh yeah - I missed that in the original structure as the "airbyte"
> > >>>>> provider (that I chose as first one) did not contain the "system"
> > >> tests -
> > >>>>> but one of the two providers after that i was planning to make sure
> > >> system
> > >>>>> tests are covered. They are supposed to be moved to "tests/system"
> of
> > >>>>> course - Elad had similar question and I also explained it in
> detail
> > >> in
> > >>>>>
> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916
> > >>>>>
> > >>>>>
> > >>>>> I hope it answers the questions. If not - I am happy to add more
> > >>>>> clarifications :)
> > >>>>>
> > >>>>>
> > >>>>>> J.
> > >>>>>>
> > >>>>>
> > >>>
> > >>>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > >> For additional commands, e-mail: dev-h...@airflow.apache.org
> > >>
> > >>
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > For additional commands, e-mail: dev-h...@airflow.apache.org
> >
> >
>

Reply via email to