We will still have to keep the common-{compat/io/sql/...?} providers with a nested structure, as they aren't a "compat" provider but a "common-compat" provider - without the nested name it does not make sense IMO.
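To make the contrast concrete, here is a rough sketch of the two layouts for those providers (purely illustrative, using only the provider names mentioned above):

    # nested layout - keeps the "common" grouping in one folder
    providers/common/compat/
    providers/common/io/
    providers/common/sql/

    # flat "-" alternative discussed in the thread below
    providers/common-compat/
    providers/common-io/
    providers/common-sql/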
On Wed, 8 Jan 2025 at 18:21, Jarek Potiuk <ja...@potiuk.com> wrote:

> > Can you give an example of what might break by having `providers/apache-beam/src/airflow/providers/apache/beam`?
>
> Nothing will break. It's just:
>
> * the code will have to be a little more complex as it will have to conditionally handle "-" vs "/"
> * there will be inconsistency in the depth of folders - outside it will be 1, inside it will be 2 (as it is in your example)
> * it will be a bit more complex to find related providers (say Microsoft) - with the current scheme "providers/microsoft" is the directory containing all Microsoft providers. If we change it to "-", you have to find all sub-directories following the "microsoft-*" convention.
>
> I am not super-strong on it - we could do either, it's just my preference to use folders for grouping related things (which is what folders were designed for).
>
> J.
>
> On Wed, Jan 8, 2025 at 5:03 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>
> > > And we already have a number of mappings and conventions to handle that. For example provider ID mapping to dirs (apache.beam -> apache/beam), 'apache-airflow-providers-apache-beam' as the package name and airflow/providers/apache/beam as the package inside the distribution. Those will remain as they are - we cannot change them without breaking compatibility.
> >
> > Can you give an example of what might break by having `providers/apache-beam/src/airflow/providers/apache/beam`?
> >
> > -a
> >
> > On 7 Jan 2025, at 18:33, Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > > I think it will be better to keep it.
> > >
> > > The reason we have varying levels was to group things together - mainly Apache related providers, but also Microsoft.
> > >
> > > And we already have a number of mappings and conventions to handle that. For example provider ID mapping to dirs (apache.beam -> apache/beam), 'apache-airflow-providers-apache-beam' as the package name and airflow/providers/apache/beam as the package inside the distribution. Those will remain as they are - we cannot change them without breaking compatibility.
> > >
> > > So if we change it to a flat structure we will have some inconsistencies - in some cases it will be a single folder, in others (packages) those will be two folders.
> > >
> > > I think it will do more harm than good if we get rid of the 'folder' structures - some of the code in breeze will have to treat those differently as well. Nothing extraordinary and very complex, but more complex-ish than it should be - already on top of handling potentially nested folders.
> > >
> > > So my preference would be to stay with apache/beam - it's just more consistent handling of the case where provider packages can be one-level nested.
> > >
> > > J
> > >
> > > On Tue, 7 Jan 2025 at 19:00, Vincent Beck <vincb...@apache.org> wrote:
> > >
> > > > Good question. I always found it confusing to have some providers at different levels. Examples:
> > > > - "airbyte" in the "providers" directory (I would qualify it as a "regular" provider)
> > > > - "hive" in "providers/apache"
> > > > - "amazon" in "providers", but which contains only one sub-directory "aws"
> > > >
> > > > I would be in favor of using "-" instead of "/" so that all providers are at the same level.
> > > >
> > > > On 2025/01/07 16:38:10 Ash Berlin-Taylor wrote:
> > > >
> > > > > +1 to this in general terms, it will hopefully reduce a lot of the boilerplate we need.
> > > > >
> > > > > As for the amazon/aws example specifically, that does bring up a question: should we have `/` or `-`? To give some examples:
> > > > >
> > > > > cncf kubernetes: ./providers/cncf/kubernetes or ./providers/cncf-kubernetes
> > > > > Apache hive: ./providers/apache/hive or ./providers/apache-hive
> > > > > AWS: ./providers/amazon/aws or ./providers/amazon-aws
> > > > >
> > > > > There is no requirement from python etc. on one form or the other (as it's just a folder, not part of the module name), so it's whatever makes the most sense to us.
> > > > >
> > > > > Jarek and Dennis (and others): what are your preferences on these styles?
> > > > >
> > > > > -ash
> > > > >
> > > > > On 6 Jan 2025, at 22:51, Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > >
> > > > > > Oh, and one other benefit of it will be that we will be able to get rid of about 40% of the "Providers Manager" code. Currently, in the Providers Manager we have a lot of "ifs" that make it possible to use providers in breeze and the local environment from the sources. In a "production" installation we use "get_provider_info" entry points to discover providers and to check whether a provider is installed, but when you use the current providers in Breeze from inside the "airflow" sources, we rely on `provider.yaml` being present in the "airflow.providers.PROVIDER_ID" path - so we effectively have two paths for discovering information about the installed providers.
> > > > > >
> > > > > > After all providers are migrated to the new structure, all providers are separate "distributions" - and when you run `uv sync` (which will install all providers thanks to the workspace feature) or `pip install -e ./providers/aws` (which you will have to do manually to work on the provider - if you use `pip` rather than uv) - then we will not have to use the separate path to read provider.yaml, because the right entrypoint for the provider will be installed as well - so we will be able to get rid of quite some code that is currently only used in the airflow development environment.
> > > > > >
> > > > > > J.
> > > > > >
> > > > > > On Mon, Jan 6, 2025 at 11:41 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > > >
> > > > > > > Those are very good questions :)
> > > > > > >
> > > > > > > On Mon, Jan 6, 2025 at 10:54 PM Ferruzzi, Dennis <ferru...@amazon.com.invalid> wrote:
> > > > > > >
> > > > > > > > To clarify that I understand your diagram correctly, let's say you clone the Airflow repo to ~/workspace/airflow/. Does this mean that the AWS Glue Hook, which used to live at ~/workspace/airflow/providers/amazon/aws/hooks/glue.py (as a random example), will be located at ~/workspace/airflow/providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py? That feels unnecessarily repetitive to me, maybe it makes sense but I'm missing the context?
> > > > > > >
> > > > > > > Yes - it means that there is this repetitiveness, but for a good reason - those two "amazon/aws" serve different purposes:
> > > > > > >
> > > > > > > * The first "providers/amazon/aws" is just where the whole provider "complete project" is stored - it's basically a directory where the "aws provider" lives, a convenient folder to locate it in, that keeps it separate from other providers
> > > > > > > * The second "src/airflow/providers/amazon/aws" is the python package where the source files are stored - this is how (inside the sub-folder) you tell the actual python "import" where to look for the sources.
> > > > > > >
> > > > > > > What really matters is that (eventually) `~/workspace/airflow/providers/amazon/aws/` can be treated as a completely separate python project - the source of a "standalone" provider python project. There is a "pyproject.toml" file at the root of it, and if you do this (for example):
> > > > > > >
> > > > > > > cd providers/amazon/aws/
> > > > > > > uv sync
> > > > > > >
> > > > > > > then you will be able to work on the AWS provider exclusively as a separate project (this is not yet complete with the move - tests are not entirely possible to run today - but it will be possible as a next step - I explained it in https://github.com/apache/airflow/pull/45259#issuecomment-2572427916).
> > > > > > >
> > > > > > > This has a number of benefits, but one of them is that you will be able to build the provider by just running the `build` command of your favourite PEP-standards-compliant frontend:
> > > > > > >
> > > > > > > cd providers/amazon/aws/
> > > > > > > `uv build` (or `hatch build` or `poetry build` or `flit build`) ....
> > > > > > >
> > > > > > > This will create the provider package inside the `dist` folder. I just did it in my PR with `uv` in the first `airbyte` project:
> > > > > > >
> > > > > > > root@d74b3136d62f:/opt/airflow/providers/airbyte# uv build
> > > > > > > Building source distribution...
> > > > > > > Building wheel from source distribution...
> > > > > > > Successfully built dist/apache_airflow_providers_airbyte-5.0.0.tar.gz
> > > > > > > Successfully built dist/apache_airflow_providers_airbyte-5.0.0-py3-none-any.whl
> > > > > > >
> > > > > > > That's it. That also allows cases like installing provider packages using git URLs - which I used earlier today to test whether the incoming PR of pygments actually solves the problem we had yesterday https://github.com/apache/airflow/pull/45416 (basically we just make our provider packages "standard" python packages that all the tools support). Anyone who would like to install a commit, hash or branch version of the "airbyte" package from the main version of the Airflow repo will be able to do:
> > > > > > >
> > > > > > > pip install "apache-airflow-providers-airbyte @ git+https://github.com/apache/airflow.git/providers/airbyte@COMMIT_ID"
> > > > > > >
> > > > > > > Currently, in order to create the package, we need to manually extract the "amazon" subtree, copy it elsewhere, prepare some files dynamically (pyproject.toml, README.rst and a few others) and only then build the package.
> > > > > > > All this - copying the file structure, creating new files, running the build command and finally deleting the copied files - is currently done dynamically and under the hood by the "breeze release-management prepare-provider-packages" command. With this change, the directory structure in our `git` repo is totally standard and allows us (and anyone else) to build the package directly from it.
> > > > > > >
> > > > > > > > And what is the plan for system tests? As part of this reorganization, could they be moved into providers/{PROVIDER_ID}/tests/system? That seems more intuitive to me than their current location in providers/tests/system/{PROVIDER_ID}/example_foo.py.
> > > > > > >
> > > > > > > Oh yeah - I missed that in the original structure, as the "airbyte" provider (which I chose as the first one) did not contain "system" tests - but with one of the next providers I was planning to make sure system tests are covered. They are supposed to be moved to "tests/system" of course - Elad had a similar question and I also explained it in detail in https://github.com/apache/airflow/pull/45259#issuecomment-2572427916
> > > > > > >
> > > > > > > I hope this answers the questions. If not - I am happy to add more clarifications :)
> > > > > > >
> > > > > > > > J.
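Putting the pieces of the quoted thread together, here is a rough sketch of what a single provider sub-project could look like after the move (paths are illustrative, based only on the amazon/aws and tests/system examples discussed above; the final layout may differ):

    providers/amazon/aws/                    # the provider's own "complete project" folder
        pyproject.toml                       # makes it a standalone, buildable distribution
        src/airflow/providers/amazon/aws/    # the importable python package, e.g. hooks/glue.py
        tests/                               # unit tests for this provider (assumed location)
        tests/system/                        # system tests, per the suggestion above

With a layout like this, the commands quoted above (`uv sync`, `uv build`, or an editable `pip install -e` of the provider folder) can treat it like any other standard python project.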