The namespace feature looks promising and from your tests, it looks like it
would work well from Airflow 2.0 and onwards.

I will look at it in depth and see if I have more suggestions or opinions on
it.

On Tue, Oct 29, 2019 at 3:32 PM Jarek Potiuk <jarek.pot...@polidea.com>
wrote:

> TL;DR: We did some testing of namespaces and packaging (and of potential
> backporting options for 1.10.* Python 3 Airflow installations), and we think
> it's best to adopt namespaces soon and to use a different package name,
> "airflow-integrations", for all non-fundamental integrations.
>
> Unless we missed some tricks, we cannot use airflow.* sub-packages for the
> 1.10.* backportable packages. Example:
>
>    - "apache-airflow" package provides: "airflow.*" (this is what we have
>    today)
>    - "apache-airflow-providers-google" package provides:
>    "airflow.providers.google.*" packages
>
> If we install both packages (the old apache-airflow 1.10.6 and the new
> apache-airflow-providers-google from 2.0), it seems that
> the "airflow.providers.google.*" package cannot be imported. This is a bit
> of a problem if we would like to backport the operators from Airflow 2.0 to
> Airflow 1.10 in a way that is forward-compatible. We really want users
> who start using backported operators in 1.10.* not to have to change the
> imports in their DAGs to run them in Airflow 2.0.
>
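One way to reproduce this kind of import failure is sketched below. It assumes the two distributions end up in different directories on sys.path (as can happen with a development install), which may not be exactly the setup used in the tests described above. The names demo_core and demo_integrations are stand-ins for "airflow" and "airflow_integrations", not real packages. Because the core package is a regular package (it ships an __init__.py), Python looks for its sub-packages only inside that one directory, so a sub-package shipped elsewhere under the same top-level name is not found, while a distinct top-level name imports fine.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_regular_package(directory: Path) -> None:
    """Create `directory` (and parents); only the leaf gets an __init__.py."""
    directory.mkdir(parents=True, exist_ok=True)
    (directory / "__init__.py").write_text("")


def can_import(name: str) -> bool:
    """Return True if `name` imports, False if the import system cannot find it."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    core_dist = Path(tmp) / "core_dist"          # plays the role of apache-airflow 1.10
    provider_dist = Path(tmp) / "provider_dist"  # same top-level name, other directory
    separate_dist = Path(tmp) / "separate_dist"  # different top-level name

    make_regular_package(core_dist / "demo_core")
    make_regular_package(provider_dist / "demo_core" / "providers" / "google")
    make_regular_package(separate_dist / "demo_integrations" / "providers" / "google")

    added = [str(core_dist), str(provider_dist), str(separate_dist)]
    sys.path[:0] = added
    try:
        same_name_ok = can_import("demo_core.providers.google")
        separate_name_ok = can_import("demo_integrations.providers.google")
    finally:
        # Leave sys.path as we found it.
        for entry in added:
            sys.path.remove(entry)

print("same top-level name importable:", same_name_ok)
print("separate top-level name importable:", separate_name_ok)
```

Under these assumptions the first import fails and the second succeeds, which is the motivation for a separate top-level package name.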
> We discussed it internally in our team and considered several options, but
> we think the best way will be to go straight to "namespaces" in Airflow 2.0
> and to have the integrations (as discussed in the AIP-21 discussion) in a
> separate "airflow_integrations" package. This might be an even bigger step
> towards the AIP-8 implementation, and it fits very well with the
> "stewardship" now being discussed in AIP-21. But we will still keep (for now)
> a single release process for all packages for 2.0 (except for the backporting,
> which can be done per provider before the 2.0 release) and provide a
> foundation for more complex release cycles in future versions.
>
> Here is how the new Airflow 2.0 repository could look (I only show a
> subset of the directories, but they are representative). For those whose
> email client corrupts the fixed-width layout, here is an image of this
> structure: https://pasteboard.co/IEesTih.png
>
> |-- airflow
> |   |- __init__.py
> |   |- operators -> fundamental operators are here
> |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> |-- setup.py -> setup.py for the "apache-airflow" package
> |-- airflow_integrations
> |   |-providers
> |   | |-google
> |   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> |   |   |-airflow_integrations
> |   |     |-__init__.py
> |   |     |-providers
> |   |       |-__init__.py
> |   |       |-google
> |   |         |-__init__.py
> |   |         |-tests -> tests for the "apache-airflow-integrations-providers-google" package
> |   | |-__init__.py
> |   |-protocols
> |     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> |     |-airflow_integrations
> |        |-protocols
> |          |-__init__.py
> |          |-tests -> tests for the "apache-airflow-integrations-protocols" package
>
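The layout above repeats an airflow_integrations/__init__.py in every distribution, which matches the pkgutil-style namespace packages described in the Python packaging guide. A minimal sketch of that mechanism follows. It assumes each such __init__.py contains only the extend_path line, and it uses the stand-in name demo_ns with two throwaway directories instead of real installed packages.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The single line that every copy of the namespace __init__.py would carry.
NAMESPACE_INIT = '__path__ = __import__("pkgutil").extend_path(__path__, __name__)\n'


def write_portion(dist_root: Path, subpackage: str) -> None:
    """Create dist_root/demo_ns/<subpackage> with a namespace-style __init__.py on top."""
    namespace_dir = dist_root / "demo_ns"
    package_dir = namespace_dir / subpackage
    package_dir.mkdir(parents=True)
    (namespace_dir / "__init__.py").write_text(NAMESPACE_INIT)
    (package_dir / "__init__.py").write_text(f"NAME = {subpackage!r}\n")


with tempfile.TemporaryDirectory() as tmp:
    # Two separately "installed" distributions sharing one top-level package name.
    google_dist = Path(tmp) / "providers_google_dist"
    protocols_dist = Path(tmp) / "protocols_dist"
    write_portion(google_dist, "google")
    write_portion(protocols_dist, "protocols")

    added = [str(google_dist), str(protocols_dist)]
    sys.path[:0] = added
    importlib.invalidate_caches()
    try:
        # extend_path merges both demo_ns directories into one package __path__.
        google = importlib.import_module("demo_ns.google")
        protocols = importlib.import_module("demo_ns.protocols")
    finally:
        for entry in added:
            sys.path.remove(entry)

print(google.NAME, protocols.NAME)
```

Both sub-packages import even though they live in different directories, because every portion declares the same namespace in the same way.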
> There are a number of pros for this solution:
>
>    - We could use the standard namespaces feature of Python to build
>    multiple packages:
>    https://packaging.python.org/guides/packaging-namespace-packages/
>    - Installation for users will be the same as before. We could
>    install the needed packages automatically when particular extras are
>    used (pip install apache-airflow[google] could install both
>    "apache-airflow" and "apache-airflow-integrations-providers-google").
>    - We could have a custom setup.py installation process for developers
>    that installs all the packages in development ("-e ." mode) in a single
>    operation.
>    - In the case of transfer packages we could have nice error messages
>    informing that the other package needs to be installed (for example, the
>    S3->GCS operator would import "airflow_integrations.providers.amazon.*"
>    and, if that fails, it could raise an error such as "Please install
>    [amazon] extra to use me.").
>    - We could implement numerous optimisations in the way we run tests
>    in CI (for example, run all the "providers" tests only with sqlite, run
>    tests in parallel, etc.).
>    - We could implement it gradually. We do not need a "big bang"
>    approach: we can implement it provider by provider and test it
>    with one provider (Google) first to make sure that all the mechanisms
>    are working.
>    - For now we could have the monorepo approach where all the packages
>    are developed in concert, avoiding the dependency problems for now
>    (but allowing for back-portability to 1.10).
>    - We will have clear boundaries between packages and the ability to test
>    for unwanted/hidden dependencies between packages.
>    - We could switch to the (much better) sphinx-apidoc package to continue
>    building a single documentation set for all of those (sphinx-apidoc has
>    support for namespaces).
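The "nice error message" idea for transfer operators could look roughly like the sketch below. The helper name require_provider is hypothetical, and the module path and extra name are taken from the example in the list above; this is only one possible shape, not an existing Airflow API.

```python
import importlib
from types import ModuleType


def require_provider(module_name: str, extra: str) -> ModuleType:
    """Import an optional provider module or fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"Cannot import '{module_name}'. "
            f"Please install the [{extra}] extra, "
            f"e.g. pip install 'apache-airflow[{extra}]'."
        ) from err


# Hypothetical usage at the top of an S3->GCS transfer operator module.
message = None
try:
    amazon = require_provider("airflow_integrations.providers.amazon", "amazon")
except ImportError as err:
    # Reached whenever the optional provider package is not installed.
    message = str(err)
    print(message)
```

Chaining with "from err" keeps the original import error visible in the traceback, which helps when the failure is caused by a missing dependency of the provider rather than by the provider itself.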
>
> As we are working on the GCP move from contrib to core, we could make the
> effort to test it and try it out before we merge it to master, so that it
> will be ready for others (and we could help with most of the moves
> afterwards). It seems complex, but in most cases it will be a very simple
> move between the packages that can be done incrementally, so I think there
> is little risk in doing this.
>
> J.
>
>
> On Mon, Oct 28, 2019 at 11:45 PM Kevin Yang <yrql...@gmail.com> wrote:
>
> > Tomasz and Ash made good points about the overhead of having separate
> > repos. But as we grow bigger and more mature, I would prefer to have what
> > was described in AIP-8. It shouldn't be extremely hard for us to come up
> > with good strategies to handle the overhead. AIP-8 already talked about how
> > it can benefit us. IMO, at a high level, having a clear separation of core
> > vs. hooks/operators would make the project much more scalable, and the
> > gains would outweigh the cost we pay.
> >
> > That being said, I'm supportive of this approach of moving towards AIP-8
> > while learning; it is quite a good practice for tackling a big project.
> > Looking forward to reading the AIP.
> >
> >
> > Cheers,
> > Kevin Y
> >
> > On Mon, Oct 28, 2019 at 6:21 AM Jarek Potiuk <jarek.pot...@polidea.com>
> > wrote:
> >
> > > We are checking how we can use namespaces in a back-portable way, and we
> > > will have a POC soon so that we will all be able to see what it looks
> > > like.
> > >
> > > J.
> > >
> > > On Mon, Oct 28, 2019 at 1:24 PM Ash Berlin-Taylor <a...@apache.org>
> > wrote:
> > >
> > > > > I'll have to read your proposal in detail (sorry, no time right now!),
> > > > > but I'm broadly in favour of this approach, and I think keeping them
> > > > > _in_ the same repo is the best plan -- that makes writing and testing
> > > > > cross-cutting changes easier.
> > > >
> > > > -a
> > > >
> > > > > > On 28 Oct 2019, at 12:14, Tomasz Urbaszek
> > > > > > <tomasz.urbas...@polidea.com> wrote:
> > > > > >
> > > > > > I think utilizing namespaces should reduce a lot of the problems
> > > > > > raised by using separate repos (who will manage it? how to release?
> > > > > > where should the repo be?).
> > > > >
> > > > > Bests,
> > > > > Tomek
> > > > >
> > > > > On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <
> > > jarek.pot...@polidea.com>
> > > > > wrote:
> > > > >
> > > > >> Thanks Bas for comments! Let me share my thoughts below.
> > > > >>
> > > > >> On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak <
> > > > >> basharens...@godatadriven.com>
> > > > >> wrote:
> > > > >>
> > > > > >>> Hi Jarek, I definitely see a future in creating separate
> > > > > >>> installable packages for various operators/hooks/etc (as in
> > > > > >>> AIP-8). This would IMO strip the “core” Airflow to only what’s
> > > > > >>> needed and result in a small package without a ton of
> > > > > >>> dependencies (and make it more maintainable, shorter tests, etc
> > > > > >>> etc etc). Not exactly sure though what you’re proposing in your
> > > > > >>> e-mail, is it a new AIP for an intermediate step towards AIP-8?
> > > > >>>
> > > > >>
> > > > > >> It's a new AIP I am proposing. For now it's only for backporting
> > > > > >> the new 2.0 import paths to the 1.10.* series.
> > > > >>
> > > > > >> It's more "incrementally going in the direction of AIP-8 and
> > > > > >> learning about the difficulties involved" than implementing AIP-8
> > > > > >> fully. We are taking advantage of the changes in import paths from
> > > > > >> AIP-21, which make it possible to have both old and new (optional)
> > > > > >> operators available in the 1.10.* series of Airflow. I think there
> > > > > >> is a lot more to do for a full implementation of AIP-8: decisions
> > > > > >> on how to maintain and install those operator groups separately,
> > > > > >> the stewardship model/organisation for the separate groups, how to
> > > > > >> manage cross-dependencies, procedures for releasing the packages,
> > > > > >> etc.
> > > > >>
> > > > > >> I think about this new AIP also as a learning effort: we would
> > > > > >> learn more about how separate packaging works, and then we can
> > > > > >> follow up with the full AIP-8 implementation for "modular" Airflow.
> > > > > >> AIP-8 could then be implemented in Airflow 2.1, for example (or
> > > > > >> 3.0 if we start following semantic versioning), based on those
> > > > > >> learnings. It's a good example of having our cake and eating it
> > > > > >> too: we can try out modularity in 1.10.* while cutting the scope
> > > > > >> of 2.0 and not yet implementing the full management/release
> > > > > >> procedure for AIP-8.
> > > > >>
> > > > >>
> > > > > >>> Thinking about this, I think there are still a few grey areas
> > > > > >>> (which would be good to discuss in a new AIP, or continue on
> > > > > >>> AIP-8):
> > > > > >>>
> > > > > >>>  *   In your email you speak only about the 3 big cloud providers
> > > > > >>> (btw I made a PR for migrating all AWS components ->
> > > > > >>> https://github.com/apache/airflow/pull/6439). Is there a plan for
> > > > > >>> splitting components other than Google/AWS/Azure?
> > > > >>>
> > > > >>
> > > > > >> We could indeed add more groups as part of this new AIP (as an
> > > > > >> extension to AIP-21 and a prerequisite to AIP-8). We already see
> > > > > >> how moving/deprecation works for the providers package: it works
> > > > > >> rather nicely for GCP/Google. But there is nothing to prevent us
> > > > > >> from extending it to cover other groups of operators/hooks. If you
> > > > > >> look at the current structure of the documentation done by Kamil,
> > > > > >> we can follow the structure there and move the operators/hooks
> > > > > >> accordingly
> > > > > >> (https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html):
> > > > > >>
> > > > > >>      Fundamentals, ASF: Apache Software Foundation, Azure: Microsoft
> > > > > >> Azure, AWS: Amazon Web Services, GCP: Google Cloud Platform, Service
> > > > > >> integrations, Software integrations, Protocol integrations.
> > > > >>
> > > > > >> I am happy to include that in the AIP if others agree it's a good
> > > > > >> idea. Out of those groups, I think only Fundamentals should not be
> > > > > >> back-ported. The others should be rather easy to port (if we decide
> > > > > >> to). We already have quite a lot of those in the new GCP operators
> > > > > >> for 2.0, so starting with the GCP/Google group is a good idea.
> > > > > >> Following with the Cloud Providers first is also a good thing. For
> > > > > >> example, we now have support from the Google Composer team to do
> > > > > >> this separation for GCP (and we learn from it), and then our team
> > > > > >> can claim the stewardship for releasing the Python 3 / Airflow
> > > > > >> 1.10-compatible "airflow-google" packages. Possibly other Cloud
> > > > > >> Providers/teams might follow this (if they see the value in it),
> > > > > >> and there could be different stewards for those. And then we can do
> > > > > >> other groups if we decide to. I think this way we can learn whether
> > > > > >> AIP-8 is manageable and what real problems we are going to face.
> > > > >>
> > > > > >>>  *   Each “plugin” e.g. GCP would be a separate repo, should we
> > > > > >>> create some sort of blueprint for such packages?
> > > > >>>
> > > > >>
> > > > > >> I think we do not need separate repos (at all), but in this new AIP
> > > > > >> we can test it before we decide to go for AIP-8. IMHO the monorepo
> > > > > >> approach will work here rather nicely. We could use Python 3 native
> > > > > >> namespaces
> > > > > >> <https://packaging.python.org/guides/packaging-namespace-packages/>
> > > > > >> for the sub-packages when we go full AIP-8. For now we could simply
> > > > > >> package the new operators in a separate pip package for the
> > > > > >> Python 3 version of the 1.10.* series only. We only need to test
> > > > > >> whether it works well with another package providing
> > > > > >> 'airflow.providers.*' after apache-airflow is installed (providing
> > > > > >> the 'airflow' package). But I think we can make it work. I don't
> > > > > >> think we really need to split the repos: namespaces will work just
> > > > > >> fine and make managing cross-repository dependencies easier (but
> > > > > >> we may learn otherwise). We certainly will not need it for the
> > > > > >> newly proposed AIP for backporting groups to 1.10, and we can defer
> > > > > >> that decision to AIP-8 implementation time.
> > > > >>
> > > > >>
> > > > >>>  *   In which Airflow version do we start raising deprecation
> > > warnings
> > > > >>> and in which version would we remove the original?
> > > > >>>
> > > > >>
> > > > > >> I think we should do what we already did in the GCP case. Those old
> > > > > >> "imports" for operators can be marked as deprecated in Airflow 2.0
> > > > > >> (and removed in 2.1, or in 3.0 if we start following semantic
> > > > > >> versioning). We can, however, do it earlier, in 1.10.7 or 1.10.8,
> > > > > >> if we release those (without removing the old operators yet: just
> > > > > >> raise deprecation warnings and inform users that for Python 3 the
> > > > > >> new "airflow-google", "airflow-aws" etc. packages can be installed
> > > > > >> and that they can switch to them).
> > > > >>
> > > > >> J.
> > > > >>
> > > > >>
> > > > >>>
> > > > >>> Cheers,
> > > > >>> Bas
> > > > >>>
> > > > > >>> On 27 Oct 2019, at 08:33, Jarek Potiuk <jarek.pot...@polidea.com>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>> Hello - any comments on that? I am happy to make it into an AIP :)
> > > > > >>>
> > > > > >>> On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk
> > > > > >>> <jarek.pot...@polidea.com> wrote:
> > > > >>>
> > > > >>> *Motivation*
> > > > >>>
> > > > > >>> I think we really should start thinking about making it easier
> > > > > >>> for our users to migrate to 2.0. After implementing some recent
> > > > > >>> changes related to AIP-21 (Changes in import paths)
> > > > > >>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
> > > > > >>> I think I have an idea that might help with it.
> > > > >>>
> > > > >>> *Proposal*
> > > > >>>
> > > > > >>> We could package some of the new and improved 2.0 operators
> > > > > >>> (moved to the "providers" package) and let them be used in a
> > > > > >>> Python 3 environment of Airflow 1.10.x.
> > > > >>>
> > > > > >>> This can be done case by case, per "cloud provider". It should
> > > > > >>> not be obligatory and should be largely driven by each provider.
> > > > > >>> It's not yet the full AIP-8 (Split Hooks/Operators into separate
> > > > > >>> packages)
> > > > > >>> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303>.
> > > > > >>> It's merely a backport of some operators/hooks to get them
> > > > > >>> working in 1.10. But by doing it we might try out the concept of
> > > > > >>> splitting, learn about maintenance problems, and maybe implement
> > > > > >>> the full AIP-8 approach in 2.1 consistently across the board.
> > > > >>>
> > > > >>> *Context*
> > > > >>>
> > > > > >>> Part of AIP-21 was to move the import paths for Cloud providers
> > > > > >>> to a separate providers/<PROVIDER> package. An example of that
> > > > > >>> (the first provider we have almost migrated) is the
> > > > > >>> providers/google package (further divided into gcp/gsuite etc).
> > > > >>>
> > > > > >>> We've done a massive migration of all the Google-related
> > > > > >>> operators, created a few missing ones, and retrofitted some old
> > > > > >>> operators to follow GCP best practices, fixing a number of
> > > > > >>> problems and also implementing Python 3 and Pylint compatibility.
> > > > > >>> Some of these operators/hooks are not backwards compatible. Those
> > > > > >>> that are compatible are still available via the old imports, with
> > > > > >>> a deprecation warning.
> > > > >>>
> > > > > >>> We've added missing tests (including system tests) and missing
> > > > > >>> features, improving some of the Google operators, giving the
> > > > > >>> users more capabilities and fixing some issues. Those operators
> > > > > >>> should pretty much "just work" in Airflow 1.10.x (any recent
> > > > > >>> version) for Python 3. We should be able to release a separate
> > > > > >>> pip-installable package for those operators that users can
> > > > > >>> install in Airflow 1.10.x.
> > > > >>>
> > > > > >>> Any user will be able to install this separate package in their
> > > > > >>> Airflow 1.10.x installation and start using those new "provider"
> > > > > >>> operators in parallel with the old 1.10.x operators. Other
> > > > > >>> providers ("microsoft", "amazon") might follow the same approach
> > > > > >>> if they want. We could even at some point decide to move some of
> > > > > >>> the core operators in a similar fashion (for example following
> > > > > >>> the structure proposed in the latest documentation: fundamentals /
> > > > > >>> software / etc.
> > > > > >>> https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
> > > > >>>
> > > > >>> *Pros and cons*
> > > > >>>
> > > > >>> There are a number of pros:
> > > > >>>
> > > > > >>>  - Users will have an easier migration path if they are deeply
> > > > > >>>  invested in the 1.10.* version
> > > > > >>>  - It's possible to migrate in stages for people who are also
> > > > > >>>  invested in py2: py2 (1.10) -> py3 (1.10) -> py3 + new operators
> > > > > >>>  (1.10) -> py3 + 2.0
> > > > > >>>  - Moving to the new operators in py3 + new operators can be done
> > > > > >>>  gradually. Old operators will continue to work while new ones can
> > > > > >>>  be used more and more
> > > > > >>>  - People will be incentivised to migrate to Python 3 before 2.0
> > > > > >>>  is out (by using the new operators)
> > > > > >>>  - Each provider "package" can have an independent release
> > > > > >>>  schedule and add functionality to already released Airflow
> > > > > >>>  versions.
> > > > > >>>  - We do not take away any functionality from the users - we just
> > > > > >>>  add more options
> > > > > >>>  - The releases can be voted on separately by the PMC (similarly to
> > > > > >>>  main Airflow releases) after the "stewards" of the package (per
> > > > > >>>  provider) perform a round of testing on 1.10.* versions.
> > > > > >>>  - Users will start migrating to the new operators earlier and
> > > > > >>>  have a smoother switch to 2.0 later
> > > > >>>  - The latest improved operators will start
> > > > >>>
> > > > >>> There are three cons I could think of:
> > > > >>>
> > > > > >>>  - There will be quite a lot of duplication between old and new
> > > > > >>>  operators (they will co-exist in 1.10). That might confuse users
> > > > > >>>  and cause problems with cooperation between different
> > > > > >>>  operators/hooks
> > > > > >>>  - Having new operators in 1.10 Python 3 might keep people from
> > > > > >>>  migrating to 2.0
> > > > > >>>  - It will require some maintenance and separate release overhead.
> > > > >>>
> > > > > >>> I already spoke to the Composer team @Google and they are very
> > > > > >>> positive about this. I also spoke to Ash, and it seems it might
> > > > > >>> also be OK for the Astronomer team. We have Google's backing and
> > > > > >>> support, and we can provide maintenance and support for those
> > > > > >>> packages, serving as an example for other providers of how they
> > > > > >>> can do it.
> > > > > >>>
> > > > > >>> Let me know what you think, and whether I should make it into an
> > > > > >>> official AIP.
> > > > >>>
> > > > >>> J.
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> --
> > > > >>>
> > > > >>> Jarek Potiuk
> > > > >>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > >>>
> > > > >>> M: +48 660 796 129 <+48660796129>
> > > > >>> [image: Polidea] <https://www.polidea.com/>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> --
> > > > >>>
> > > > >>> Jarek Potiuk
> > > > >>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > >>>
> > > > >>> M: +48 660 796 129 <+48660796129>
> > > > >>> [image: Polidea] <https://www.polidea.com/>
> > > > >>>
> > > > >>>
> > > > >>
> > > > >> --
> > > > >>
> > > > >> Jarek Potiuk
> > > > >> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > >>
> > > > >> M: +48 660 796 129 <+48660796129>
> > > > >> [image: Polidea] <https://www.polidea.com/>
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Tomasz Urbaszek
> > > > > Polidea <https://www.polidea.com/> | Junior Software Engineer
> > > > >
> > > > > M: +48 505 628 493 <+48505628493>
> > > > > E: tomasz.urbas...@polidea.com <tomasz.urbasz...@polidea.com>
> > > > >
> > > > > Unique Tech
> > > > > Check out our projects! <https://www.polidea.com/our-work>
> > > >
> > > >
> > >
> > > --
> > >
> > > Jarek Potiuk
> > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >
> > > M: +48 660 796 129 <+48660796129>
> > > [image: Polidea] <https://www.polidea.com/>
> > >
> >
>
>
>
