We are checking how we can use namespaces in a back-portable way, and we will
have a POC soon so that we can all see what it will look like.
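
Until the POC is ready, here is a minimal sketch of how Python 3 native
(PEP 420) namespace packages let two separately shipped directory trees share
one top-level package. It is only an illustration, not the POC itself: the
package name demo_airflow, the two roots and the helper functions are all
hypothetical. It also assumes that no root ships a demo_airflow/__init__.py,
which is not true for the real 'airflow' package in 1.10, so the back-portable
variant may need to look different.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write_module(root: Path, relative_path: str, source: str) -> None:
    """Create a module file (and its parent directories) under root."""
    target = root / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)


def build_demo_distributions(base: Path) -> list:
    """Lay out two separate roots that share the 'demo_airflow' namespace."""
    core_root = base / "core_dist"          # stands in for the core package
    provider_root = base / "google_dist"    # stands in for a provider package
    # Neither root contains demo_airflow/__init__.py: that absence is what
    # makes 'demo_airflow' a native namespace package spanning both roots.
    write_module(core_root, "demo_airflow/core/__init__.py", "NAME = 'core'\n")
    write_module(
        provider_root,
        "demo_airflow/providers/google/__init__.py",
        "NAME = 'google provider'\n",
    )
    return [str(core_root), str(provider_root)]


def load_names() -> tuple:
    """Import one module from each root through the shared namespace."""
    with tempfile.TemporaryDirectory() as tmp:
        roots = build_demo_distributions(Path(tmp))
        sys.path[:0] = roots
        importlib.invalidate_caches()
        try:
            core = importlib.import_module("demo_airflow.core")
            google = importlib.import_module("demo_airflow.providers.google")
            return core.NAME, google.NAME
        finally:
            # Undo the sys.path change and forget the demo modules again.
            for root in roots:
                sys.path.remove(root)
            for name in [m for m in sys.modules
                         if m == "demo_airflow" or m.startswith("demo_airflow.")]:
                del sys.modules[name]


if __name__ == "__main__":
    print(load_names())
```

Both imports resolve even though the two sub-packages live in different
roots, which is the behaviour we would rely on for separately installed
provider packages.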

J.

On Mon, Oct 28, 2019 at 1:24 PM Ash Berlin-Taylor <a...@apache.org> wrote:

> I'll have to read your proposal in detail (sorry, no time right now!), but
> I'm broadly in favour of this approach, and I think keeping them _in_ the
> same repo is the best plan -- that makes writing and testing cross-cutting
> changes easier.
>
> -a
>
> > On 28 Oct 2019, at 12:14, Tomasz Urbaszek <tomasz.urbas...@polidea.com>
> > wrote:
> >
> > I think utilizing namespaces should avoid a lot of the problems raised by
> > using separate repos (who will manage them? how do we release? where should
> > the repo live?).
> >
> > Bests,
> > Tomek
> >
> > On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <jarek.pot...@polidea.com>
> > wrote:
> >
> >> Thanks Bas for comments! Let me share my thoughts below.
> >>
> >> On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak <
> >> basharens...@godatadriven.com>
> >> wrote:
> >>
> >>> Hi Jarek, I definitely see a future in creating separate installable
> >>> packages for various operators/hooks/etc (as in AIP-8). This would IMO
> >>> strip the “core” Airflow to only what’s needed and result in a small
> >>> package without a ton of dependencies (and make it more maintainable,
> >>> shorter tests, etc etc etc). Not exactly sure though what you’re proposing
> >>> in your e-mail, is it a new AIP for an intermediate step towards AIP-8?
> >>>
> >>
> >> It's a new AIP I am proposing. For now it's only for backporting the new
> >> 2.0 import paths to the 1.10.* series.
> >>
> >> It's more of an incremental step in the direction of AIP-8, learning about
> >> the difficulties involved, than implementing AIP-8 fully. We are taking
> >> advantage of the changes in import paths from AIP-21, which make it possible
> >> to have both old and new (optional) operators available in the 1.10.* series
> >> of Airflow. I think there is a lot more to do for a full implementation of
> >> AIP-8: decisions on how to maintain and install those operator groups
> >> separately, a stewardship model/organisation for the separate groups, how to
> >> manage cross-dependencies, procedures for releasing the packages etc.
> >>
> >> I think about this new AIP also as a learning effort - we would learn more
> >> about how separate packaging works and then we can follow up with a full
> >> AIP-8 implementation for "modular" Airflow. Then AIP-8 could be implemented
> >> in Airflow 2.1 for example - or 3.0 if we start following semantic
> >> versioning - based on those learnings. It's a good example of having our
> >> cake and eating it too. We can try out modularity in 1.10.* while cutting
> >> the scope of 2.0 and not implementing the full management/release procedure
> >> for AIP-8 yet.
> >>
> >>
> >>> Thinking about this, I think there are still a few grey areas (which would
> >>> be good to discuss in a new AIP, or continue on AIP-8):
> >>>
> >>>  *   In your email you speak only about the 3 big cloud providers
> >>> (btw I made a PR for migrating all AWS components ->
> >>> https://github.com/apache/airflow/pull/6439). Is there a plan for
> >>> splitting other components than Google/AWS/Azure?
> >>>
> >>
> >> We could indeed add more groups as part of this new AIP (as an extension to
> >> AIP-21 and a pre-requisite to AIP-8). We already see how moving/deprecation
> >> works for the providers package - it works for GCP/Google rather nicely.
> >> But there is nothing to prevent us from extending it to cover other groups
> >> of operators/hooks. If you look at the current structure of the
> >> documentation done by Kamil, we can follow the structure there and move the
> >> operators/hooks accordingly (
> >> https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html):
> >>
> >>      Fundamentals, ASF: Apache Software Foundation, Azure: Microsoft
> >> Azure, AWS: Amazon Web Services, GCP: Google Cloud Platform, Service
> >> integrations, Software integrations, Protocol integrations.
> >>
> >> I am happy to include that in the AIP - if others agree it's a good idea.
> >> Out of those groups, I think only Fundamentals should not be back-ported.
> >> Others should be rather easy to port (if we decide to). We already have
> >> quite a lot of those in the new GCP operators for 2.0, so starting with the
> >> GCP/Google group is a good idea. Following with Cloud Providers first is
> >> also a good thing. For example, we now have support from the Google Composer
> >> team to do this separation for GCP (and we learn from it), and then we can
> >> claim the stewardship in our team for releasing the Python 3 / Airflow
> >> 1.10-compatible "airflow-google" packages. Possibly other Cloud
> >> Providers/teams might follow this (if they see the value in it) and there
> >> could be different stewards for those. And then we can do other groups if
> >> we decide to. I think this way we can learn whether AIP-8 is manageable and
> >> what real problems we are going to face.
> >>
> >>>  *   Each “plugin” e.g. GCP would be a separate repo, should we create
> >>> some sort of blueprint for such packages?
> >>>
> >>
> >> I think we do not need separate repos (at all), but in this new AIP we can
> >> test it before we decide to go for AIP-8. IMHO the monorepo approach will
> >> work here rather nicely. We could use Python 3 native namespaces
> >> <https://packaging.python.org/guides/packaging-namespace-packages/> for the
> >> sub-packages when we go full AIP-8. For now we could simply package the new
> >> operators in a separate pip package, for Python 3 and the 1.10.* series only.
> >> We only need to test whether it works well with another package providing
> >> 'airflow.providers.*' after apache-airflow is installed (providing the
> >> 'airflow' package). But I think we can make it work. I don't think we
> >> really need to split the repos: namespaces will work just fine and make
> >> management of cross-repository dependencies easier (but we can learn
> >> otherwise). For sure we will not need it for the newly proposed AIP of
> >> backporting groups to 1.10, and we can defer that decision to AIP-8
> >> implementation time.
> >>
> >>
> >>>  *   In which Airflow version do we start raising deprecation warnings
> >>> and in which version would we remove the original?
> >>>
> >>
> >> I think we should do what we already did in the GCP case. Those old
> >> "imports" for operators can be marked as deprecated in Airflow 2.0 (and
> >> removed in 2.1, or 3.0 if we start following semantic versioning). We can
> >> however do it earlier, in 1.10.7 or 1.10.8 if we release those (without
> >> removing the old operators yet - just raise deprecation warnings and inform
> >> users that for Python 3 the new "airflow-google", "airflow-aws" etc.
> >> packages can be installed and that they can switch to them).
> >>
> >> J.
> >>
> >>
> >>>
> >>> Cheers,
> >>> Bas
> >>>
> >>> On 27 Oct 2019, at 08:33, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >>>
> >>> Hello - any comments on that? I am happy to make it into an AIP :)?
> >>>
> >>> On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <jarek.pot...@polidea.com>
> >>> wrote:
> >>>
> >>> *Motivation*
> >>>
> >>> I think we really should start thinking about making it easier to migrate
> >>> to 2.0 for our users. After implementing some recent changes related to
> >>> AIP-21 - Changes in import paths
> >>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
> >>> I think I have an idea that might help with it.
> >>>
> >>> *Proposal*
> >>>
> >>> We could package some of the new and improved 2.0 operators (moved to the
> >>> "providers" package) and let them be used in a Python 3 environment of
> >>> Airflow 1.10.x.
> >>>
> >>> This can be done case-by-case per "cloud provider". It should not be
> >>> obligatory and should be largely driven by each provider. It's not yet the
> >>> full AIP-8 - Split Hooks/Operators into separate packages
> >>> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303>.
> >>> It's merely backporting some operators/hooks to get them to work in 1.10.
> >>> But by doing it we might try out the concept of splitting, learn about
> >>> maintenance problems and maybe implement the full *AIP-8* approach in 2.1
> >>> consistently across the board.
> >>>
> >>> *Context*
> >>>
> >>> Part of AIP-21 was to move the import paths for Cloud providers to a
> >>> separate providers/<PROVIDER> package. An example of that (the first
> >>> provider, which we have already almost migrated) is the providers/google
> >>> package (further divided into gcp/gsuite etc).
> >>>
> >>> We've done a massive migration of all the Google-related operators,
> >>> created a few missing ones and retrofitted some old operators to follow GCP
> >>> best practices, fixing a number of problems and also implementing Python 3
> >>> and Pylint compatibility. Some of these operators/hooks are not backwards
> >>> compatible. Those that are compatible are still available via the old
> >>> imports with a deprecation warning.
> >>>
> >>> We've added missing tests (including system tests) and missing features -
> >>> improving some of the Google operators - giving the users more capabilities
> >>> and fixing some issues. Those operators should pretty much "just work" in
> >>> Airflow 1.10.x (any recent version) for Python 3. We should be able to
> >>> release a separate pip-installable package for those operators that users
> >>> should be able to install in Airflow 1.10.x.
> >>>
> >>> Any user will be able to install this separate package in their Airflow
> >>> 1.10.x installation and start using those new "provider" operators in
> >>> parallel to the old 1.10.x operators. Other providers ("microsoft",
> >>> "amazon") might follow the same approach if they want. We could even at
> >>> some point decide to move some of the core operators in a similar fashion
> >>> (for example following the structure proposed in the latest documentation:
> >>> fundamentals / software / etc.
> >>> https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
> >>>
> >>> *Pros and cons*
> >>>
> >>> There are a number of pros:
> >>>
> >>>  - Users will have an easier migration path if they are deeply invested
> >>>  in the 1.10.* version
> >>>  - It's possible to migrate in stages for people who are also invested in
> >>>  py2: *py2 (1.10) -> py3 (1.10) -> py3 + new operators (1.10) -> py3 +
> >>>  2.0*
> >>>  - Moving to new operators in py3 + new operators can be done
> >>>  gradually. Old operators will continue to work while new ones can be used
> >>>  more and more
> >>>  - People will be incentivised to migrate to Python 3 before 2.0 is
> >>>  out (by using new operators)
> >>>  - Each provider "package" can have an independent release schedule - and
> >>>  add functionality in already released Airflow versions.
> >>>  - We do not take away any functionality from the users - we just add
> >>>  more options
> >>>  - The releases can be - similarly to main Airflow releases - voted on
> >>>  separately by the PMC after the "stewards" of the package (per provider)
> >>>  perform a round of testing on 1.10.* versions.
> >>>  - Users will start migrating to new operators earlier and have a
> >>>  smoother switch to 2.0 later
> >>>  - The latest improved operators will start
> >>>
> >>> There are three cons I could think of:
> >>>
> >>>  - There will be quite a lot of duplication between old and new
> >>>  operators (they will co-exist in 1.10). That might lead to confusion
> >>>  among users and problems with cooperation between different
> >>>  operators/hooks
> >>>  - Having new operators in 1.10 Python 3 might keep people from
> >>>  migrating to 2.0
> >>>  - It will require some maintenance and separate release overhead.
> >>>
> >>> I already spoke to the Composer team @Google and they are very positive
> >>> about this. I also spoke to Ash and it seems it might also be OK for the
> >>> Astronomer team. We have Google's backing and support, and we can provide
> >>> maintenance and support for those packages - being an example for other
> >>> providers of how they can do it.
> >>>
> >>> Let me know what you think - and whether I should make it into an official
> >>> AIP maybe?
> >>>
> >>> J.
> >>>
> >>>
> >>>
> >>> --
> >>>
> >>> Jarek Potiuk
> >>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> >>>
> >>> M: +48 660 796 129 <+48660796129>
> >>> [image: Polidea] <https://www.polidea.com/>
> >>>
> >>>
> >>
> >> --
> >>
> >> Jarek Potiuk
> >> Polidea <https://www.polidea.com/> | Principal Software Engineer
> >>
> >> M: +48 660 796 129 <+48660796129>
> >> [image: Polidea] <https://www.polidea.com/>
> >>
> >
> >
> > --
> >
> > Tomasz Urbaszek
> > Polidea <https://www.polidea.com/> | Junior Software Engineer
> >
> > M: +48 505 628 493 <+48505628493>
> > E: tomasz.urbas...@polidea.com <tomasz.urbasz...@polidea.com>
> >
> > Unique Tech
> > Check out our projects! <https://www.polidea.com/our-work>
>
>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
