I'll have to read your proposal in detail (sorry, no time right now!), but I'm broadly in favour of this approach, and I think keeping them _in_ the same repo is the best plan -- that makes writing and testing cross-cutting changes easier.
-a

> On 28 Oct 2019, at 12:14, Tomasz Urbaszek <tomasz.urbas...@polidea.com> wrote:
>
> I think utilizing namespaces should reduce a lot of problems raised by
> using separate repos (who will manage it? how to release? where should be
> the repo?).
>
> Bests,
> Tomek
>
> On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
>
>> Thanks Bas for comments! Let me share my thoughts below.
>>
>> On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak
>> <basharens...@godatadriven.com> wrote:
>>
>>> Hi Jarek, I definitely see a future in creating separate installable
>>> packages for various operators/hooks/etc (as in AIP-8). This would IMO
>>> strip the “core” Airflow to only what’s needed and result in a small
>>> package without a ton of dependencies (and make it more maintainable,
>>> shorter tests, etc etc etc). Not exactly sure though what you’re
>>> proposing in your e-mail, is it a new AIP for an intermediate step
>>> towards AIP-8?
>>
>> It's a new AIP I am proposing. For now it's only for backporting the new
>> 2.0 import paths to 1.10.* series.
>>
>> It's more of "incremental going in direction of AIP-8 and learning some
>> difficulties involved" than implementing AIP-8 fully. We are taking
>> advantage of changes in import paths from AIP-21 which make it possible to
>> have both old and new (optional) operators available in 1.10.* series of
>> Airflow. I think there is a lot more to do for full implementation of
>> AIP-8: decisions how to maintain, install those operator groups separately,
>> stewardship model/organisation for the separate groups, how to manage
>> cross-dependencies, procedures for releasing the packages etc.
>>
>> I think about this new AIP also as a learning effort - we would learn more
>> how separate packaging works and then we can follow up with AIP-8 full
>> implementation for "modular" Airflow.
>> Then AIP-8 could be implemented in Airflow 2.1 for example - or 3.0 if we
>> start following semantic versioning - based on those learnings. It's a bit
>> of a good example of having cake and eating it too. We can try out
>> modularity in 1.10.* while cutting the scope of 2.0 and not implementing
>> full management/release procedure for AIP-8 yet.
>>
>>> Thinking about this, I think there are still a few grey areas (which
>>> would be good to discuss in a new AIP, or continue on AIP-8):
>>>
>>> * In your email you speak only about the 3 big cloud providers
>>>   (btw I made a PR for migrating all AWS components ->
>>>   https://github.com/apache/airflow/pull/6439). Is there a plan for
>>>   splitting other components than Google/AWS/Azure?
>>
>> We could add more groups as part of this new AIP indeed (as an extension to
>> AIP-21 and pre-requisite to AIP-8). We already see how moving/deprecation
>> works for the providers package - it works for GCP/Google rather nicely.
>> But there is nothing to prevent us from extending it to cover other groups
>> of operators/hooks. If you look at the current structure of documentation
>> done by Kamil, we can follow the structure there and move the
>> operators/hooks accordingly
>> (https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html):
>>
>> Fundamentals, ASF: Apache Software Foundation, Azure: Microsoft
>> Azure, AWS: Amazon Web Services, GCP: Google Cloud Platform, Service
>> integrations, Software integrations, Protocol integrations.
>>
>> I am happy to include that in the AIP - if others agree it's a good idea.
>> Out of those groups - I think only Fundamentals should not be back-ported.
>> Others should be rather easy to port (if we decide to). We already have
>> quite a lot of those in the new GCP operators for 2.0. So starting with
>> GCP/Google group is a good idea. Also following with Cloud Providers first
>> is a good thing.
>> For example we now have support from Google Composer team
>> to do this separation for GCP (and we learn from it) and then we can claim
>> the stewardship in our team for releasing the python 3/ Airflow
>> 1.10-compatible "airflow-google" packages. Possibly other Cloud
>> Providers/teams might follow this (if they see the value in it) and there
>> could be different stewards for those. And then we can do other groups if
>> we decide to. I think this way we can learn whether AIP-8 is manageable and
>> what real problems we are going to face.
>>
>>> * Each “plugin” e.g. GCP would be a separate repo, should we create
>>>   some sort of blueprint for such packages?
>>
>> I think we do not need separate repos (at all) but in this new AIP we can
>> test it before we decide to go for AIP-8. IMHO - monorepo approach will
>> work here rather nicely. We could use python-3 native namespaces
>> <https://packaging.python.org/guides/packaging-namespace-packages/> for
>> the sub-packages when we go full AIP-8. For now we could simply package
>> the new operators in separate pip package for Python 3 version 1.10.*
>> series only. We only need to test if it works well with another package
>> providing 'airflow.providers.*' after apache-airflow is installed
>> (providing 'airflow' package). But I think we can make it work. I don't
>> think we really need to split the repos, namespaces will work just fine
>> and have easier management of cross-repository dependencies (but we can
>> learn otherwise). For sure we will not need it for the new proposed AIP of
>> backporting groups to 1.10 and we can defer that decision to AIP-8
>> implementation time.
>>
>>> * In which Airflow version do we start raising deprecation warnings
>>>   and in which version would we remove the original?
>>
>> I think we should do what we did in GCP case already.
>> Those old "imports" for operators can be made as deprecated in Airflow 2.0
>> (and removed in 2.1 or 3.0 if we start following semantic versioning). We
>> can however do it before in 1.10.7 or 1.10.8 if we release those (without
>> removing the old operators yet - just raise deprecation warnings and
>> inform that for python3 the new "airflow-google", "airflow-aws" etc.
>> packages can be installed and users can switch to it).
>>
>> J.
>>
>>> Cheers,
>>> Bas
>>>
>>> On 27 Oct 2019, at 08:33, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>>>
>>> Hello - any comments on that? I am happy to make it into an AIP :)?
>>>
>>> On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <jarek.pot...@polidea.com>
>>> wrote:
>>>
>>> *Motivation*
>>>
>>> I think we really should start thinking about making it easier to migrate
>>> to 2.0 for our users. After implementing some recent changes related to
>>> AIP-21 - Changes in import paths
>>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
>>> I think I have an idea that might help with it.
>>>
>>> *Proposal*
>>>
>>> We could package some of the new and improved 2.0 operators (moved to
>>> "providers" package) and let them be used in Python 3 environment of
>>> airflow 1.10.x.
>>>
>>> This can be done case-by-case per "cloud provider". It should not be
>>> obligatory, should be largely driven by each provider. It's not yet full
>>> AIP-8 Split Hooks/Operators into separate packages
>>> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303>.
>>> It's merely backporting of some operators/hooks to get it work in 1.10.
>>> But by doing it we might try out the concept of splitting, learn about
>>> maintenance problems and maybe implement full *AIP-8* approach in 2.1
>>> consistently across the board.
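
The "old imports keep working but raise deprecation warnings" idea from Jarek's reply can be sketched roughly as below. This is only an illustration, not Airflow code: `NewStorageHook`, `OldStorageHook` and `instantiate_and_collect` are hypothetical names, and a real shim would more likely live in the old module and re-export the class from the new `providers` path.

```python
import warnings


class NewStorageHook:
    """Hypothetical stand-in for a hook that lives at the new 'providers' path."""

    def __init__(self, conn_id: str = "default") -> None:
        self.conn_id = conn_id


class OldStorageHook(NewStorageHook):
    """Hypothetical stand-in for the old import path: same behaviour plus a warning."""

    def __init__(self, *args, **kwargs) -> None:
        warnings.warn(
            "OldStorageHook is deprecated; use NewStorageHook from the new providers path.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this shim
        )
        super().__init__(*args, **kwargs)


def instantiate_and_collect(cls):
    """Create an instance and return it with the categories of any warnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure DeprecationWarning is not filtered out
        obj = cls(conn_id="my_conn")
    return obj, [w.category for w in caught]


if __name__ == "__main__":
    _, old_warnings = instantiate_and_collect(OldStorageHook)
    _, new_warnings = instantiate_and_collect(NewStorageHook)
    print(old_warnings, new_warnings)
```

With a shim like this, existing DAG code continues to run unchanged, and the warning tells users which name to move to before the old one is removed.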
>>>
>>> *Context*
>>>
>>> Part of the AIP-21 was to move import paths for Cloud providers to
>>> separate providers/<PROVIDER> package. An example for that (the first
>>> provider we already almost migrated) was providers/google package
>>> (further divided into gcp/gsuite etc).
>>>
>>> We've done a massive migration of all the Google-related operators,
>>> created a few missing ones and retrofitted some old operators to follow
>>> GCP best practices and fixing a number of problems - also implementing
>>> Python3 and Pylint compatibility. Some of these operators/hooks are not
>>> backwards compatible. Those that are compatible are still available via
>>> the old imports with deprecation warning.
>>>
>>> We've added missing tests (including system tests) and missing features -
>>> improving some of the Google operators - giving the users more
>>> capabilities and fixing some issues. Those operators should pretty much
>>> "just work" in Airflow 1.10.x (any recent version) for Python 3. We
>>> should be able to release a separate pip-installable package for those
>>> operators that users should be able to install in Airflow 1.10.x.
>>>
>>> Any user will be able to install this separate package in their Airflow
>>> 1.10.x installation and start using those new "provider" operators in
>>> parallel to the old 1.10.x operators. Other providers ("microsoft",
>>> "amazon") might follow the same approach if they want. We could even at
>>> some point decide to move some of the core operators in similar fashion
>>> (for example following the structure proposed in the latest
>>> documentation: fundamentals / software / etc.
>>> https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
>>>
>>> *Pros and cons*
>>>
>>> There are a number of pros:
>>>
>>> - Users will have an easier migration path if they are deeply vested
>>>   into 1.10.* version
>>> - It's possible to migrate in stages for people who are also vested in
>>>   py2: *py2 (1.10) -> py3 (1.10) -> py3 + new operators (1.10) -> py3 +
>>>   2.0*
>>> - Moving to new operators in py3 + new operators can be done
>>>   gradually. Old operators will continue to work while new can be used
>>>   more and more
>>> - People will get incentivised to migrate to python 3 before 2.0 is
>>>   out (by using new operators)
>>> - Each provider "package" can have independent release schedule - and
>>>   add functionality in already released Airflow versions.
>>> - We do not take out any functionality from the users - we just add
>>>   more options
>>> - The releases can be - similarly as main airflow releases - voted
>>>   separately by PMC after "stewards" of the package (per provider)
>>>   perform round of testing on 1.10.* versions.
>>> - Users will start migrating to new operators earlier and have
>>>   smoother switch to 2.0 later
>>> - The latest improved operators will start
>>>
>>> There are three cons I could think of:
>>>
>>> - There will be quite a lot of duplication between old and new
>>>   operators (they will co-exist in 1.10). That might lead to confusion of
>>>   users and problems with cooperation between different operators/hooks
>>> - Having new operators in 1.10 python 3 might keep people from
>>>   migrating to 2.0
>>> - It will require some maintenance and separate release overhead.
>>>
>>> I already spoke to Composer team @Google and they are very positive about
>>> this. I also spoke to Ash and seems it might also be OK for Astronomer
>>> team. We have Google's backing and support, and we can provide
>>> maintenance and support for those packages - being an example for other
>>> providers how they can do it.
>>>
>>> Let me know what you think - and whether I should make it into an
>>> official AIP maybe?
>>>
>>> J.
>>>
>>> --
>>>
>>> Jarek Potiuk
>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>>
>>> M: +48 660 796 129 <+48660796129>
>
> --
>
> Tomasz Urbaszek
> Polidea <https://www.polidea.com/> | Junior Software Engineer
>
> M: +48 505 628 493 <+48505628493>
> E: tomasz.urbas...@polidea.com <tomasz.urbasz...@polidea.com>
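
For readers unfamiliar with the "python-3 native namespaces" mentioned in the thread, here is a minimal sketch of how two separately installed distributions can contribute sub-packages to one shared namespace. It is an illustration under stated assumptions: `demo_ns`, `dist_google`, `dist_amazon` and the helper functions are made-up names, and the two temporary directories only simulate two installed distributions. It deliberately does not use the real `airflow` package, which ships an `__init__.py` in 1.10 and is therefore a regular package; whether a second distribution can add `airflow.providers.*` next to it is exactly the point the thread says still needs testing.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_portion(dist_root: Path, provider: str) -> None:
    """Create <dist_root>/demo_ns/providers/<provider>/__init__.py.

    No __init__.py is written in 'demo_ns' or 'demo_ns/providers', which is
    what makes them native (PEP 420) namespace packages.
    """
    leaf = dist_root / "demo_ns" / "providers" / provider
    leaf.mkdir(parents=True)
    (leaf / "__init__.py").write_text(f"NAME = {provider!r}\n")


def load_both_portions() -> dict:
    """Simulate two distributions on sys.path and import from both."""
    with tempfile.TemporaryDirectory() as tmp:
        roots = [Path(tmp) / "dist_google", Path(tmp) / "dist_amazon"]
        make_portion(roots[0], "google")
        make_portion(roots[1], "amazon")
        sys.path[:0] = [str(root) for root in roots]
        try:
            importlib.invalidate_caches()
            google = importlib.import_module("demo_ns.providers.google")
            amazon = importlib.import_module("demo_ns.providers.amazon")
            namespace = importlib.import_module("demo_ns.providers")
            return {
                "google": google.NAME,
                "amazon": amazon.NAME,
                # number of directories that make up the shared namespace
                "portions": len(list(namespace.__path__)),
            }
        finally:
            # Undo the sys.path and sys.modules changes made for the demo.
            for root in roots:
                sys.path.remove(str(root))
            stale = [m for m in sys.modules if m == "demo_ns" or m.startswith("demo_ns.")]
            for name in stale:
                del sys.modules[name]


if __name__ == "__main__":
    print(load_both_portions())
```

Both provider sub-packages import under the same `demo_ns.providers` prefix even though they live in different directories, which is the property that would let provider packages be released separately while staying in one repository.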