We are checking how we can use namespaces in a back-portable way, and we will have a POC soon so that we can all see what it will look like.
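
For illustration only (this is not the POC): a minimal sketch of how Python 3 native namespace packages let two separately installed distributions contribute sub-packages to one shared top-level package. The names `demo_ns`, `make_distribution` and `load_providers` are hypothetical, and the two temporary directories only stand in for two pip-installed packages. Note the assumption that the shared package has no `__init__.py`; the real `airflow` package in 1.10 does ship one, so whether and how this can work on top of an installed apache-airflow is exactly what still needs to be tested.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_distribution(root: Path, provider: str) -> Path:
    """Create a fake 'distribution' that contributes demo_ns.providers.<provider>."""
    dist_root = root / f"dist_{provider}"
    pkg_dir = dist_root / "demo_ns" / "providers" / provider
    pkg_dir.mkdir(parents=True)
    # Only the leaf package gets an __init__.py. 'demo_ns' and 'demo_ns.providers'
    # deliberately have none, which makes them native (PEP 420) namespace packages.
    (pkg_dir / "__init__.py").write_text(f"NAME = {provider!r}\n")
    return dist_root


def load_providers(providers):
    """Import demo_ns.providers.<p> for each provider from separate directories."""
    with tempfile.TemporaryDirectory() as tmp:
        roots = [str(make_distribution(Path(tmp), p)) for p in providers]
        sys.path[:0] = roots  # simulate several installed distributions
        try:
            importlib.invalidate_caches()
            return {
                p: importlib.import_module(f"demo_ns.providers.{p}").NAME
                for p in providers
            }
        finally:
            # Clean up so the demo leaves no trace in the interpreter state.
            for r in roots:
                sys.path.remove(r)
            for name in [m for m in sys.modules
                         if m == "demo_ns" or m.startswith("demo_ns.")]:
                del sys.modules[name]


if __name__ == "__main__":
    print(load_providers(["google", "amazon"]))
```

Both sub-packages resolve under the same `demo_ns.providers` prefix even though they live in different directories, which is the property we would want for separately released provider packages kept in one repo.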
J.

On Mon, Oct 28, 2019 at 1:24 PM Ash Berlin-Taylor <a...@apache.org> wrote:

> I'll have to read your proposal in detail (sorry, no time right now!), but I'm broadly in favour of this approach, and I think keeping them _in_ the same repo is the best plan -- that makes writing and testing cross-cutting changes easier.
>
> -a
>
> > On 28 Oct 2019, at 12:14, Tomasz Urbaszek <tomasz.urbas...@polidea.com> wrote:
> >
> > I think utilizing namespaces should reduce a lot of problems raised by using separate repos (who will manage it? how to release? where should be the repo?).
> >
> > Bests,
> > Tomek
> >
> > On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >
> >> Thanks Bas for comments! Let me share my thoughts below.
> >>
> >> On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak <basharens...@godatadriven.com> wrote:
> >>
> >>> Hi Jarek, I definitely see a future in creating separate installable packages for various operators/hooks/etc (as in AIP-8). This would IMO strip the “core” Airflow to only what’s needed and result in a small package without a ton of dependencies (and make it more maintainable, shorter tests, etc etc etc). Not exactly sure though what you’re proposing in your e-mail, is it a new AIP for an intermediate step towards AIP-8?
> >>
> >> It's a new AIP I am proposing. For now it's only for backporting the new 2.0 import paths to 1.10.* series.
> >>
> >> It's more of "incremental going in direction of AIP-8 and learning some difficulties involved" than implementing AIP-8 fully. We are taking advantage of changes in import paths from AIP-21 which make it possible to have both old and new (optional) operators available in 1.10.* series of Airflow.
> >> I think there is a lot more to do for full implementation of AIP-8: decisions how to maintain, install those operator groups separately, stewardship model/organisation for the separate groups, how to manage cross-dependencies, procedures for releasing the packages etc.
> >>
> >> I think about this new AIP also as a learning effort - we would learn more how separate packaging works and then we can follow up with AIP-8 full implementation for "modular" Airflow. Then AIP-8 could be implemented in Airflow 2.1 for example - or 3.0 if we start following semantic versioning - based on those learnings. It's a bit of good example of having cake and eating it too. We can try out modularity in 1.10.* while cutting the scope of 2.0 and not implementing full management/release procedure for AIP-8 yet.
> >>
> >>> Thinking about this, I think there are still a few grey areas (which would be good to discuss in a new AIP, or continue on AIP-8):
> >>>
> >>> * In your email you speak only about the 3 big cloud providers (btw I made a PR for migrating all AWS components -> https://github.com/apache/airflow/pull/6439). Is there a plan for splitting other components than Google/AWS/Azure?
> >>
> >> We could add more groups as part of this new AIP indeed (as an extension to AIP-21 and pre-requisite to AIP-8). We already see how moving/deprecation works for the providers package - it works for GCP/Google rather nicely. But there is nothing to prevent us from extending it to cover other groups of operators/hooks.
> >> If you look at the current structure of documentation done by Kamil, we can follow the structure there and move the operators/hooks accordingly (https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html):
> >>
> >> Fundamentals, ASF: Apache Software Foundation, Azure: Microsoft Azure, AWS: Amazon Web Services, GCP: Google Cloud Platform, Service integrations, Software integrations, Protocol integrations.
> >>
> >> I am happy to include that in the AIP - if others agree it's a good idea. Out of those groups - I think only Fundamentals should not be back-ported. Others should be rather easy to port (if we decide to). We already have quite a lot of those in the new GCP operators for 2.0. So starting with GCP/Google group is a good idea. Also following with Cloud Providers first is a good thing. For example we have now support from Google Composer team to do this separation for GCP (and we learn from it) and then we can claim the stewardship in our team for releasing the python 3/ Airflow 1.10-compatible "airflow-google" packages. Possibly other Cloud Providers/teams might follow this (if they see the value in it) and there could be different stewards for those. And then we can do other groups if we decide to. I think this way we can learn whether AIP-8 is manageable and what real problems we are going to face.
> >>
> >>> * Each “plugin” e.g. GCP would be a separate repo, should we create some sort of blueprint for such packages?
> >>
> >> I think we do not need separate repos (at all) but in this new AIP we can test it before we decide to go for AIP-8. IMHO - monorepo approach will work here rather nicely. We could use python-3 native namespaces <https://packaging.python.org/guides/packaging-namespace-packages/> for the sub-packages when we go full AIP-8.
> >> For now we could simply package the new operators in separate pip package for Python 3 version 1.10.* series only. We only need to test if it works well with another package providing 'airflow.providers.*' after apache-airflow is installed (providing 'airflow' package). But I think we can make it work. I don't think we really need to split the repos, namespaces will work just fine and has easier management of cross-repository dependencies (but we can learn otherwise). For sure we will not need it for the new proposed AIP of backporting groups to 1.10 and we can defer that decision to AIP-8 implementation time.
> >>
> >>> * In which Airflow version do we start raising deprecation warnings and in which version would we remove the original?
> >>
> >> I think we should do what we did in GCP case already. Those old "imports" for operators can be made as deprecated in Airflow 2.0 (and removed in 2.1 or 3.0 if we start following semantic versioning). We can however do it before in 1.10.7 or 1.10.8 if we release those (without removing the old operators yet - just raise deprecation warnings and inform that for python3 the new "airflow-google", "airflow-aws" etc. packages can be installed and users can switch to it).
> >>
> >> J.
> >>
> >>> Cheers,
> >>> Bas
> >>>
> >>> On 27 Oct 2019, at 08:33, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >>>
> >>> Hello - any comments on that? I am happy to make it into an AIP :)?
> >>>
> >>> On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >>>
> >>> *Motivation*
> >>>
> >>> I think we really should start thinking about making it easier to migrate to 2.0 for our users.
> >>> After implementing some recent changes related to AIP-21 - Changes in import paths <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths> I think I have an idea that might help with it.
> >>>
> >>> *Proposal*
> >>>
> >>> We could package some of the new and improved 2.0 operators (moved to "providers" package) and let them be used in Python 3 environment of airflow 1.10.x.
> >>>
> >>> This can be done case-by-case per "cloud provider". It should not be obligatory, should be largely driven by each provider. It's not yet full AIP-8 Split Hooks/Operators into separate packages <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303>. It's merely backporting of some operators/hooks to get it work in 1.10. But by doing it we might try out the concept of splitting, learn about maintenance problems and maybe implement full *AIP-8* approach in 2.1 consistently across the board.
> >>>
> >>> *Context*
> >>>
> >>> Part of the AIP-21 was to move import paths for Cloud providers to separate providers/<PROVIDER> package. An example for that (the first provider we already almost migrated) was providers/google package (further divided into gcp/gsuite etc).
> >>>
> >>> We've done a massive migration of all the Google-related operators, created a few missing ones and retrofitted some old operators to follow GCP best practices and fixing a number of problems - also implementing Python3 and Pylint compatibility. Some of these operators/hooks are not backwards compatible. Those that are compatible are still available via the old imports with deprecation warning.
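
A rough sketch of the "old imports with deprecation warning" idea described in the paragraph above. This is only an illustration under stated assumptions, not the code used in Airflow: `NewStorageOperator`, `deprecated_alias` and the `old_pkg...` / `new_pkg...` paths are hypothetical names invented for the example.

```python
import warnings


class NewStorageOperator:
    """Stand-in for an operator that lives under the new 'providers' import path."""

    def __init__(self, bucket: str) -> None:
        self.bucket = bucket


def deprecated_alias(new_cls, old_path: str, new_path: str):
    """Return a subclass of new_cls that warns every time it is instantiated."""

    class DeprecatedAlias(new_cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                f"{old_path} is deprecated. Please use {new_path}.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not at this shim
            )
            super().__init__(*args, **kwargs)

    DeprecatedAlias.__name__ = new_cls.__name__
    return DeprecatedAlias


# What a module at the old import path could expose (hypothetical paths).
OldStorageOperator = deprecated_alias(
    NewStorageOperator,
    old_path="old_pkg.storage_operator.StorageOperator",
    new_path="new_pkg.providers.demo.storage.StorageOperator",
)
```

Existing DAG code that uses the old name keeps working and gets the new behaviour, while the warning tells users which import to switch to. Operators that are not backwards compatible cannot be aliased this way.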
> >>>
> >>> We've added missing tests (including system tests) and missing features - improving some of the Google operators - giving the users more capabilities and fixing some issues. Those operators should pretty much "just work" in Airflow 1.10.x (any recent version) for Python 3. We should be able to release a separate pip-installable package for those operators that users should be able to install in Airflow 1.10.x.
> >>>
> >>> Any user will be able to install this separate package in their Airflow 1.10.x installation and start using those new "provider" operators in parallel to the old 1.10.x operators. Other providers ("microsoft", "amazon") might follow the same approach if they want. We could even at some point decide to move some of the core operators in similar fashion (for example following the structure proposed in the latest documentation: fundamentals / software / etc. https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
> >>>
> >>> *Pros and cons*
> >>>
> >>> There are a number of pros:
> >>>
> >>> - Users will have an easier migration path if they are deeply vested into 1.10.* version
> >>> - It's possible to migrate in stages for people who are also vested in py2: *py2 (1.10) -> py3 (1.10) -> py3 + new operators (1.10) -> py3 + 2.0*
> >>> - Moving to new operators in py3 + new operators can be done gradually. Old operators will continue to work while new can be used more and more
> >>> - People will get incentivised to migrate to python 3 before 2.0 is out (by using new operators)
> >>> - Each provider "package" can have independent release schedule - and add functionality in already released Airflow versions.
> >>> - We do not take out any functionality from the users - we just add more options
> >>> - The releases can be - similarly as main airflow releases - voted separately by PMC after "stewards" of the package (per provider) perform round of testing on 1.10.* versions.
> >>> - Users will start migrating to new operators earlier and have smoother switch to 2.0 later
> >>> - The latest improved operators will start
> >>>
> >>> There are three cons I could think of:
> >>>
> >>> - There will be quite a lot of duplication between old and new operators (they will co-exist in 1.10). That might lead to confusion of users and problems with cooperation between different operators/hooks
> >>> - Having new operators in 1.10 python 3 might keep people from migrating to 2.0
> >>> - It will require some maintenance and separate release overhead.
> >>>
> >>> I already spoke to Composer team @Google and they are very positive about this. I also spoke to Ash and seems it might also be OK for Astronomer team. We have Google's backing and support, and we can provide maintenance and support for those packages - being an example for other providers how they can do it.
> >>>
> >>> Let me know what you think - and whether I should make it into an official AIP maybe?
> >>>
> >>> J.

--
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer
M: +48 660 796 129