Hi Jarek, I definitely see a future in creating separate installable packages 
for various operators/hooks/etc (as in AIP-8). This would IMO strip the “core” 
Airflow to only what’s needed and result in a small package without a ton of 
dependencies (and make it more maintainable, shorter tests, etc.). I'm not 
exactly sure, though, what you’re proposing in your e-mail: is it a new AIP for 
an intermediate step towards AIP-8?

Thinking about this, I think there are still a few grey areas (which would be 
good to discuss in a new AIP, or continue on AIP-8):

  *   In your email you speak only about the 3 big cloud providers (btw I 
made a PR for migrating all AWS components -> 
https://github.com/apache/airflow/pull/6439). Is there a plan for splitting 
out components other than Google/AWS/Azure?
  *   Each “plugin”, e.g. GCP, would be a separate repo. Should we create some 
sort of blueprint for such packages?
  *   In which Airflow version do we start raising deprecation warnings and in 
which version would we remove the original?

Cheers,
Bas

On 27 Oct 2019, at 08:33, Jarek Potiuk <jarek.pot...@polidea.com> wrote:

Hello - any comments on that? I am happy to make it into an AIP :)?

On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <jarek.pot...@polidea.com>
wrote:

*Motivation*

I think we really should start thinking about making it easier for our
users to migrate to 2.0. After implementing some recent changes related to
AIP-21 Changes in import paths
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>,
I think I have an idea that might help with it.

*Proposal*

We could package some of the new and improved 2.0 operators (moved to the
"providers" package) and let them be used in a Python 3 environment of
Airflow 1.10.x.

This can be done case-by-case per "cloud provider". It should not be
obligatory and should be largely driven by each provider. It's not yet the
full AIP-8 Split Hooks/Operators into separate packages
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303>.
It's merely backporting some operators/hooks to get them to work in 1.10.
But by doing it we might try out the concept of splitting, learn about
maintenance problems and maybe implement the full *AIP-8* approach in 2.1
consistently across the board.

*Context*

Part of AIP-21 was to move import paths for cloud providers to a
separate providers/<PROVIDER> package. An example of that (the first
provider we have already almost migrated) is the providers/google package
(further divided into gcp/gsuite etc).

We've done a massive migration of all the Google-related operators,
created a few missing ones and retrofitted some old operators to follow GCP
best practices, fixing a number of problems and also implementing Python 3
and Pylint compatibility. Some of these operators/hooks are not backwards
compatible. Those that are compatible are still available via the old
imports with a deprecation warning.
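
To illustrate the "old imports with deprecation warning" idea, here is a
minimal sketch. It assumes a simple subclass shim; the class names
NewStyleOperator and OldStyleOperator are hypothetical stand-ins and this is
not the actual Airflow code.

```python
import warnings


class NewStyleOperator:
    """Hypothetical stand-in for an operator at the new providers path."""

    def __init__(self, task_id):
        self.task_id = task_id


class OldStyleOperator(NewStyleOperator):
    """Hypothetical stand-in for the same operator under the old path."""

    def __init__(self, *args, **kwargs):
        # Warn on every instantiation, then behave exactly like the new class.
        warnings.warn(
            "OldStyleOperator is deprecated; use NewStyleOperator instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)


# Record warnings so the effect of using the old name is visible.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    op = OldStyleOperator(task_id="example")
```

With a shim like this, existing DAG code keeps working unchanged while users
are nudged towards the new name.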

We've added missing tests (including system tests) and missing features -
improving some of the Google operators - giving the users more capabilities
and fixing some issues. Those operators should pretty much "just work" in
Airflow 1.10.x (any recent version) for Python 3. We should be able to
release a separate pip-installable package for those operators that users
should be able to install in Airflow 1.10.x.

Any user will be able to install this separate package in their Airflow
1.10.x installation and start using those new "provider" operators in
parallel to the old 1.10.x operators. Other providers ("microsoft",
"amazon") might follow the same approach if they want. We could even at
some point decide to move some of the core operators in similar fashion
(for example following the structure proposed in the latest documentation:
fundamentals / software / etc.
https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
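
As a rough sketch of using new operators "in parallel" to the old ones, a
DAG file could prefer the separately installed package and fall back when it
is absent. The helper load_operator and the module path
"hypothetical_backport_pkg.operators" are assumptions made up for
illustration; a standard-library class stands in for the fallback so the
sketch runs without Airflow installed.

```python
import collections
import importlib


def load_operator(candidates):
    """Return the first importable attribute from (module_path, name) pairs."""
    for module_path, name in candidates:
        try:
            module = importlib.import_module(module_path)
        except ImportError:
            # Package not installed: try the next candidate.
            continue
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError("none of the candidate operators could be imported")


# Prefer the (hypothetical) backport package, fall back to what is installed.
chosen = load_operator(
    [
        ("hypothetical_backport_pkg.operators", "SomeOperator"),
        ("collections", "OrderedDict"),  # stand-in for an old 1.10.x operator
    ]
)
```

The same DAG file would then work both before and after the separate package
is installed.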

*Pros and cons*

There are a number of pros:

  - Users will have an easier migration path if they are deeply invested
  in the 1.10.* version
  - It's possible to migrate in stages for people who are also invested in
  py2: *py2 (1.10) -> py3 (1.10) -> py3 + new operators (1.10) -> py3 +
  2.0*
  - Moving to new operators in py3 + new operators can be done
  gradually. Old operators will continue to work while new can be used more
  and more
  - People will get incentivised to migrate to python 3 before 2.0 is
  out (by using new operators)
  - Each provider "package" can have independent release schedule - and
  add functionality in already released Airflow versions.
  - We do not take out any functionality from the users - we just add
  more options
  - The releases can be - similarly to main Airflow releases - voted on
  separately by the PMC after "stewards" of the package (per provider)
  perform a round of testing on 1.10.* versions.
  - Users will start migrating to new operators earlier and have a
  smoother switch to 2.0 later
  - The latest improved operators will start

There are three cons I could think of:

  - There will be quite a lot of duplication between old and new
  operators (they will co-exist in 1.10). That might lead to confusion among
  users and problems with cooperation between different operators/hooks
  - Having new operators in 1.10 python 3 might keep people from
  migrating to 2.0
  - It will require some maintenance and separate release overhead.

I already spoke to the Composer team @Google and they are very positive
about this. I also spoke to Ash and it seems it might also be OK for the
Astronomer team. We have Google's backing and support, and we can provide
maintenance and support for those packages - being an example for other
providers of how they can do it.

Let me know what you think - and whether I should make it into an official
AIP maybe?

J.



--

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>


