Hello everyone,

TL;DR: I've been quite busy recently working on backported "providers"
packages for Airflow 1.10.*, and I have some pretty good news on that
front. I would love to hear your comments and opinions on the current
state of it. This message is more "information" about what is being
implemented now - I will send a separate thread about some future
decisions that are needed, mostly from the PMC side.

I have two PRs that are relevant and I wanted to describe both here:

1) Preparing backportable packages for Airflow 1.10.*
https://github.com/apache/airflow/pull/7391

This PR modifies setup.py to enable preparation of backportable packages
for Airflow 1.10.*. Using this version of setup.py we can prepare and
release pip packages of the "providers" code that will be installable on
the Airflow 1.10.* series. I managed to get it working without converting
packages to implicit namespaces (that is a separate discussion on the
devlist).

I did it in such a way that we can either prepare a single
"apache-airflow-providers" package (with all "providers" code in one
package) or "apache-airflow-providers-XXXXX" packages - one for each
providers package we have. The latter approach produces many more, smaller
(and potentially inter-dependent) packages - something that in the future
might be a base for AIP-8
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303&focusedCommentId=103093048>
- but we do not need it for now. It also nicely keeps dependencies
separate, so each of the packages has only the minimum set of dependencies
it needs.
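
To illustrate the two packaging modes described above, here is a minimal
sketch. It is not the code from the PR: the dependency table and the
helper names (PROVIDER_DEPENDENCIES, package_name, install_requires) are
hypothetical and only show how one setup.py could yield either a single
package or per-provider packages with their own minimal dependencies.

```python
from typing import Dict, List

# Hypothetical dependency table, for illustration only. The real
# dependency lists live in the project's setup.py.
PROVIDER_DEPENDENCIES: Dict[str, List[str]] = {
    "google": ["google-cloud-storage"],
    "amazon": ["boto3"],
}

BASE_NAME = "apache-airflow-providers"


def package_name(provider: str = "") -> str:
    """Name of the single package, or of one per-provider package."""
    if not provider:
        return BASE_NAME
    return f"{BASE_NAME}-{provider.replace('.', '-')}"


def install_requires(provider: str = "") -> List[str]:
    """Dependencies for one provider, or the union for the single package."""
    if provider:
        return sorted(PROVIDER_DEPENDENCIES[provider])
    merged = set()
    for deps in PROVIDER_DEPENDENCIES.values():
        merged.update(deps)
    return sorted(merged)
```

With no argument the helpers describe the single "providers" package
(all dependencies merged); with a provider name they describe one small
package that carries only that provider's dependencies.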

I would like to keep that capability for now, but for the purpose of
backporting I think releasing a single "providers" package makes much more
sense. If others think that we should release many smaller "providers"
packages separately, I am also quite OK with that. It's just a matter of
the testing/status of each package and some inter-dependencies (some
packages depend on each other), especially for transfer operators.

2) System testing of backportable packages:
https://github.com/apache/airflow/pull/7389

We need a way to test that the backported packages work with Airflow 1.10.
We cannot run all unit tests against Airflow 1.10, but we can run some
end-to-end system tests. While we do not have consistent end-to-end system
tests for all operators, we have quite an extensive set of system tests
for GCP operators. Those tests run the example DAGs of the Google Cloud
Platform operators - the example DAGs serve both as examples in the docs
and, given an appropriate environment, as tests that run automatically
against a real external system (GCP in this case). I proposed this
approach a long time ago in AIP-4
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-4+Support+for+System+Tests+for+external+systems>
and while it has not been "universally" accepted yet, we followed it in
the GCP operator implementations (and we have all GCP operators
automatically testable with system tests). With this PR I integrated the
system test approach nicely with Pytest markers, Breeze and our test
environment - so it is now very easy to run system tests
semi-automatically (and in the future we can fully automate it when we
switch to GitHub Actions).
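
For readers unfamiliar with Pytest markers, here is a rough sketch of how
system tests can be made opt-in through a marker. This is an assumption
for illustration, not the implementation in the PR: the marker name
"system", the "--system" option and the skip_reason helper are all
hypothetical. The hooks would live in a conftest.py.

```python
from typing import Optional, Sequence


def skip_reason(system: Optional[str], selected: Sequence[str]) -> Optional[str]:
    """Return why a test should be skipped, or None if it should run."""
    if system is None:
        return None  # not tied to an external system, always runs
    if system in selected:
        return None  # the user explicitly asked for this system
    return f"system test for '{system}' needs --system {system}"


def pytest_addoption(parser):
    # Hypothetical option: may be repeated, e.g. --system gcp --system aws
    parser.addoption("--system", action="append", default=[],
                     help="run system tests for the given external system")


def pytest_configure(config):
    # Register the hypothetical marker so pytest does not warn about it.
    config.addinivalue_line(
        "markers", "system(name): test needs a real external system")


def pytest_collection_modifyitems(config, items):
    import pytest  # imported here so the helper above works without pytest

    selected = config.getoption("--system")
    for item in items:
        marker = item.get_closest_marker("system")
        system = marker.args[0] if marker and marker.args else None
        reason = skip_reason(system, selected)
        if reason:
            item.add_marker(pytest.mark.skip(reason=reason))
```

Under these assumptions a test decorated with @pytest.mark.system("gcp")
is skipped by default and runs only when pytest is started with
"--system gcp", which keeps tests that need real credentials out of the
regular unit test run.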

We are planning to run all the system tests for all GCP operators, but
once this is in place it will also be rather easy to add tests for other
groups of operators, so I am planning a community-driven effort to add
more of those system tests (and to make sure that backported packages can
be safely used in a 1.10.* environment).


J.





-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
