>
> I'm curious Jarek, does Airflow take any dependencies on popular libraries
> like pandas, numpy, pyarrow, scipy, etc... which users are likely to have
> their own dependency on? I think these dependencies are challenging in a
> different way than the client libraries - ideally we would support a wide
> version range so as not to require users to upgrade those libraries in
> lockstep with Beam. However in some cases our dependency is pretty tight
> (e.g. the DataFrame API's dependency on pandas), so we need to make sure to
> explicitly test with multiple different versions. Does Airflow have any
> similar issues?
>

Yes, we do (all of those, I think :) ). The complete set of all our deps can
be found here:
https://github.com/apache/airflow/blob/constraints-main/constraints-3.9.txt
(continuously updated, and we have different sets for different Python
versions).

We took a rather interesting and unusual approach (more details in my talk)
- mainly because Airflow is both an application to install (for users) and a
library to use (for DAG authors), and the two have contradictory
expectations (installation stability versus flexibility in
upgrading/downgrading dependencies). Our approach does a pretty good job of
making water and fire play well with each other.

Most of those dependencies come from optional extras (the list of all
extras is here:
https://airflow.apache.org/docs/apache-airflow/stable/extra-packages-ref.html).
More often than not, the "problematic" dependencies you mention are
transitive dependencies pulled in through client libraries we use (the
Apache Beam SDK, for example, is a big contributor to those :).
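
To make this concrete, installing Airflow with extras looks like this (the
"google" and "pandas" extras are just examples picked from the reference
above):

    pip install "apache-airflow[google,pandas]==2.3.4"

Each extra pulls in its own client libraries - and their transitive
dependencies - on top of the core.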

Airflow "core" itself has far less dependencies
https://github.com/apache/airflow/blob/constraints-main/constraints-no-providers-3.9.txt
(175 currently) and we actively made sure that all "pandas" of this world
are only optional extra deps.
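
(You can count the pins yourself - each pinned dependency is a `==` line in
that file, so something like this gives the current number, since the
comment lines at the top don't match:

    curl -sL https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-no-providers-3.9.txt | grep -c "=="
)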

Now - the interesting thing is that we use "constraints" (the links with
dependencies that I posted above are exactly those constraint files) to pin
the "golden" versions of the dependencies - i.e. we test those continuously
in our CI, and we automatically upgrade the constraints when all the unit
and integration tests pass.
There is a little bit of complexity and sometimes conflicts to handle (as
`pip` has to find the right set of deps that works for all our optional
extras), but eventually we have exactly one "golden" set of constraints at
any moment in time for main (or a v2-x branch - we have a separate set for
each branch) that we are dealing with. And this is the only set of
dependency versions that Airflow gets tested with. Note - these are
*constraints*, not *requirements* - and that makes a whole world of
difference.
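
To make it concrete, a constraints file is just a flat list of exact pins,
one per dependency (the versions below are made up for illustration - the
real file has several hundred entries):

    # excerpt, illustrative versions only
    numpy==1.22.4
    pandas==1.4.3
    pyarrow==8.0.0
    scipy==1.8.1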

Then, when we release Airflow, we "freeze" the constraints with the version
tag. We know they work because all our tests pass with them in CI.

Then we communicate to our users (and we use it in our Docker image) that
the only "supported" way of installing Airflow is using `pip` with
constraints:
https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html.
We do not support poetry or pipenv - we leave it up to users to handle
those (until poetry/pipenv support constraints - which we are waiting for,
and there is an issue where I explained why it would be useful). It looks
like this: `pip install "apache-airflow==2.3.4" --constraint "
https://raw.githubusercontent.com/apache/airflow/constraints-2.3.4/constraints-3.9.txt"`
(with different constraints for the Airflow version and Python version you
have).
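
Since the right file depends on both versions, you can derive the URL
dynamically - a sketch along the lines of what our installation docs show
(variable names are mine):

    AIRFLOW_VERSION=2.3.4
    PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
    CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
    pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"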

Constraints have this nice feature that they are only used during the "pip
install" phase and are thrown out immediately after the install is
complete. They do not create "hard" requirements for Airflow. Airflow still
has "lower-bound" limits for a number of dependencies, but we try to avoid
putting upper bounds at all (we add them only in specific cases and
document them), and our bounds are rather relaxed. This way we achieve
three things:

1) when someone does not use constraints and has a problem with a broken
dependency - we tell them to use constraints - this is what we as a
community commit to and support
2) by using the constraints mechanism we do not limit our users if they
want to upgrade or downgrade any dependencies. They are free to do it (as
long as it fits the - rather relaxed - lower/upper bounds of Airflow). But
"with great power comes great responsibility" - if they want to do that,
THEY have to make sure that Airflow will work. We make no guarantees there
(see the sketch below)
3) we are not limited by the 3rd-party libraries that come as extras - if
you do not use those, their limits do not apply
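
As an example of point 2 (the pandas version below is hypothetical):
because the constraints are discarded after the install, a user can later
run

    pip install --upgrade "pandas==1.5.1"

and `pip` will only check Airflow's own relaxed bounds, not the pinned
constraint - but from that point on, verifying that Airflow still works is
on them.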

I think this works really well - but it is rather complex to set up and
maintain. I built a whole set of scripts around it, and we have the
`breeze` ("It's a breeze to develop Airflow" is the theme) development/CI
environment based on Docker and docker-compose that allows us to automate
all of that.

J.
