potiuk commented on PR #43001: URL: https://github.com/apache/airflow/pull/43001#issuecomment-2412632076
Yeah, that's expected, and it should fix itself once we merge to main (you will be able to see it in `canary` builds). This piece of output:

```
#48 4.743 Installing airflow from main. It is used to cache dependencies
#48 4.743
#48 4.745 + curl -fsSL https://github.com/apache/airflow/archive/main.tar.gz
#48 4.745 + tar xz -C /tmp/tmp.0VWy67lki1 --strip 1
#48 6.344 + uv pip install --python /usr/local/bin/python --editable '/tmp/tmp.0VWy67lki1[devel-ci]'
```

If you look closely, it downloads Airflow from `main` and then installs it locally. This is a "smart" way of caching: we download the `main` version so that we can pre-install packages in a layer that is not invalidated when Airflow's dependencies change.

The way Docker layers work makes it difficult to cache Python dependencies, because you first need to copy the dependency specifications (`pyproject.toml`, `hatch_build.py`, `provider.yaml` files) or the Airflow sources into the image before you can run the installation. As an example (simplified):

```
1# COPY pyproject.toml src .
2# uv pip install .
```

The thing is that the `1#` layer gets invalidated whenever the copied `pyproject.toml` changes, which also invalidates layer `2#`. This means that EVERY TIME `pyproject.toml` changes, we need to reinstall all of Airflow from scratch, because the `2#` layer contains the "installed Airflow" and it gets invalidated.

There are various strategies to cope with this. Most of them use the `pip` or `uv` cache, or cache mounts (https://docs.docker.com/build/cache/optimize/#use-cache-mounts) for local builds. But Airflow is a BEAST. Keeping the `uv` cache in the image almost doubles its size (2GB -> 4GB), because the cache is huge and is optimized for speed, not size.
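To make the invalidation chain concrete, here is a minimal Python model of it. This is a simplified sketch, not how Docker or BuildKit actually compute cache keys: it assumes each layer's key is a hash of the parent layer's key, the instruction text, and the contents of any copied files. The names `Layer`, `cache_keys`, `rebuilt_layers` and `naive_dockerfile` are hypothetical, and only `pyproject.toml` (not `src`) is modeled.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One simplified Dockerfile step (hypothetical model, not Docker's API)."""
    instruction: str
    # Contents of files copied from the build context, keyed by file name.
    copied_files: dict[str, str] = field(default_factory=dict)


def cache_keys(layers: list[Layer]) -> list[str]:
    """Chain a hash through the layers: every key includes the parent key,
    so a change in one layer changes the key of every layer after it."""
    keys = []
    parent = "python-base-image"
    for layer in layers:
        h = hashlib.sha256()
        h.update(parent.encode() + b"\0" + layer.instruction.encode())
        for name in sorted(layer.copied_files):
            h.update(b"\0" + name.encode() + b"\0" + layer.copied_files[name].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_layers(old: list[str], new: list[str]) -> list[int]:
    """1-based positions of layers whose key changed, i.e. cache misses."""
    return [i + 1 for i, (a, b) in enumerate(zip(old, new)) if a != b]


def naive_dockerfile(pyproject: str) -> list[Layer]:
    # Mirrors the simplified example: 1# COPY, then 2# install.
    return [
        Layer("COPY pyproject.toml src .", {"pyproject.toml": pyproject}),
        Layer("RUN uv pip install ."),
    ]


before = cache_keys(naive_dockerfile('dependencies = ["requests"]'))
after = cache_keys(naive_dockerfile('dependencies = ["requests", "rich"]'))
print(rebuilt_layers(before, after))  # [1, 2]
```

Adding a single dependency to `pyproject.toml` misses the cache for both layers, so the full installation in `2#` runs again.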
Cache mounts only work for local builds, and the first installation of Airflow into such a local cache takes ~6 minutes (less than with `pip`, but still substantial). Such a local cache also has edge cases where it needs to be invalidated, etc.

Instead, we use a remote cache (https://docs.docker.com/build/cache/optimize/#use-an-external-cache): when our images are built locally they use `--cache-from`, and our CI builds the cache and uploads it to ghcr.io (with `--cache-to`). This way the cache is refreshed every time `main` is green, and anyone who builds the Breeze image locally will use that cache.

The "download the main archive + install it" step prepares a `base` installation. This layer is not invalidated for quite some time (usually only when a new Python base image is released or the apt dependencies change). Until then it provides a "base" cache layer, which is not invalidated when `pyproject.toml` is copied in afterwards:

```
1# curl -fsSL https://github.com/apache/airflow/archive/main.tar.gz | tar xz -C /tmp/tmp.0VWy67lki1 --strip 1 && uv pip install --python /usr/local/bin/python --editable '/tmp/tmp.0VWy67lki1[devel-ci]'
2# COPY pyproject.toml src .
3# uv pip install --python /usr/local/bin/python --editable .
```

In this scenario:

* The `1#` layer gets refreshed every few weeks, when the Python base image changes. It does not get invalidated when `pyproject.toml` or `src` change. This layer is pulled from ghcr.io (rather quickly compared to a full installation) when `--cache-from` is used during `breeze ci-image build`.
* The `2#` layer gets invalidated when `pyproject.toml` or `src` change (`3#` is invalidated too, as it follows `2#`). But **most** packages are already installed by `1#`, so `uv pip install` will generally only incrementally install whatever changed in `pyproject.toml` (and, in our case, `hatch_build.py` and `provider.yaml`).
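The same kind of toy model can be applied to the three-layer layout. Again, this is a hypothetical sketch, not Airflow's or Docker's code. It assumes a `RUN` layer is reused whenever its instruction text and parent layer are unchanged (Docker does not re-run the `curl` just to check whether `main` moved). `packages_to_install` is a deliberately naive stand-in for what the resolver in `uv pip install` does; it ignores version bumps and removals.

```python
from __future__ import annotations

import hashlib


def cache_keys(steps: list[tuple[str, str]]) -> list[str]:
    """Simplified chained cache keys. Each step is (instruction, contents of
    copied files); each key also hashes the previous key."""
    keys = []
    parent = "python-base-image"
    for instruction, copied in steps:
        parent = hashlib.sha256("\0".join([parent, instruction, copied]).encode()).hexdigest()
        keys.append(parent)
    return keys


def cached_dockerfile(pyproject: str) -> list[tuple[str, str]]:
    return [
        # 1#: placeholder text for the download-main-and-install command;
        # its key depends only on this text, not on the local pyproject.toml.
        ("RUN <1#: download main.tar.gz and install it>", ""),
        # 2#: copies local files, so its key follows pyproject.toml contents.
        ("COPY pyproject.toml src .", pyproject),
        # 3#: invalidated whenever 2# is, because it follows 2#.
        ("RUN uv pip install --editable .", ""),
    ]


def packages_to_install(required: set[str], preinstalled: set[str]) -> set[str]:
    """Naive model of the incremental install: only what 1# did not provide."""
    return required - preinstalled


before = cache_keys(cached_dockerfile('dependencies = ["requests"]'))
after = cache_keys(cached_dockerfile('dependencies = ["requests", "rich"]'))
changed = [i + 1 for i, (a, b) in enumerate(zip(before, after)) if a != b]
print(changed)  # [2, 3]
print(packages_to_install({"requests", "rich"}, {"requests", "urllib3"}))  # {'rich'}
```

Editing `pyproject.toml` now only misses the cache for `2#` and `3#`, and the final install only has to add the one new package instead of reinstalling everything.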
This means that in most cases rebuilding the image takes < 1 minute (and in some cases under 20 seconds) when sources or `pyproject.toml` change. This saves an enormous amount of build time in CI and waiting time for developers using Breeze.

That's why this step currently still installs `asgiref` from main. This will change once we merge this change to main, since the cached step will then install from the updated `main`.

Now, I think the caching is currently slightly broken after the "providers" move (that's why you see this in the first place). I am going to take a look at it shortly: https://github.com/apache/airflow/issues/42999

--
This is an automated message from the Apache Git Service.