potiuk commented on issue #42999: URL: https://github.com/apache/airflow/issues/42999#issuecomment-2416058277
cc: @ashb @kaxil @gopidesupavan @romsharon98 @shahar1 @eladkal @jscheffl

I am slowly looking at ways to speed up the CI image build and caching again. One of the reasons I **thought** caching is not as good as it could be is the UV version bump. Whenever we increase the UV version, it invalidates the cache layer **just** before installing airflow from the branch tip, starting from here: https://github.com/apache/airflow/blob/main/Dockerfile.ci#L1281

```
ARG AIRFLOW_PIP_VERSION=24.2
ARG AIRFLOW_UV_VERSION=0.4.22
ARG AIRFLOW_USE_UV="true"
RUN echo "Airflow version: ${AIRFLOW_VERSION}"
# Copy all scripts required for installation - changing any of those should lead to
# rebuilding from here
COPY --from=scripts install_packaging_tools.sh install_airflow_dependencies_from_branch_tip.sh \
     common.sh /scripts/docker/
RUN bash /scripts/docker/install_packaging_tools.sh; \
    if [[ ${AIRFLOW_PRE_CACHED_PIP_PACKAGES} == "true" ]]; then \
        bash /scripts/docker/install_airflow_dependencies_from_branch_tip.sh; \
    fi
....
```

When we change the default `AIRFLOW_UV_VERSION`, all the layers below it get invalidated, so the `COPY` of the scripts and the installation of dependencies from the `main` branch tip reinstall the whole of airflow. Even with `uv`, which is as fast as it gets, this takes 180 s (3 minutes) on my machine, which is pretty slow. And the only reason for it in this case is that we changed the UV version.

Since the UV version changes so often, I am thinking about optimizing this a bit: we could install the latest version of UV for the branch-tip installation, and only then install the pinned version of UV. We pin the UV version for stability: there is a risk that a UV upgrade will break things (this has happened in the past), since they "move fast and break things".
So pinning UV in the image (with manual updates, merged after the PR passes) makes sense. For optimization purposes, though, the branch-tip installation could use the latest UV without specifying a version. This should generally be quite safe and stable: the `install_airflow_dependencies_from_branch_tip.sh` step is only an optimization of the installation, pre-installing airflow from "some" good version of airflow.

If we do not change the UV version before that step, the layer would usually not get invalidated between builds, because nothing could trigger the invalidation (none of the preceding Dockerfile lines would change). So once such a build succeeds in `main` and the remote cache is updated, later builds will NOT reinstall uv with the latest version: they keep uv at the version that succeeded last time, because the cache is not invalidated.

So the first time the cache is built, it will roughly work this way:

1) install the latest UV version
2) install airflow from the latest `main` branch (this takes 3 minutes)
3) pin UV to the fixed version (say 0.4.22) and install it
4) COPY airflow sources
5) proceed with installing airflow from sources (incremental)

Then, for all subsequent image builds where only the airflow sources were modified (the base Python version, the Dockerfile lines above this section and the UV version are all unchanged), the cache is reused effectively:

1) -> from cache
2) -> from cache
3) -> from cache
4) COPY airflow sources
5) proceed with installing airflow from sources (incremental)

If we update UV to, for example, 0.4.23:

1) -> from cache
2) -> from cache (this is where we save 3 minutes)
3) pin UV to the new version (say 0.4.23) and install it
4) COPY airflow sources
5) proceed with installing airflow from sources (incremental)

This means that we save 3 minutes on CI image rebuilds (locally and in CI) when only the UV version changes.

WDYT?
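For illustration, a minimal sketch of how the `Dockerfile.ci` section quoted at the top could be reordered to follow steps 1) to 5). It is not the actual `Dockerfile.ci`, and it rests on assumptions: that `pip` in the base image can install the latest `uv` from PyPI (`pip install --upgrade uv`), that `install_airflow_dependencies_from_branch_tip.sh` works with whichever `uv` is on the `PATH`, and that running `install_packaging_tools.sh` afterwards installs the pinned `uv==${AIRFLOW_UV_VERSION}`. The key point is where `ARG AIRFLOW_UV_VERSION` is declared. Docker does not miss the cache at the `ARG` line itself. It misses at the first `RUN` that follows the declaration, because every later `RUN` sees the `ARG` as an environment variable. Declaring it after the branch-tip layer therefore keeps a version bump from invalidating that layer.

```dockerfile
# Sketch only, not the current Dockerfile.ci. Assumes pip is available in the
# base image and that the branch-tip script uses whichever uv is on the PATH.
ARG AIRFLOW_PIP_VERSION=24.2
ARG AIRFLOW_USE_UV="true"
COPY --from=scripts install_packaging_tools.sh install_airflow_dependencies_from_branch_tip.sh \
     common.sh /scripts/docker/
# 1) + 2) Install the newest uv available when this layer is first built, then
# pre-install airflow from the main branch tip (AIRFLOW_PRE_CACHED_PIP_PACKAGES
# is declared earlier in Dockerfile.ci). AIRFLOW_UV_VERSION is not in scope yet,
# so bumping it does not invalidate this (~3 minute) layer.
RUN pip install --upgrade uv; \
    if [[ ${AIRFLOW_PRE_CACHED_PIP_PACKAGES} == "true" ]]; then \
        bash /scripts/docker/install_airflow_dependencies_from_branch_tip.sh; \
    fi
# 3) Declare the pinned uv version only now. A changed ARG value causes a cache
# miss at the first RUN after its declaration, so a bump rebuilds from here on.
ARG AIRFLOW_UV_VERSION=0.4.22
RUN bash /scripts/docker/install_packaging_tools.sh
# 4) + 5) COPY airflow sources and install them incrementally, as today.
```

With this order, bumping `AIRFLOW_UV_VERSION` from 0.4.22 to 0.4.23 should re-run only the pinned `install_packaging_tools.sh` layer and the layers after it. That corresponds to the third scenario above.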