potiuk commented on issue #42999:
URL: https://github.com/apache/airflow/issues/42999#issuecomment-2416058277

   cc: @ashb @kaxil @gopidesupavan @romsharon98 @shahar1 @eladkal @jscheffl. 
   
   I am slowly looking at ways to speed up the CI image build and caching 
again.
   One of the reasons I **thought** caching is not as good is the UV version 
bump. Whenever we increase the UV version, it invalidates the cache layer 
**just** before installing airflow from the branch tip.
   
   Starting from here 
https://github.com/apache/airflow/blob/main/Dockerfile.ci#L1281
   
   ```
   ARG AIRFLOW_PIP_VERSION=24.2
   ARG AIRFLOW_UV_VERSION=0.4.22
   ARG AIRFLOW_USE_UV="true"
   
   RUN echo "Airflow version: ${AIRFLOW_VERSION}"
   
   # Copy all scripts required for installation - changing any of those should lead to
   # rebuilding from here
   COPY --from=scripts install_packaging_tools.sh install_airflow_dependencies_from_branch_tip.sh \
       common.sh /scripts/docker/
   RUN bash /scripts/docker/install_packaging_tools.sh; \
       if [[ ${AIRFLOW_PRE_CACHED_PIP_PACKAGES} == "true" ]]; then \
           bash /scripts/docker/install_airflow_dependencies_from_branch_tip.sh; \
       fi
   ....
   ```
   
   When we change the default `AIRFLOW_UV_VERSION`, all the layers below it 
get invalidated, so the COPY that copies the scripts and the installation of 
dependencies from the `main` branch tip run again and reinstall the whole of 
airflow. Even with `uv`, where it is as fast as possible, it takes 180s on my 
machine (3 minutes), which is pretty slow. And the only reason in this case is 
that we changed the UV version.
   
   I am thinking about optimizing it a bit. Since the UV version changes so 
often, maybe we can install the "latest" version of UV for the "branch tip" 
installation and only then install the "fixed" version of UV.
   
   We are fixing the UV version for stability - there is a risk that a UV 
upgrade will break things (it happened in the past), as they "move fast and 
break things". So fixing UV in the image (with manual updates merged only 
after the PR passes) makes sense - but for optimisation purposes, it might 
make sense for the branch tip installation to use the latest UV without 
specifying a version.
   
   This should generally be quite safe and stable. The 
"install_airflow_dependencies_from_branch_tip" step is only an optimization 
of the installation - it preinstalls airflow from "some" good version of 
airflow. If we do not change the UV version before it, the layer would 
usually not get invalidated for builds, because there is nothing that could 
trigger the invalidation (no previous Dockerfile lines would change). So once 
such a build succeeds in main and the remote cache is updated, it will NOT 
reinstall uv with the latest version - it will keep uv at the version it 
succeeded with last time, because the cache will not get invalidated.
   
   So the first time the cache is built, it will roughly work this way:
   
   1) install the latest UV version
   2) install airflow from the latest main branch (this takes 3 minutes)
   3) install the fixed UV version (say 0.4.22)
   4) COPY airflow sources
   5) proceed with installing airflow from sources (incremental)
    
   Then, whenever we have modified airflow sources, all the subsequent image 
builds will reuse the cache effectively, as long as the base Python version, 
the Dockerfile lines above these steps, and the UV version do not change:
   
   1) -> from cache
   2) -> from cache 
   3) -> from cache
   4) COPY airflow sources
   5) proceed with installing airflow from sources (incremental)
   
   If we update UV to 0.4.23, for example:
   
   1) -> from cache
   2) -> from cache (this is where we save 3 minutes) 
   3) install the fixed UV version (say 0.4.23)
   4) COPY airflow sources
   5) proceed with installing airflow from sources (incremental)
   
   This means that we save 3 minutes on rebuilds of the CI image (locally 
and on CI) when only the UV version changes.
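
   To make the reasoning above easier to check, here is a minimal sketch in 
Python. It is a toy model, not Docker's actual cache logic: it assumes each 
layer's cache key is a hash chained from the parent layer's key, the step 
itself, and the inputs that step depends on. The step names, the `layer_keys` 
and `rebuilt` helpers, and the "proposed" ordering are all illustrative 
assumptions, not something that exists in `Dockerfile.ci` today.

```python
import hashlib


def layer_keys(steps, inputs):
    """Return (step name, cache key) pairs for a list of build steps.

    Toy model (assumption): a layer's key chains the parent's key, the
    step name, and the values of the inputs the step depends on.
    """
    keys = []
    parent = "base-image"
    for name, deps in steps:
        material = "|".join([parent, name] + [f"{d}={inputs[d]}" for d in deps])
        parent = hashlib.sha256(material.encode()).hexdigest()
        keys.append((name, parent))
    return keys


def rebuilt(steps, before, after):
    """List the steps whose cache key differs between two sets of inputs."""
    old = dict(layer_keys(steps, before))
    new = dict(layer_keys(steps, after))
    return [name for name, _ in steps if old[name] != new[name]]


# Current ordering: the fixed uv version is an input of the same step that
# installs airflow from the branch tip.
CURRENT = [
    ("install fixed uv + airflow from branch tip", ["uv_version"]),
    ("COPY airflow sources", ["sources"]),
    ("install airflow from sources", []),
]

# Proposed ordering (hypothetical): the branch tip install no longer depends
# on the fixed uv version.
PROPOSED = [
    ("install latest uv", []),
    ("install airflow from branch tip", []),
    ("install fixed uv", ["uv_version"]),
    ("COPY airflow sources", ["sources"]),
    ("install airflow from sources", []),
]

before = {"uv_version": "0.4.22", "sources": "rev-a"}
uv_bump = {"uv_version": "0.4.23", "sources": "rev-a"}
src_change = {"uv_version": "0.4.22", "sources": "rev-b"}

print("current, uv bump:   ", rebuilt(CURRENT, before, uv_bump))
print("proposed, uv bump:  ", rebuilt(PROPOSED, before, uv_bump))
print("proposed, new source:", rebuilt(PROPOSED, before, src_change))
```

   In this model a UV bump rebuilds every step of the current ordering, while 
in the proposed ordering the "install airflow from branch tip" step stays 
cached and only steps 3-5 run again.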
   
   WDYT? 
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
