It's a good question, Sophie. The answer is not entirely obvious. The build cache basically just loads the latest cache produced by the trunk builds. However, there are un-built commits on trunk quite frequently. These are commits which are being built by the current CI run, or are waiting for the next CI run to kick off. We only run a single trunk build at once, so having un-built commits on trunk is quite common.
The build cache is updated _after_ a trunk CI finishes. Because of this, the latest build cache is usually lagging behind trunk. If we build the PRs using the merge ref, it will usually include un-cached / un-built commits off of trunk. The result is a lot of cache misses. By building the PR's HEAD, we can avoid this problem by merging in an earlier point in trunk which only includes commits which have been built and cached. E.g., my normal workflow when updating PRs is: git fetch origin ./committer-tools/update-cache.sh git merge trunk-cached This _should_ result in a lot of cache hits for Gradle tasks, but does not because of the merge ref thing. Also, just a general note, if there are no file changes between what you've loaded from the build cache and your PR -- Gradle should skip that task (you'll see FROM-CACHE in the output). Hope this helps! -David A On Wed, Jan 22, 2025 at 8:42 PM Sophie Blee-Goldman <sop...@responsive.dev> wrote: > > > > This change will help with test time since it will increase cache hits > for > > the Gradle test tasks. For example, if your PR only changes streams code > > you should ideally not see other modules get re-tested (unless they are > > downstream dependencies of :streams). > > > Ok thanks for clarifying. But in that case (sorry in advance if this is a > dumb question) I'm not understanding why the trunk-merge causes worse build > cache usage. If it's "smart" enough to compare which modules are actually > changed and avoid re-testing those, then if your PR 1 doesn't touch some > module A and there is some other PR 2 which also didn't touch that module > A, why can't we use the build cache from PR 2? And why does running the > tests from the head commit of those PRs increase the build cache hit rate > -- if anything those two PRs are now *less* alike, rather than more alike, > right? So isn't there a better chance of having a recent version of the > build for some module that neither PR touched if they are both rebased > against trunk, versus PR 1 being against the latest trunk and PR 2 being > from 10 commits ago, since there's a higher chance those last 10 commits > touched a wider range of modules? > > Hope that question makes sense. Probably I'm just ignorant of how the build > cache works and what it can/can't do, so thanks in advance for bearing with > me here. > > -Sophie > > On Wed, Jan 22, 2025 at 4:40 PM David Arthur <mum...@gmail.com> wrote: > > > This change will help with test time since it will increase cache hits > for > > the Gradle test tasks. For example, if your PR only changes streams code > > you should ideally not see other modules get re-tested (unless they are > > downstream dependencies of :streams). > > > > I've learned from ASF Infra that merge queues will be available soon. In > > light of this, I'm going to expand this proposal into a KIP that includes > > merge queues. Stay tuned > > > > -David A > > > > On Wed, Jan 22, 2025 at 6:37 PM Sophie Blee-Goldman < > sop...@responsive.dev > > > > > wrote: > > > > > David, just to make sure I understand, are you saying this will help > with > > > the actual test runtime, or only the compile time of the build? Just > > > wondering because I've personally never found the compile times to be a > > > roadblock, since the test times are so long. If this change only speeds > > up > > > compiling then it's hardly going to be noticeable imho > > > > > > -Sophie > > > > > > On Fri, Jan 17, 2025 at 5:16 PM David Arthur <mum...@gmail.com> wrote: > > > > > > > Thanks for raising the correctness aspect, José. We currently have a > > > > compromise solution since PRs are not required to be up-to-date with > > > trunk > > > > before merging. Basically, as long as a PR does not have conflicts > and > > > has > > > > passing tests, it can be merged. That's the situation today. We rely > on > > > the > > > > trunk builds to retroactively inform us about problems. > > > > > > > > The merge queue is probably the solution to your concern. This would > > > allow > > > > us to guarantee that each PR has run the tests with the latest code > on > > > > trunk one at a time. We still can't turn this feature on, but I'm > > hopeful > > > > it will be available to us soon. > > > > > > > > For this PR, I have kept the "Compile and Check Java" step building > the > > > > merge ref (trunk + PR). This will retain some of the previous > behavior > > > and > > > > at least help prevent breaking trunk due to compilation errors. > > > > > > > > In order to benefit from the build cache, contributors will need to > > pull > > > in > > > > trunk often. I'm hoping this is a good incentive to keep the branch > > > > updated. > > > > > > > > > I never found that the limiting factor in merging a PR to Apache > > Kafka > > > is > > > > the build times. > > > > > > > > I have :) > > > > > > > > It really depends on the nature of the change. Right now, the > community > > > is > > > > driving a lot of small cleanup/removal PRs for 4.0 which are easy to > > > review > > > > but take time to test and build (and re-build). > > > > > > > > -David A > > > > > > > > On Fri, Jan 17, 2025 at 4:30 PM José Armando García Sancio > > > > <jsan...@confluent.io.invalid> wrote: > > > > > > > > > Thanks David. > > > > > > > > > > On Fri, Jan 17, 2025 at 11:55 AM David Arthur <mum...@gmail.com> > > > wrote: > > > > > > > > > > > > I have a simple patch which changes our PRs to build the HEAD of > > the > > > PR > > > > > > rather than the merge ref. > > > > > > > > > > > > https://github.com/apache/kafka/pull/18449 > > > > > > > > > > We shouldn't sacrifice correctness for performance. The tests need > to > > > > > run against what has been committed to the trunk branch. Otherwise > > the > > > > > onus is on the contributor and committer to make sure that the PR > > > > > branch is up to date enough. > > > > > > > > > > I never found that the limiting factor in merging a PR to Apache > > Kafka > > > > > is the build times. The limiting factor is committer review time. > > > > > > > > > > Thanks, > > > > > -- > > > > > -José > > > > > > > > > > > > > > > > > -- > > > > David Arthur > > > > > > > > > > > > > -- > > David Arthur > > > -- David Arthur