Thanks for taking the time to explain David! That makes sense I also saw your KIP about using the merge queue so it sounds like we have a plan here
On Wed, Jan 22, 2025 at 6:07 PM David Arthur <mum...@gmail.com> wrote: > It's a good question, Sophie. The answer is not entirely obvious. > > The build cache basically just loads the latest cache produced by the trunk > builds. However, there are un-built commits on trunk quite frequently. > These are commits which are being built by the current CI run, or are > waiting for the next CI run to kick off. We only run a single trunk build > at once, so having un-built commits on trunk is quite common. > > The build cache is updated _after_ a trunk CI finishes. > > Because of this, the latest build cache is usually lagging behind trunk. If > we build the PRs using the merge ref, it will usually include un-cached / > un-built commits off of trunk. The result is a lot of cache misses. > > By building the PR's HEAD, we can avoid this problem by merging in an > earlier point in trunk which only includes commits which have been built > and cached. > > E.g., my normal workflow when updating PRs is: > > git fetch origin > ./committer-tools/update-cache.sh > git merge trunk-cached > > This _should_ result in a lot of cache hits for Gradle tasks, but does not > because of the merge ref thing. > > Also, just a general note, if there are no file changes between what you've > loaded from the build cache and your PR -- Gradle should skip that task > (you'll see FROM-CACHE in the output). > > Hope this helps! > -David A > > On Wed, Jan 22, 2025 at 8:42 PM Sophie Blee-Goldman <sop...@responsive.dev > > > wrote: > > > > > > > This change will help with test time since it will increase cache hits > > for > > > the Gradle test tasks. For example, if your PR only changes streams > code > > > you should ideally not see other modules get re-tested (unless they are > > > downstream dependencies of :streams). > > > > > > Ok thanks for clarifying. But in that case (sorry in advance if this is a > > dumb question) I'm not understanding why the trunk-merge causes worse > build > > cache usage. If it's "smart" enough to compare which modules are actually > > changed and avoid re-testing those, then if your PR 1 doesn't touch some > > module A and there is some other PR 2 which also didn't touch that module > > A, why can't we use the build cache from PR 2? And why does running the > > tests from the head commit of those PRs increase the build cache hit rate > > -- if anything those two PRs are now *less* alike, rather than more > alike, > > right? So isn't there a better chance of having a recent version of the > > build for some module that neither PR touched if they are both rebased > > against trunk, versus PR 1 being against the latest trunk and PR 2 being > > from 10 commits ago, since there's a higher chance those last 10 commits > > touched a wider range of modules? > > > > Hope that question makes sense. Probably I'm just ignorant of how the > build > > cache works and what it can/can't do, so thanks in advance for bearing > with > > me here. > > > > -Sophie > > > > On Wed, Jan 22, 2025 at 4:40 PM David Arthur <mum...@gmail.com> wrote: > > > > > This change will help with test time since it will increase cache hits > > for > > > the Gradle test tasks. For example, if your PR only changes streams > code > > > you should ideally not see other modules get re-tested (unless they are > > > downstream dependencies of :streams). > > > > > > I've learned from ASF Infra that merge queues will be available soon. > In > > > light of this, I'm going to expand this proposal into a KIP that > includes > > > merge queues. Stay tuned > > > > > > -David A > > > > > > On Wed, Jan 22, 2025 at 6:37 PM Sophie Blee-Goldman < > > sop...@responsive.dev > > > > > > > wrote: > > > > > > > David, just to make sure I understand, are you saying this will help > > with > > > > the actual test runtime, or only the compile time of the build? Just > > > > wondering because I've personally never found the compile times to > be a > > > > roadblock, since the test times are so long. If this change only > speeds > > > up > > > > compiling then it's hardly going to be noticeable imho > > > > > > > > -Sophie > > > > > > > > On Fri, Jan 17, 2025 at 5:16 PM David Arthur <mum...@gmail.com> > wrote: > > > > > > > > > Thanks for raising the correctness aspect, José. We currently have > a > > > > > compromise solution since PRs are not required to be up-to-date > with > > > > trunk > > > > > before merging. Basically, as long as a PR does not have conflicts > > and > > > > has > > > > > passing tests, it can be merged. That's the situation today. We > rely > > on > > > > the > > > > > trunk builds to retroactively inform us about problems. > > > > > > > > > > The merge queue is probably the solution to your concern. This > would > > > > allow > > > > > us to guarantee that each PR has run the tests with the latest code > > on > > > > > trunk one at a time. We still can't turn this feature on, but I'm > > > hopeful > > > > > it will be available to us soon. > > > > > > > > > > For this PR, I have kept the "Compile and Check Java" step building > > the > > > > > merge ref (trunk + PR). This will retain some of the previous > > behavior > > > > and > > > > > at least help prevent breaking trunk due to compilation errors. > > > > > > > > > > In order to benefit from the build cache, contributors will need to > > > pull > > > > in > > > > > trunk often. I'm hoping this is a good incentive to keep the branch > > > > > updated. > > > > > > > > > > > I never found that the limiting factor in merging a PR to Apache > > > Kafka > > > > is > > > > > the build times. > > > > > > > > > > I have :) > > > > > > > > > > It really depends on the nature of the change. Right now, the > > community > > > > is > > > > > driving a lot of small cleanup/removal PRs for 4.0 which are easy > to > > > > review > > > > > but take time to test and build (and re-build). > > > > > > > > > > -David A > > > > > > > > > > On Fri, Jan 17, 2025 at 4:30 PM José Armando García Sancio > > > > > <jsan...@confluent.io.invalid> wrote: > > > > > > > > > > > Thanks David. > > > > > > > > > > > > On Fri, Jan 17, 2025 at 11:55 AM David Arthur <mum...@gmail.com> > > > > wrote: > > > > > > > > > > > > > > I have a simple patch which changes our PRs to build the HEAD > of > > > the > > > > PR > > > > > > > rather than the merge ref. > > > > > > > > > > > > > > https://github.com/apache/kafka/pull/18449 > > > > > > > > > > > > We shouldn't sacrifice correctness for performance. The tests > need > > to > > > > > > run against what has been committed to the trunk branch. > Otherwise > > > the > > > > > > onus is on the contributor and committer to make sure that the PR > > > > > > branch is up to date enough. > > > > > > > > > > > > I never found that the limiting factor in merging a PR to Apache > > > Kafka > > > > > > is the build times. The limiting factor is committer review time. > > > > > > > > > > > > Thanks, > > > > > > -- > > > > > > -José > > > > > > > > > > > > > > > > > > > > > -- > > > > > David Arthur > > > > > > > > > > > > > > > > > > -- > > > David Arthur > > > > > > > > -- > David Arthur >