Thanks for taking the time to explain David! That makes sense

I also saw your KIP about using the merge queue so it sounds like we have a
plan here

On Wed, Jan 22, 2025 at 6:07 PM David Arthur <mum...@gmail.com> wrote:

> It's a good question, Sophie. The answer is not entirely obvious.
>
> The build cache basically just loads the latest cache produced by the trunk
> builds. However, there are un-built commits on trunk quite frequently.
> These are commits which are being built by the current CI run, or are
> waiting for the next CI run to kick off. We only run a single trunk build
> at once, so having un-built commits on trunk is quite common.
>
> The build cache is updated _after_ a trunk CI finishes.
>
> Because of this, the latest build cache is usually lagging behind trunk. If
> we build the PRs using the merge ref, it will usually include un-cached /
> un-built commits off of trunk. The result is a lot of cache misses.
>
> By building the PR's HEAD, we can avoid this problem by merging in an
> earlier point in trunk which only includes commits which have been built
> and cached.
>
> E.g., my normal workflow when updating PRs is:
>
> git fetch origin
> ./committer-tools/update-cache.sh
> git merge trunk-cached
>
> This _should_ result in a lot of cache hits for Gradle tasks, but does not
> because of the merge ref thing.
>
> Also, just a general note, if there are no file changes between what you've
> loaded from the build cache and your PR -- Gradle should skip that task
> (you'll see FROM-CACHE in the output).
>
> Hope this helps!
> -David A
>
> On Wed, Jan 22, 2025 at 8:42 PM Sophie Blee-Goldman <sop...@responsive.dev
> >
> wrote:
>
> > >
> > > This change will help with test time since it will increase cache hits
> > for
> > > the Gradle test tasks. For example, if your PR only changes streams
> code
> > > you should ideally not see other modules get re-tested (unless they are
> > > downstream dependencies of :streams).
> >
> >
> > Ok thanks for clarifying. But in that case (sorry in advance if this is a
> > dumb question) I'm not understanding why the trunk-merge causes worse
> build
> > cache usage. If it's "smart" enough to compare which modules are actually
> > changed and avoid re-testing those, then if your PR 1 doesn't touch some
> > module A and there is some other PR 2 which also didn't touch that module
> > A, why can't we use the build cache from PR 2? And why does running the
> > tests from the head commit of those PRs increase the build cache hit rate
> > -- if anything those two PRs are now *less* alike, rather than more
> alike,
> > right? So isn't there a better chance of having a recent version of the
> > build for some module that neither PR touched if they are both rebased
> > against trunk, versus PR 1 being against the latest trunk and PR 2 being
> > from 10 commits ago, since there's a higher chance those last 10 commits
> > touched a wider range of modules?
> >
> > Hope that question makes sense. Probably I'm just ignorant of how the
> build
> > cache works and what it can/can't do, so thanks in advance for bearing
> with
> > me here.
> >
> > -Sophie
> >
> > On Wed, Jan 22, 2025 at 4:40 PM David Arthur <mum...@gmail.com> wrote:
> >
> > > This change will help with test time since it will increase cache hits
> > for
> > > the Gradle test tasks. For example, if your PR only changes streams
> code
> > > you should ideally not see other modules get re-tested (unless they are
> > > downstream dependencies of :streams).
> > >
> > > I've learned from ASF Infra that merge queues will be available soon.
> In
> > > light of this, I'm going to expand this proposal into a KIP that
> includes
> > > merge queues. Stay tuned
> > >
> > > -David A
> > >
> > > On Wed, Jan 22, 2025 at 6:37 PM Sophie Blee-Goldman <
> > sop...@responsive.dev
> > > >
> > > wrote:
> > >
> > > > David, just to make sure I understand, are you saying this will help
> > with
> > > > the actual test runtime, or only the compile time of the build? Just
> > > > wondering because I've personally never found the compile times to
> be a
> > > > roadblock, since the test times are so long. If this change only
> speeds
> > > up
> > > > compiling then it's hardly going to be noticeable imho
> > > >
> > > > -Sophie
> > > >
> > > > On Fri, Jan 17, 2025 at 5:16 PM David Arthur <mum...@gmail.com>
> wrote:
> > > >
> > > > > Thanks for raising the correctness aspect, José. We currently have
> a
> > > > > compromise solution since PRs are not required to be up-to-date
> with
> > > > trunk
> > > > > before merging. Basically, as long as a PR does not have conflicts
> > and
> > > > has
> > > > > passing tests, it can be merged. That's the situation today. We
> rely
> > on
> > > > the
> > > > > trunk builds to retroactively inform us about problems.
> > > > >
> > > > > The merge queue is probably the solution to your concern. This
> would
> > > > allow
> > > > > us to guarantee that each PR has run the tests with the latest code
> > on
> > > > > trunk one at a time. We still can't turn this feature on, but I'm
> > > hopeful
> > > > > it will be available to us soon.
> > > > >
> > > > > For this PR, I have kept the "Compile and Check Java" step building
> > the
> > > > > merge ref (trunk + PR). This will retain some of the previous
> > behavior
> > > > and
> > > > > at least help prevent breaking trunk due to compilation errors.
> > > > >
> > > > > In order to benefit from the build cache, contributors will need to
> > > pull
> > > > in
> > > > > trunk often. I'm hoping this is a good incentive to keep the branch
> > > > > updated.
> > > > >
> > > > > > I never found that the limiting factor in merging a PR to Apache
> > > Kafka
> > > > is
> > > > > the build times.
> > > > >
> > > > > I have :)
> > > > >
> > > > > It really depends on the nature of the change. Right now, the
> > community
> > > > is
> > > > > driving a lot of small cleanup/removal PRs for 4.0 which are easy
> to
> > > > review
> > > > > but take time to test and build (and re-build).
> > > > >
> > > > > -David A
> > > > >
> > > > > On Fri, Jan 17, 2025 at 4:30 PM José Armando García Sancio
> > > > > <jsan...@confluent.io.invalid> wrote:
> > > > >
> > > > > > Thanks David.
> > > > > >
> > > > > > On Fri, Jan 17, 2025 at 11:55 AM David Arthur <mum...@gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > > I have a simple patch which changes our PRs to build the HEAD
> of
> > > the
> > > > PR
> > > > > > > rather than the merge ref.
> > > > > > >
> > > > > > > https://github.com/apache/kafka/pull/18449
> > > > > >
> > > > > > We shouldn't sacrifice correctness for performance. The tests
> need
> > to
> > > > > > run against what has been committed to the trunk branch.
> Otherwise
> > > the
> > > > > > onus is on the contributor and committer to make sure that the PR
> > > > > > branch is up to date enough.
> > > > > >
> > > > > > I never found that the limiting factor in merging a PR to Apache
> > > Kafka
> > > > > > is the build times. The limiting factor is committer review time.
> > > > > >
> > > > > > Thanks,
> > > > > > --
> > > > > > -José
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > David Arthur
> > > > >
> > > >
> > >
> > >
> > > --
> > > David Arthur
> > >
> >
>
>
> --
> David Arthur
>

Reply via email to