Thanks for the KIP, Vince and Farid! A few comments on the KIP:

DA1: The motivation mentioned historical build times in Apache Kafka, but we should really just consider the current situation. I would like to see the motivation/background make a clearer distinction between trunk builds (no caching) and pull request builds (with caching).

DA2: An 80% reduction (3hr -> 30m) in build times seems only possible with aggressive caching and task avoidance. We have implemented the Gradle cache for our PR builds which has only given us around a 20% reduction due to our dependency graph [1] (from 2hr to 1hr40m). This is because a change in ":clients" will result in a full rebuild of nearly everything since it is highly depended on. How does Bazel deal with a situation like this? Would a change in ":clients" only cause the ":clients" module to be rebuilt and tested? This is probably my biggest question about Bazel. How do we get faster build times without sacrificing safety? ("No free lunch" principle)

DA3:
> Additionally, Gradle resolves a separate dependency tree per module, meaning that final tar.gz and docker images could end up with multiple versions of the same jar, which can have unexpected results.

I'm not sure how much this happens in practice. Gradle has a feature known as Platforms which helps manage transitive dependencies https://docs.gradle.org/current/userguide/platforms.html

DA4:
> This is a heavyweight solution that adds additional network and storage costs on the package registries, and requires moving code around to many “*-common” modules

I disagree strongly with this. Restructuring the code into "-api", "-common", etc results in better organized code and a cleaner dependency graph. We only publish to Maven during release time, and those artifacts are rarely consumed (aside from the public API ones like clients/streams/connect). The number of JARs produced isn't really a problem IMO.
I think restructuring the code (as we have been doing) would benefit a Bazel build in a similar way that it has been benefiting the Gradle build.

DA5:
> Bazel promotes fine-grained targets, allowing each Java package, or even individual files within a package, to have distinct BUILD targets

To me this is a bit of an anti-pattern. In my opinion, a build target should be scoped to a module. This has been the standard way of building JVM projects for a long time. Having fine grained file level build targets harkens back to Apache Ant :)

DA6: For the past few months, we have been enjoying Develocity (formerly Gradle Enterprise) as a way to gather historical build data, investigate build/test failures, and see trends in our build performance and flaky tests. It has been very valuable in the ongoing crusade against flaky tests. Does Bazel have anything like this?

DA7: Not really a comment on the KIP, but more of an FYI about Gradle -- we have received code contributions and general support from Gradle Inc as part of the Gradle + ASF partnership. In addition to the free Develocity instance (ge.apache.org), we will soon have the ability to use the remote cache for developers to use locally.

---

Some comments on the discussion so far:

> CP0: Can Bazel help simplify build.gradle (currently around 3826 lines)?

The BAZEL files in https://github.com/confluentinc/kafka-bazel total around 6500 lines

> SVV3. Before switching to a completely different build system, I think we have to ask ourselves: did we exhaust any optimization in Gradle that can be done, and is it really that slower to build in it?

As I mentioned above in DA2, we are now at a point where refactoring the code is the primary path to improve our build times with Gradle. Specifically, breaking up "core" and relocating those tests to appropriate sub-modules will help a lot. As it stands, ":core:test" takes up over an hour of our build times and is often required to be rebuilt in PRs due to our module dependencies.
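
To make the invalidation concern from DA2 concrete, here is a minimal sketch of how the set of modules needing a rebuild follows from the dependency graph. The module names and edges are simplified assumptions for illustration, not Kafka's actual graph, and `affected_modules` is a hypothetical helper rather than anything Gradle or Bazel provides.

```python
from collections import deque

# Hypothetical, simplified module graph: module -> modules it depends on.
# These edges are illustrative assumptions, not Kafka's real dependency graph.
DEPENDS_ON = {
    ":clients": [],
    ":server-common": [":clients"],
    ":storage": [":clients", ":server-common"],
    ":core": [":clients", ":server-common", ":storage"],
    ":streams": [":clients"],
    ":docs": [],
}


def affected_modules(changed, depends_on):
    """Return the changed module plus everything that transitively depends on it."""
    # Invert the graph: module -> modules that depend on it directly.
    dependents = {module: [] for module in depends_on}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(module)

    # Breadth-first walk over the inverted edges.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


if __name__ == "__main__":
    print(sorted(affected_modules(":clients", DEPENDS_ON)))
    print(sorted(affected_modules(":core", DEPENDS_ON)))
```

In this toy graph a change to ":clients" marks every module except ":docs" as affected, while a change to ":core" affects only ":core". Any cache keyed on inputs would see the same thing, so the affected set only shrinks when edges are actually removed from the graph.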

We can see that builds without many code changes are already benefiting from the cache. For example,

Docs PR: https://github.com/apache/kafka/pull/17910
34m build time: https://github.com/apache/kafka/actions/runs/11972266657

---

Thanks!
David A

[1] https://ge.apache.org/scans/performance?search.rootProjectNames=kafka&search.tags=github%2Cnot:trunk&search.tasks=test&search.timeZoneId=America%2FNew_York

On Fri, Nov 22, 2024 at 12:38 PM Vince Rose <vr...@confluent.io.invalid> wrote:

> Bringing fzaka...@confluent.io into the discussion.
>
> On Fri, Nov 22, 2024 at 10:01 AM Viktor Somogyi-Vass <viktor.somo...@cloudera.com.invalid> wrote:
>
> > Hi Vincent
> >
> > I would also like to say that I'm not an expert in Bazel either. Generally I think it's good to reduce build and test times in every infra. While reading up on Bazel, it seems like it's most useful in monolithic code bases, such as Confluent as you mentioned in the KIP (so it's not a coincidence :) ). I have a few thoughts though:
> >
> > SVV1. That 175 vs 100 seconds in the no-cache seems to be a big difference. There is a performance benchmark by Gradle (https://blog.gradle.org/gradle-vs-bazel-jvm) which might be biased from their side, but seems to conclude different results. One thing they mention in the article is that they had one BAZEL.build for a project, instead of the package level granularity. In your provided example, you seem to have it on a per module basis. Could this be the difference in performance? Have you tried it out with different granularities for Bazel?
> >
> > SVV2. I can share the arguments in CP2 as well. Bazel would optimally need a build file in every package, which you'll need to maintain manually. This is an extra burden on the developer side.
> > Based on the example you cited, Gradle seems to have better JVM integration compared to Bazel as the latter one tries to be a generic build tool and not tuned for the JVM. I think that this could be a potential concern.
> >
> > SVV3. Before switching to a completely different build system, I think we have to ask ourselves: did we exhaust any optimization in Gradle that can be done, and is it really that slower to build in it? As the article above mentioned, if Bazel build files are defined on package level, then the package level cyclic dependencies can be eliminated. This, however, can be done with Java's module system too, which would be the idiomatic Java way. If this is done, perhaps it could speed up Gradle builds as well.
> >
> > Best,
> > Viktor
> >
> > On Fri, Nov 22, 2024 at 5:43 PM Gaurav Narula <ka...@gnarula.com> wrote:
> >
> > > Thanks for the KIP Vince and Farid!
> > >
> > > Glad to see Bazel being considered for Kafka and really impressive work on the demo repository!
> > >
> > > From the benchmarks, I'd imagine the build would only get faster once BUILD.bazel files don't glob the entire source/test set for a module but are more fine grained. I suppose it would be even faster if we were to use RBE [0] for executing builds/tests in parallel with caching [1]. Would be nice to get a sense of the potential improvements there if you've already done some experiments and some possible RBE options that we may use? Maybe another motivation/optimisation would be the use of target-determinator [2], previously discussed in [3] for running partial tests for PRs?
> > >
> > > I see that you mention potential changes in IDE configurations. It would be nice to mention plugins for common IDEs that support Bazel. IIRC, IntelliJ has decent Bazel support [4] but (IMHO) it's not as good as the Gradle integration.
> > > Can you perhaps discuss the potential changes in developer workflows, e.g. the need to add dependencies/files to BUILD.bazel files manually or running Gazelle to create BUILD.bazel files automatically?
> > >
> > > I see that in the example repository you declare the versions manually [5]. I think it would be cumbersome to maintain two separate versions list and perhaps we can generate the version mapping from `gradle/dependencies.gradle`, unless they need to deviate for some reason?
> > >
> > > To add to some of @Chia's questions and perhaps answer them partially:
> > >
> > > > CP0: Can Bazel help simplify build.gradle (currently around 3826 lines)?
> > >
> > > I think some of the complexity moves to *.bzl, *.bazel files [6] [7] [8]
> > >
> > > The configuration is in a language called Starlark [9] which is a subset of Python that avoids non-determinism. I'd say the configuration is more verbose than Gradle but is easier to reason about. I often find Gradle plugins do a lot of magic and are harder to navigate.
> > >
> > > > CP1: Gradle commands are part of the public API, so in which Kafka version should we announce the deprecation of Gradle commands?
> > >
> > > My impression is once we get rid of Gradle so Phase 5?
> > >
> > > > CP2: The example (https://github.com/confluentinc/kafka-bazel) seems to modify the folder structure of the entire Kafka project. Is this change required? My concern is that such a major change should be done in a major release. If restructuring is a requirement for Bazel, does that mean we have to accept it after this KIP pass?
> > >
> > > I stumbled upon Farid's blog post [10] which talks a bit about this and my impression is the overlay can be merged into the main repository without changing its structure, potentially in a side directory as LLVM did [11]. I suppose this is what Phase 2 is about?
> > >
> > > That said, I do think refining the build graph, which includes having more fine grained BUILD.bazel files, may warrant moving some classes/packages around. This is because Bazel isn't happy with cycles in the build graph while Gradle allows them within a module.
> > >
> > > I'm looking forward to trying the overlay repository in the coming days and will follow up with more questions/feedback!
> > >
> > > Regards,
> > > Gaurav
> > >
> > > [0]: https://bazel.build/remote/rbe
> > > [1]: https://bazel.build/remote/caching
> > > [2]: https://github.com/bazel-contrib/target-determinator
> > > [3]: https://lists.apache.org/thread/xqbj7ywmwf5zfycgo1ckyhl9zs43h1dd
> > > [4]: https://github.com/bazelbuild/intellij
> > > [5]: https://github.com/confluentinc/kafka-bazel/blob/a165c512c5d43256564b53ee0be2fbfe7388d5f9/kafka-bazel/kafka-overlay/dependencies/dependencies.bzl#L24
> > > [6]: https://github.com/confluentinc/kafka-bazel/tree/a165c512c5d43256564b53ee0be2fbfe7388d5f9/kafka-bazel/kafka-overlay/tools-bazel
> > > [7]: https://github.com/confluentinc/kafka-bazel/tree/a165c512c5d43256564b53ee0be2fbfe7388d5f9/kafka-bazel/kafka-overlay/dependencies
> > > [8]: https://github.com/confluentinc/kafka-bazel/blob/a165c512c5d43256564b53ee0be2fbfe7388d5f9/kafka-bazel/WORKSPACE.bazel
> > > [9]: https://github.com/bazelbuild/starlark
> > > [10]: https://fzakaria.com/2024/08/29/bazel-overlay-pattern.html
> > > [11]: https://github.com/llvm/llvm-project/tree/main/utils/bazel
> > >
> > > > On 22 Nov 2024, at 03:39, Chia-Ping Tsai <chia7...@apache.org> wrote:
> > > >
> > > > hi Vince
> > > >
> > > > Thanks for introducing this interesting KIP! I'm not an expert in Bazel, so please be patient with my potentially odd questions:
> > > >
> > > > CP0: Can Bazel help simplify build.gradle (currently around 3826 lines)?
> > > >
> > > > CP1: Gradle commands are part of the public API, so in which Kafka version should we announce the deprecation of Gradle commands?
> > > >
> > > > CP2: The example (https://github.com/confluentinc/kafka-bazel) seems to modify the folder structure of the entire Kafka project. Is this change required? My concern is that such a major change should be done in a major release. If restructuring is a requirement for Bazel, does that mean we have to accept it after this KIP pass?
> > > >
> > > > Best,
> > > > Chia-Ping
> > > >
> > > > On 2024/11/21 21:16:50 Vince Rose wrote:
> > > >> Hey devs,
> > > >>
> > > >> I want to start a discussion about introducing Bazel as a build system for Apache Kafka.
> > > >>
> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1115%3A+Bazel+Builds
> > > >>
> > > >> -Vince
> > > >>
> > > >
> > >
> >

--
David Arthur