Hi,

Thanks a lot for sharing these.
I am looking through the tests that we run, and how we run them, as I would really like to take a stab at improving them. However, I can't commit to this without some agreement.

I took a hard look at archery and most of our builds, and these are my observations:

* we heavily rely on docker image caches to store artifacts, by using the cache action on `.docker`, as well as by pushing images to a registry and fetching them
* we use a docker-compose file to enumerate all our builds (currently over 1k LOC)
* we use a custom-made Python package (archery) for a heterogeneous set of tasks (releasing, merging PRs, running docker-compose, running docker)

Let's evaluate the execution path of one of our major runs, the integration tests that run on every push:

1. the build is triggered by the workflow `integration.yaml` on every push, and on every change to every implementation
2. this installs Python, archery and docker-compose, and runs `archery docker run conda-integration`
3. this calls the equivalent of `docker-compose run conda-integration`
4. this:
   4.1. builds a docker image from `conda-integration.dockerfile`, which contains Python, conda, Archery, Go, Maven, Rust, Node, all installed via conda
   4.2. uses this image to build every implementation (using `docker run CMD_TO_BUILD_ALL`)
5. runs all integration tests

Steps 1-4 take 20-35 minutes and step 5 takes 5 minutes, irrespective of what code changed. IMO there is potential for a major improvement here.

Some opinionated observations that demotivated me from progressing:

1. The current setup tightly couples the builds of all implementations, making it difficult to refactor and simplify. I.e. we have one docker image to build all implementations, and we build them all in a single command.
2. We use conda to install dependencies such as Maven, Node, the JDK and Go.
3. We use Python/archery for almost everything, even when a simpler `docker-compose run X` would suffice.

With this said, I have two changes to the current design that I would like to work on, if there is buy-in for the general ideas:

1. Make every artifact an independent build.

The integration test can be broadly described by a DAG with the following link list:

cpp artifacts <- test result
js artifacts <- test result
go artifacts <- test result
rust artifacts <- test result
...

My suggestion is that instead of running 1 job that builds all of these artifacts at once plus the test execution, we use N+1 jobs that build each artifact independently, and the job "test" picks these up (cached via the cache mechanism) and runs the actual test. This segmentation will allow us to cache the artifacts when the code does not change, which will significantly improve the aforementioned performance issue (see the workflow sketch at the end of this mail).

2. Make every build environment dedicated to what is being built.

I.e. instead of preparing 1 docker image that builds all of these artifacts at once, we prepare N docker images that each build one artifact: one image to build Rust, one to build Go, one to build C++, etc. This eliminates the tight coupling that currently exists between building these implementations (see the docker-compose sketch at the end of this mail).

Note that I see this as a stopgap. IMO we should use the artifacts built by each individual implementation's own build in the integration tests, by sharing the artifacts (and not even run the integration tests if an artifact cannot be produced, e.g. due to a compilation error), instead of building them twice.

Any thoughts?
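To make idea 1 more concrete, here is a rough sketch of what the N+1 jobs could look like as a GitHub Actions workflow. This is only an illustration of the shape, not a working proposal: the job names, script paths and output directories are made up; only the checkout, cache and artifact upload/download steps are the stock GitHub actions.

```yaml
name: Integration (sketch)
on: [push]

jobs:
  build-cpp:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # Reuse a previously built artifact when the C++ sources did not change.
      - uses: actions/cache@v2
        id: cpp-cache
        with:
          path: build/cpp                       # hypothetical output directory
          key: cpp-${{ hashFiles('cpp/**') }}
      # Build only the C++ implementation (hypothetical script path).
      - if: steps.cpp-cache.outputs.cache-hit != 'true'
        run: ci/scripts/build_cpp.sh
      # Hand the result over to the test job below.
      - uses: actions/upload-artifact@v2
        with:
          name: cpp-artifacts
          path: build/cpp

  # ... one analogous job per implementation (go, js, rust, java, ...)

  integration-test:
    # Runs only if every build job succeeded, so a compilation error in any
    # implementation fails fast without spending time on the tests.
    needs: [build-cpp]                          # plus the other build jobs
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/download-artifact@v2
        with:
          path: artifacts
      # Run only the integration tests against the pre-built artifacts
      # (hypothetical script path).
      - run: ci/scripts/run_integration_tests.sh artifacts
```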
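Similarly, for idea 2, a docker-compose fragment along these lines would replace the single conda-integration image with one dedicated build image per implementation. Again just a sketch: the service names, dockerfile paths and scripts are made up for illustration.

```yaml
version: "3.5"
services:
  build-cpp:
    build:
      context: .
      dockerfile: ci/docker/cpp-build.dockerfile    # only a C++ toolchain inside
    volumes:
      - .:/arrow
    command: /arrow/ci/scripts/build_cpp.sh

  build-rust:
    build:
      context: .
      dockerfile: ci/docker/rust-build.dockerfile   # only the Rust toolchain inside
    volumes:
      - .:/arrow
    command: /arrow/ci/scripts/build_rust.sh

  # ... one service per implementation (go, js, java, ...)
```

Each of these could then be run directly with `docker-compose run build-rust`, without going through archery.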
Best,
Jorge

On Mon, Nov 23, 2020 at 9:58 PM Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:
> On Mon, Nov 23, 2020 at 3:38 PM Antoine Pitrou <anto...@python.org> wrote:
> >
> > Hello,
> >
> > (sorry, disregard the previous e-mail, I pressed the Send button too early)
> >
> > The folks at the apache-builds mailing-list gathered some statistics
> > about GHA usage of various Apache projects:
> >
> > https://mail-archives.apache.org/mod_mbox/www-builds/202011.mbox/%3CCADe6CU_a5_HhGNFNGGYwfCdJR0-yPxOuAwnKxaPRvnOOPp86sA%40mail.gmail.com%3E
> > https://docs.google.com/spreadsheets/d/1SE9HIHBPmTZuW1WAgdVbEcGouGesiyrnXDIZxx25RSE/edit#gid=0
> >
> > It seems Arrow is the third biggest consumer of Apache GHA CI resources,
> > if measured by median number of in-progress workflow runs.
> > (I'm not sure whether this measures individual jobs, or if several jobs
> > are counted as a single workflow, given that GHA has a rather bizarre model)
>
> Thanks for the heads up!
>
> We have a high queued max value because of the post-release mass PR
> rebase script which distorts the average values as well.
> Based on the medians I don't think that we extremely overuse our GHA
> capacity portion.
>
> On the other hand we can remove a couple of low priority builds (or
> schedule them as nightlies).
>
> Regards, Krisztian
>
> > Regards
> >
> > Antoine.