On Mon, Jul 29, 2019 at 7:58 PM Krisztián Szűcs
<szucs.kriszt...@gmail.com> wrote:
>
> On Tue, Jul 30, 2019 at 1:38 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > hi Krisztian,
> >
> > Before talking about any code donations or where to run builds, I
> > think we first need to discuss the worrisome situation where we have
> > in some cases 3 (or more) CI configurations for different components
> > in the project.
> >
> > Just taking into account our C++ build, we have:
> >
> > * A config for Travis CI
> > * Multiple configurations in Dockerfiles under cpp/
> > * A brand new (?) configuration in this third-party ursa-labs/ursabot
> >   repository
> >
> > I note for example that the "AMD64 Conda C++" Buildbot build is
> > failing while Travis CI is succeeding:
> >
> > https://ci.ursalabs.org/#builders/66/builds/3196
> >
> > Starting from first principles, at least for Linux-based builds, what
> > I would like to see is:
> >
> > * A single build configuration (which can be driven by yaml-based
> >   configuration files and environment variables), rather than 3 like
> >   we have now. This build configuration should be decoupled from any
> >   CI platform, including Travis CI and Buildbot
>
> Yeah, this would be the ideal setup, but I'm afraid the situation is a
> bit more complicated.
>
> Travis CI
> ---------
>
> This setup is constructed from a bunch of scripts optimized for
> Travis; it is slow and hardly compatible with any of the remaining
> setups. I think we should ditch it.
>
> The "docker-compose setup"
> --------------------------
>
> Most of the Dockerfiles are part of the docker-compose setup we've
> developed. This might be a good candidate as the tool to centralize
> our future setup around, mostly because docker-compose is widely used,
> and we could set up buildbot builders (or any other CIs) to execute
> the sequence of docker-compose build and docker-compose run commands.
> However, docker-compose is not suitable for building and running
> hierarchical images. This is why we added a Makefile [1] to execute a
> "build" with a single make command instead of manually executing
> multiple commands involving multiple images (which is error prone).
> Docker-compose can also leave a lot of garbage behind, both containers
> and images.
> Docker-compose shines when one needs to orchestrate multiple
> containers and their networks / volumes on the same machine. We made
> it work for Arrow though (with a couple of hacky workarounds).
> Despite that, I still consider the docker-compose setup a good
> solution, mostly because of its biggest advantage: local
> reproducibility.

I think what is missing here is an orchestration tool (for example, a
Python program) to invoke Docker-based development workflows involving
multiple steps.
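Concretely, I have in mind something like the following (a rough sketch
only; the service names and the dependency graph are hypothetical
stand-ins, and in practice they would come from a yaml config file):

```python
#!/usr/bin/env python
# Sketch of a CI-agnostic build orchestrator. docker-compose does not
# build parent images automatically, so the tool encodes the image
# hierarchy explicitly and walks it before building the target.
import subprocess
import sys

DEPENDENCIES = {
    "cpp": [],
    "python": ["cpp"],
    "conda-python": ["cpp", "python"],
}

def run(*args):
    print("+ " + " ".join(args))
    subprocess.run(args, check=True)

def build(service):
    # build the parent images first, then the requested service
    for parent in DEPENDENCIES.get(service, []):
        run("docker-compose", "build", parent)
    run("docker-compose", "build", service)
    run("docker-compose", "run", "--rm", service)

if __name__ == "__main__":
    build(sys.argv[1])
```

The same entry point could then be invoked from Travis, from Buildbot,
or from a developer's shell, which is exactly the decoupling I'm after.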
> Ursabot
> -------
>
> Ursabot uses low-level docker commands to spin the containers up and
> down, and it also has a utility to nicely build the hierarchical
> images (with much less code to maintain). The builders are reliable
> and fast (thanks to docker), and it's great so far.
> Where it falls short compared to docker-compose is the lack of local
> reproducibility: currently the docker worker cleans up everything
> after itself except the mounted volumes used for caching.
> `docker-compose run` is a pretty nice way to shell into the container.
>
> Use docker-compose from ursabot?
> --------------------------------
>
> So assume that we should use docker-compose commands in the buildbot
> builders. Then:
> - there would be a single build step for all builders [2] (which means
>   a single chunk of unreadable log); it also complicates working with
>   esoteric builders like the on-demand crossbow trigger and the
>   benchmark runner

I think this is too much of a black-and-white way of looking at things.
What I would like to see is a build orchestration tool, which can be
used via a command line interface, not unlike the current crossbow.py
and archery command line scripts, that can invoke a build locally or in
a CI setting.

> - no possibility to customize the buildsteps (like aggregating the
>   count of warnings)
> - no time statistics for the steps, which would make it harder to
>   optimize the build times
> - to properly clean up the containers, some custom solution would be
>   required
> - if we need to introduce additional parametrizations to the
>   docker-compose.yaml (for example to add other architectures), then
>   it might require full yaml duplication

I think the tool would need to be higher level than docker-compose.

In general I'm not very comfortable introducing a hard dependency on
Buildbot (or any CI platform, for that matter) into the project. So we
have to figure out a way to move forward without such a hard dependency
or go back to the drawing board.

> - exchanging data between the docker-compose container and buildbot
>   would be more complicated; for example, the benchmark comment
>   reporter reads the result from a file. In order to do the same,
>   mounted volumes are required (reading structured output on stdout
>   and stderr from scripts is more error prone), which brings the usual
>   permission problems on linux.
> - local reproducibility still requires manual intervention, because
>   the scripts within the docker containers are not pausable: they
>   exit, and the steps up to the failed one must be re-executed* after
>   ssh-ing into the running container.
>
> Honestly, I see more issues than advantages here. Let's look at it the
> other way around.
>
> Local reproducibility with ursabot?
> -----------------------------------
>
> The most wanted feature that docker-compose has but ursabot doesn't is
> local reproducibility. First of all, ursabot can be run locally,
> including all of its builders, so local reproducibility is partially
> resolved. The missing piece is the interactive shell into the running
> container, because buildbot instantly stops and aggressively cleans up
> everything once the container finishes.
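> For example, the core of it would be something like this (a rough
> sketch using docker-py; this is not the actual buildbot worker code,
> and the image name and the buildsteps are hypothetical):
>
> ```python
> # Run the buildsteps in a long-lived container and leave it running on
> # failure, so one can shell into the failed environment afterwards.
> import docker
>
> client = docker.from_env()
> container = client.containers.run(
>     "arrow-cpp-build:latest",  # hypothetical image name
>     command="sleep infinity",  # keep the container alive between steps
>     detach=True,
> )
>
> for cmd in ["cmake /arrow/cpp", "ninja", "ctest"]:
>     exit_code, output = container.exec_run(["bash", "-c", cmd])
>     print(output.decode())
>     if exit_code != 0:
>         # skip the cleanup so the container stays inspectable
>         print("step failed, attach with: docker exec -it %s bash"
>               % container.short_id)
>         break
> else:
>     container.remove(force=True)  # aggressive cleanup only on success
> ```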
> I have three solutions / workarounds in mind:
>
> 1. We have all the power of docker and docker-compose from ursabot
>    through docker-py, and we can easily keep the container running by
>    simply not stopping it [3], much as sketched above. Configuring the
>    locally running buildbot to keep the containers running after a
>    failure seems quite easy. *This has the advantage that all of the
>    buildsteps preceding the failed one have already been executed, so
>    it requires less manual intervention. It could be done from the web
>    UI or even from the CLI, like `ursabot reproduce <builder-name>`.
> 2. Generate the docker-compose.yaml and the required scripts from the
>    ursabot builder configurations, including the shell scripts.
> 3. Generate a set of commands to reproduce the failure (one could even
>    ask the comment bot "how to reproduce the failing one"). The
>    response would look similar to:
>
> ```bash
> $ docker pull <image>
> $ docker run -it <image> bash
> # cmd1
> # cmd2
> # <- error occurs here ->
> ```
>
> TL;DR
> -----
>
> In the first iteration I'd remove the travis configurations.
> In the second iteration I'd develop a feature for ursabot to make
> local reproducibility possible.
>
> [1]: https://github.com/apache/arrow/blob/master/Makefile.docker
> [2]: https://ci.ursalabs.org/#/builders/87/builds/929
> [3]:
> https://github.com/buildbot/buildbot/blob/e7ff2a3b959cff96c77c07891fa07a35a98e81cb/master/buildbot/worker/docker.py#L343
>
> > * A local tool to run any Linux-based build locally using Docker at
> > the command line, so that CI behavior can be exactly reproduced
> > locally
> >
> > Does that seem achievable?
> >
> > Thanks,
> > Wes
> >
> > On Mon, Jul 29, 2019 at 6:22 PM Krisztián Szűcs
> > <szucs.kriszt...@gmail.com> wrote:
> > >
> > > Hi All,
> > >
> > > Ursabot works pretty well so far, and the CI feedback times have
> > > become even better* after enabling the docker volume caches, but
> > > the development and maintenance of it is still not available to the
> > > whole Arrow community.
> > >
> > > While it wasn't straightforward, I've managed to separate the
> > > source code required to configure the Arrow builders into a
> > > separate directory, which eventually can be donated to Arrow.
> > > The README is under construction, but the code is available
> > > here [1].
> > >
> > > As long as this codebase is not governed by the Arrow community,
> > > decommissioning the slow travis builds is not possible, so the
> > > overall CI times required to merge a PR will remain high.
> > >
> > > Regards, Krisztian
> > >
> > > * C++ builder times have dropped from ~6-7 minutes to ~3-4 minutes
> > > * Python builder times have dropped from ~7-8 minutes to ~3-5
> > >   minutes
> > > * ARM C++ builder time has dropped from ~19-20 minutes to ~9-12
> > >   minutes
> > >
> > > [1]:
> > > https://github.com/ursa-labs/ursabot/tree/a46c6aa7b714346b3e4bb7921decb4d4d2f5ed70/projects/arrow
> >
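To make option 2 above concrete: generating the docker-compose.yaml
from declarative builder definitions could look roughly like this (a
sketch only; the builder structure here is a hypothetical stand-in, not
the real ursabot classes), which would let the same definitions drive
buildbot, any other CI, and local runs:

```python
# Derive a docker-compose.yaml from declarative builder definitions so
# that buildbot and docker-compose execute the same thing.
import yaml  # pyyaml

builders = {
    "amd64-conda-cpp": {               # hypothetical builder definition
        "image": "arrow:cpp-conda",
        "steps": ["cmake /arrow/cpp", "ninja", "ctest"],
    },
}

def to_compose(builders):
    services = {}
    for name, config in builders.items():
        services[name] = {
            "image": config["image"],
            # join the buildsteps into a single shell invocation
            "command": "bash -cex '%s'" % "; ".join(config["steps"]),
            "volumes": ["./:/arrow:delegated"],
        }
    return {"version": "3.5", "services": services}

with open("docker-compose.generated.yml", "w") as f:
    yaml.safe_dump(to_compose(builders), f, default_flow_style=False)
```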