On Tue, Jul 30, 2019 at 3:23 AM Wes McKinney <wesmck...@gmail.com> wrote:
> On Mon, Jul 29, 2019 at 7:58 PM Krisztián Szűcs
> <szucs.kriszt...@gmail.com> wrote:
> >
> > On Tue, Jul 30, 2019 at 1:38 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > hi Krisztian,
> > >
> > > Before talking about any code donations or where to run builds, I
> > > think we first need to discuss the worrisome situation where we have
> > > in some cases 3 (or more) CI configurations for different components
> > > in the project.
> > >
> > > Just taking into account our C++ build, we have:
> > >
> > > * A config for Travis CI
> > > * Multiple configurations in Dockerfiles under cpp/
> > > * A brand new (?) configuration in this third-party ursa-labs/ursabot
> > >   repository
> > >
> > > I note for example that the "AMD64 Conda C++" Buildbot build is
> > > failing while Travis CI is succeeding
> > >
> > > https://ci.ursalabs.org/#builders/66/builds/3196
> > >
> > > Starting from first principles, at least for Linux-based builds, what
> > > I would like to see is:
> > >
> > > * A single build configuration (which can be driven by yaml-based
> > >   configuration files and environment variables), rather than 3 like
> > >   we have now. This build configuration should be decoupled from any
> > >   CI platform, including Travis CI and Buildbot
> >
> > Yeah, this would be the ideal setup, but I'm afraid the situation is a
> > bit more complicated.
> >
> > TravisCI
> > --------
> >
> > Constructed from a bunch of scripts optimized for Travis, this setup is
> > slow and hardly compatible with any of the remaining setups.
> > I think we should ditch it.
> >
> > The "docker-compose setup"
> > --------------------------
> >
> > Most of the Dockerfiles are part of the docker-compose setup we've
> > developed. This might be a good candidate to centralize our future
> > setup around, mostly because docker-compose is widely used, and we
> > could set up buildbot builders (or any other CI) to execute the
> > sequence of docker-compose build and docker-compose run commands.
> > However, docker-compose is not suitable for building and running
> > hierarchical images. This is why we have added a Makefile [1] to
> > execute a "build" with a single make command instead of manually
> > executing multiple commands involving multiple images (which is error
> > prone). It can also leave a lot of garbage behind, both containers and
> > images.
> > Docker-compose shines when one needs to orchestrate multiple containers
> > and their networks / volumes on the same machine. We made it work for
> > arrow, though, with a couple of hacky workarounds.
> > Despite that, I still consider the docker-compose setup a good
> > solution, mostly because of its biggest advantage: local
> > reproducibility.
>
> I think what is missing here is an orchestration tool (for example, a
> Python program) to invoke Docker-based development workflows involving
> multiple steps.
>
> > Ursabot
> > -------
> >
> > Ursabot uses low-level docker commands to spin the containers up and
> > down, and it also has a utility to nicely build the hierarchical images
> > (though with much less maintainable code). The builders are reliable,
> > fast (thanks to docker), and it's great so far.
> > Where it falls short compared to docker-compose is the lack of local
> > reproducibility: currently the docker worker cleans up everything after
> > itself except the mounted volumes used for caching, whereas
> > `docker-compose run` is a pretty nice way to shell into the container.
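> >
> > For comparison, reproducing a build with the docker-compose setup is
> > just a couple of commands. A rough sketch (the service names here are
> > illustrative; the real ones live in docker-compose.yml and are wrapped
> > by the Makefile [1]):
> >
> > ```bash
> > # build the image hierarchy manually, layer by layer
> > docker-compose build cpp         # base C++ image
> > docker-compose build python      # built on top of the cpp image
> > # run the build, or drop into an interactive shell to poke around
> > docker-compose run python bash
> > ```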
> >
> > Use docker-compose from ursabot?
> > --------------------------------
> >
> > So assume that we should use docker-compose commands in the buildbot
> > builders. Then:
> > - there would be a single build step for all builders [2] (which means
> >   a single chunk of unreadable log)
> > - it also complicates working with esoteric
>
> I think this is too much of a black-and-white way of looking at
> things. What I would like to see is a build orchestration tool, which
> can be used via command line interface, not unlike the current
> crossbow.py and archery command line scripts, that can invoke a build
> locally or in a CI setting.

This is actually something I really wanted to develop for crossbow, to
reduce the CI templates to a single command. The local docker execution,
or any kind of execution in virtual machines (either locally or in the
cloud), is really similar to what travis/circleci/etc. provide, just with
different dialects. It could also be hooked into buildbot to keep its
reporting capabilities.

> > builders like the on-demand crossbow trigger and the benchmark runner
> > - no possibility to customize the build steps (like aggregating the
> >   count of warnings)
> > - no time statistics for the steps, which would make it harder to
> >   optimize the build times
> > - to properly clean up the containers some custom solution would be
> >   required
> > - if we need to introduce additional parametrizations to the
> >   docker-compose.yaml (for example to add other architectures), then it
> >   might require duplicating the whole yaml file
>
> I think the tool would need to be higher level than docker-compose
>
> In general I'm not very comfortable introducing a hard dependency on
> Buildbot (or any CI platform, for that matter) into the project. So we
> have to figure out a way to move forward without such hard dependency
> or go back to the drawing board.

While I generally agree with your idea, note that buildbot is not a CI
platform; it is rather a task execution framework tailored to CI
requirements, so it is more comparable to airflow or dask - thus the
cost of buildbot vendor lock-in is lower. IMO we should move on with
ursabot, at least in the short term, to keep the CI times down. In the
longer term we can elaborate on your proposed tool.

> > - exchanging data between the docker-compose container and buildbot
> >   would be more complicated; for example, the benchmark comment
> >   reporter reads the result from a file, and in order to do the same
> >   (reading structured output on stdout and stderr from scripts is more
> >   error prone) mounted volumes are required, which brings the usual
> >   permission problems on linux.
> > - local reproducibility still requires manual intervention, because
> >   the scripts within the docker containers are not pausable: they exit,
> >   and the steps up to the failed one must be re-executed* after ssh-ing
> >   into the running container.
> >
> > Honestly, I see more issues than advantages here. Let's look at it the
> > other way around.
> >
> > Local reproducibility with ursabot?
> > -----------------------------------
> >
> > The most wanted feature that docker-compose has but ursabot doesn't is
> > local reproducibility. First of all, ursabot can be run locally,
> > including all of its builders, so local reproducibility is partially
> > resolved. The missing piece is the interactive shell into the running
> > container, because buildbot instantly stops the container and
> > aggressively cleans up everything after it.
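> >
> > If the worker simply left the failed container running, getting back
> > into it would be trivial. A sketch (the container name is
> > illustrative):
> >
> > ```bash
> > # assuming the worker were configured not to remove the container,
> > # one could attach a shell to it after the failed build
> > docker ps --filter status=running   # find the leftover build container
> > docker exec -it <container> bash    # inspect the build state in place
> > ```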
> >
> > I have three solutions / workarounds in mind:
> >
> > 1. We have all the power of docker and docker-compose from ursabot
> >    through docker-py, and we can easily keep the container running by
> >    simply not stopping it [3]. Configuring the locally running buildbot
> >    to keep the containers running after a failure seems quite easy.
> >    *It has the advantage that all of the build steps preceding the
> >    failed one have already been executed, so it requires less manual
> >    intervention. This could be done on the web UI or even from the CLI,
> >    like `ursabot reproduce <builder-name>`.
> > 2. Generate the docker-compose.yaml and the required scripts, including
> >    the shell scripts, from the ursabot builder configurations.
> > 3. Generate a set of commands to reproduce the failure (without even
> >    asking the comment bot "how to reproduce the failing one"). The
> >    response would look similar to:
> >
> >    ```bash
> >    $ docker pull <image>
> >    $ docker run -it <image> bash
> >    # cmd1
> >    # cmd2
> >    # <- error occurs here ->
> >    ```
> >
> > TL;DR
> > -----
> > In the first iteration I'd remove the travis configurations.
> > In the second iteration I'd develop a feature for ursabot to make local
> > reproducibility possible.
> >
> > [1]: https://github.com/apache/arrow/blob/master/Makefile.docker
> > [2]: https://ci.ursalabs.org/#/builders/87/builds/929
> > [3]: https://github.com/buildbot/buildbot/blob/e7ff2a3b959cff96c77c07891fa07a35a98e81cb/master/buildbot/worker/docker.py#L343
> > >
> > > * A local tool to run any Linux-based builds locally using Docker at
> > >   the command line, so that CI behavior can be exactly reproduced
> > >   locally
> > >
> > > Does that seem achievable?
> > >
> > > Thanks,
> > > Wes
> > >
> > > On Mon, Jul 29, 2019 at 6:22 PM Krisztián Szűcs
> > > <szucs.kriszt...@gmail.com> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > Ursabot works pretty well so far, and the CI feedback times have
> > > > become even better* after enabling the docker volume caches, but
> > > > the development and maintenance of it is still not open to the
> > > > whole Arrow community.
> > > >
> > > > While it wasn't straightforward, I've managed to separate the
> > > > source code required to configure the Arrow builders into a
> > > > separate directory, which eventually can be donated to Arrow.
> > > > The README is under construction, but the code is available
> > > > here [1].
> > > >
> > > > As long as this codebase is not governed by the Arrow community,
> > > > decommissioning the slow travis builds is not possible, so the
> > > > overall CI time required to merge a PR will remain high.
> > > >
> > > > Regards, Krisztian
> > > >
> > > > * C++ builder times have dropped from ~6-7 minutes to ~3-4 minutes
> > > > * Python builder times have dropped from ~7-8 minutes to ~3-5
> > > >   minutes
> > > > * ARM C++ builder times have dropped from ~19-20 minutes to ~9-12
> > > >   minutes
> > > >
> > > > [1]: https://github.com/ursa-labs/ursabot/tree/a46c6aa7b714346b3e4bb7921decb4d4d2f5ed70/projects/arrow
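To make option 1 above concrete, `ursabot reproduce <builder-name>` could
boil down to something like the following. This is an entirely
hypothetical sketch (the ursabot subcommand and its flag don't exist yet;
only the docker commands are real):

```bash
# hypothetical: run the builder locally, keeping the container on failure
ursabot project build --keep-container 'AMD64 Conda C++'
# then attach to the most recently created container and inspect it
docker exec -it $(docker ps -lq) bash
```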