Ok, but moving the configuration into Arrow is orthogonal to the local reproducibility feature. Could we proceed with that?
On Tue, Jul 30, 2019 at 4:38 PM Wes McKinney <wesmck...@gmail.com> wrote:
> I will defer to others to investigate this matter further but I would
> really like to see a concrete and practical path to local
> reproducibility before moving forward on any changes to our current
> CI.
>
> On Tue, Jul 30, 2019 at 7:38 AM Krisztián Szűcs
> <szucs.kriszt...@gmail.com> wrote:
> >
> > Fixed it and restarted a bunch of builds.
> >
> > On Tue, Jul 30, 2019 at 5:13 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > By the way, can you please disable the Buildbot builders that are
> > > causing builds on master to fail? We haven't had a passing build in
> > > over a week. Until we reconcile the build configurations we shouldn't
> > > be failing contributors' builds.
> > >
> > > On Mon, Jul 29, 2019 at 8:23 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > > >
> > > > On Mon, Jul 29, 2019 at 7:58 PM Krisztián Szűcs
> > > > <szucs.kriszt...@gmail.com> wrote:
> > > > >
> > > > > On Tue, Jul 30, 2019 at 1:38 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > > > > >
> > > > > > hi Krisztian,
> > > > > >
> > > > > > Before talking about any code donations or where to run builds, I
> > > > > > think we first need to discuss the worrisome situation where we
> > > > > > have in some cases 3 (or more) CI configurations for different
> > > > > > components in the project.
> > > > > >
> > > > > > Just taking into account our C++ build, we have:
> > > > > >
> > > > > > * A config for Travis CI
> > > > > > * Multiple configurations in Dockerfiles under cpp/
> > > > > > * A brand new (?)
> > > > > >   configuration in this third-party ursa-labs/ursabot repository
> > > > > >
> > > > > > I note for example that the "AMD64 Conda C++" Buildbot build is
> > > > > > failing while Travis CI is succeeding:
> > > > > >
> > > > > > https://ci.ursalabs.org/#builders/66/builds/3196
> > > > > >
> > > > > > Starting from first principles, at least for Linux-based builds,
> > > > > > what I would like to see is:
> > > > > >
> > > > > > * A single build configuration (which can be driven by yaml-based
> > > > > >   configuration files and environment variables), rather than the
> > > > > >   3 we have now. This build configuration should be decoupled from
> > > > > >   any CI platform, including Travis CI and Buildbot.
> > > > > >
> > > > > Yeah, this would be the ideal setup, but I'm afraid the situation is
> > > > > a bit more complicated.
> > > > >
> > > > > TravisCI
> > > > > --------
> > > > >
> > > > > Constructed from a bunch of scripts optimized for Travis, this setup
> > > > > is slow and hardly compatible with any of the remaining setups.
> > > > > I think we should ditch it.
> > > > >
> > > > > The "docker-compose setup"
> > > > > --------------------------
> > > > >
> > > > > Most of the Dockerfiles are part of the docker-compose setup we've
> > > > > developed. This might be a good candidate as the tool to centralize
> > > > > our future setup around, mostly because docker-compose is widely
> > > > > used, and we could set up buildbot builders (or any other CI's) to
> > > > > execute the sequence of docker-compose build and docker-compose run
> > > > > commands.
> > > > > However, docker-compose is not suitable for building and running
> > > > > hierarchical images. This is why we have added a Makefile [1] to
> > > > > execute a "build" with a single make command instead of manually
> > > > > executing multiple commands involving multiple images (which is
> > > > > error-prone).
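For illustration, the hierarchical build that the Makefile [1] drives boils down to resolving each image's ancestors and building them in dependency order; a small Python helper could emit the same command sequence that one would otherwise type by hand. This is only a sketch: the image names and the hierarchy below are invented for illustration and are not Arrow's actual docker-compose configuration.

```python
# Hypothetical sketch: resolve a hierarchical image's ancestors and emit
# the docker-compose commands to build them in order. The image names and
# hierarchy are invented for illustration only.

# child image -> list of parent images it builds on (hypothetical)
HIERARCHY = {
    "base": [],
    "cpp": ["base"],
    "python": ["cpp"],
}

def build_order(image, hierarchy=HIERARCHY):
    """Return `image` preceded by its ancestors, root first."""
    order = []
    def visit(name):
        for parent in hierarchy[name]:
            visit(parent)
        if name not in order:
            order.append(name)
    visit(image)
    return order

def compose_commands(image):
    """The docker-compose invocations a wrapper tool would execute."""
    cmds = ["docker-compose build {}".format(name) for name in build_order(image)]
    cmds.append("docker-compose run --rm {}".format(image))
    return cmds

if __name__ == "__main__":
    for cmd in compose_commands("python"):
        print(cmd)
```

Printing the commands instead of running them keeps the dependency-resolution logic testable without a Docker daemon; a real tool would hand each command to a subprocess.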
> > > > > It can also leave a lot of garbage behind, both containers and
> > > > > images.
> > > > > Docker-compose shines when one needs to orchestrate multiple
> > > > > containers and their networks / volumes on the same machine. We
> > > > > made it work for Arrow though (with a couple of hacky workarounds).
> > > > > Despite that, I still consider the docker-compose setup a good
> > > > > solution, mostly because of its biggest advantage: local
> > > > > reproducibility.
> > > >
> > > > I think what is missing here is an orchestration tool (for example, a
> > > > Python program) to invoke Docker-based development workflows involving
> > > > multiple steps.
> > > >
> > > > > Ursabot
> > > > > -------
> > > > >
> > > > > Ursabot uses low-level docker commands to spin the containers up
> > > > > and down, and it also has a utility to nicely build the hierarchical
> > > > > images (with much less maintainable code). The builders are
> > > > > reliable, fast (thanks to docker), and it's been great so far.
> > > > > Where it falls short compared to docker-compose is the lack of
> > > > > local reproducibility: currently the docker worker cleans up
> > > > > everything after itself except the volumes mounted for caching.
> > > > > `docker-compose run` is a pretty nice way to shell into the
> > > > > container.
> > > > >
> > > > > Use docker-compose from ursabot?
> > > > > --------------------------------
> > > > >
> > > > > So assume that we should use docker-compose commands in the
> > > > > buildbot builders. Then:
> > > > > - there would be a single build step for all builders [2] (which
> > > > >   means a single chunk of unreadable log) - it also complicates
> > > > >   working with esoteric
> > > >
> > > > I think this is too much of a black-and-white way of looking at
> > > > things.
> > > > What I would like to see is a build orchestration tool, which can be
> > > > used via a command line interface, not unlike the current crossbow.py
> > > > and archery command line scripts, that can invoke a build locally or
> > > > in a CI setting.
> > > >
> > > > >   builders like the on-demand crossbow trigger and the benchmark
> > > > >   runner
> > > > > - no possibility to customize the build steps (like aggregating the
> > > > >   count of warnings)
> > > > > - no time statistics for the steps, which would make it harder to
> > > > >   optimize the build times
> > > > > - to properly clean up the containers, some custom solution would
> > > > >   be required
> > > > > - if we needed to introduce additional parametrizations to the
> > > > >   docker-compose.yaml (for example to add other architectures),
> > > > >   then it might require full yaml duplication
> > > >
> > > > I think the tool would need to be higher level than docker-compose.
> > > >
> > > > In general I'm not very comfortable introducing a hard dependency on
> > > > Buildbot (or any CI platform, for that matter) into the project. So
> > > > we have to figure out a way to move forward without such a hard
> > > > dependency or go back to the drawing board.
> > > >
> > > > > - exchanging data between the docker-compose container and buildbot
> > > > >   would be more complicated; for example, the benchmark comment
> > > > >   reporter reads the result from a file, and in order to do the
> > > > >   same (reading structured output on stdout and stderr from scripts
> > > > >   is more error-prone), mounted volumes are required, which brings
> > > > >   the usual permission problems on linux.
> > > > > - local reproducibility still requires manual intervention, because
> > > > >   the scripts within the docker containers are not pausable; they
> > > > >   exit, and the steps up to the failed one must be re-executed*
> > > > >   after ssh-ing into the running container.
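To make the "build orchestration tool" idea above a bit more concrete: a thin CLI in the style of crossbow.py or archery could be the shared entry point for both local runs and CI jobs. The sketch below is purely illustrative; the `arrow-build` program name, the `run` subcommand, and the `--using` flag are all hypothetical, not an existing tool.

```python
# Hypothetical sketch of a CLI-driven build orchestrator: one entry point
# that can invoke a named build configuration locally or from a CI job.
# All names and flags here are invented for illustration.
import argparse

def make_parser():
    parser = argparse.ArgumentParser(prog="arrow-build")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run", help="run one build configuration")
    run.add_argument("builder", help="builder name, e.g. conda-cpp (hypothetical)")
    run.add_argument("--using", choices=["docker", "host"], default="docker",
                     help="where to execute the build steps")
    return parser

if __name__ == "__main__":
    args = make_parser().parse_args()
    # A real tool would load the yaml config for `args.builder` here and
    # execute its steps; this sketch only echoes the parsed decision.
    print("would run {} using {}".format(args.builder, args.using))
```

The point of the shared entry point is that a CI worker (Buildbot, Travis, or anything else) would shell out to the exact same command a developer types locally, which is what makes runs reproducible.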
> > > > >
> > > > > Honestly I see more issues than advantages here. Let's look at it
> > > > > the other way around.
> > > > >
> > > > > Local reproducibility with ursabot?
> > > > > -----------------------------------
> > > > >
> > > > > The most wanted feature that docker-compose has but ursabot doesn't
> > > > > is local reproducibility. First of all, ursabot can be run locally,
> > > > > including all of its builders, so local reproducibility is
> > > > > partially resolved. The missing piece is the interactive shell into
> > > > > the running container, because buildbot instantly stops the
> > > > > container and aggressively cleans up everything after it.
> > > > >
> > > > > I have three solutions / workarounds in mind:
> > > > >
> > > > > 1. We have all the power of docker and docker-compose from ursabot
> > > > >    through docker-py, and we can easily keep the container running
> > > > >    by simply not stopping it [3]. Configuring the locally running
> > > > >    buildbot to keep the containers running after a failure seems
> > > > >    quite easy. *It has the advantage that all of the build steps
> > > > >    preceding the failed one are already executed, so it requires
> > > > >    less manual intervention.
> > > > >    This could be done on the web UI or even from the CLI, like
> > > > >    `ursabot reproduce <builder-name>`
> > > > > 2. Generate the docker-compose.yaml and required scripts from the
> > > > >    ursabot builder configurations, including the shell scripts.
> > > > > 3. Generate a set of commands to reproduce the failure (without
> > > > >    even asking the comment bot "how to reproduce the failing one").
> > > > >    The response would look similar to:
> > > > >
> > > > >    ```bash
> > > > >    $ docker pull <image>
> > > > >    $ docker run -it <image> bash
> > > > >    # cmd1
> > > > >    # cmd2
> > > > >    # <- error occurs here ->
> > > > >    ```
> > > > >
> > > > > TL;DR
> > > > > -----
> > > > >
> > > > > In the first iteration I'd remove the travis configurations.
> > > > > In the second iteration I'd develop a feature for ursabot to make
> > > > > local reproducibility possible.
> > > > >
> > > > > [1]: https://github.com/apache/arrow/blob/master/Makefile.docker
> > > > > [2]: https://ci.ursalabs.org/#/builders/87/builds/929
> > > > > [3]: https://github.com/buildbot/buildbot/blob/e7ff2a3b959cff96c77c07891fa07a35a98e81cb/master/buildbot/worker/docker.py#L343
> > > > >
> > > > > > * A local tool to run any Linux-based builds locally using Docker
> > > > > >   at the command line, so that CI behavior can be exactly
> > > > > >   reproduced locally
> > > > > >
> > > > > > Does that seem achievable?
> > > > > >
> > > > > > Thanks,
> > > > > > Wes
> > > > > >
> > > > > > On Mon, Jul 29, 2019 at 6:22 PM Krisztián Szűcs
> > > > > > <szucs.kriszt...@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > Ursabot works pretty well so far, and the CI feedback times
> > > > > > > have become even better* after enabling the docker volume
> > > > > > > caches, but the development and maintenance of it is still not
> > > > > > > available to the whole Arrow community.
> > > > > > >
> > > > > > > While it wasn't straightforward, I've managed to separate the
> > > > > > > source code required to configure the Arrow builders into a
> > > > > > > separate directory, which eventually can be donated to Arrow.
> > > > > > > The README is under construction, but the code is available
> > > > > > > here [1].
> > > > > > >
> > > > > > > Until this codebase is governed by the Arrow community,
> > > > > > > decommissioning the slow travis builds is not possible, so the
> > > > > > > overall CI time required to merge a PR will remain high.
> > > > > > >
> > > > > > > Regards, Krisztian
> > > > > > >
> > > > > > > * C++ builder times have dropped from ~6-7 minutes to ~3-4
> > > > > > >   minutes
> > > > > > > * Python builder times have dropped from ~7-8 minutes to ~3-5
> > > > > > >   minutes
> > > > > > > * ARM C++ builder times have dropped from ~19-20 minutes to
> > > > > > >   ~9-12 minutes
> > > > > > >
> > > > > > > [1]: https://github.com/ursa-labs/ursabot/tree/a46c6aa7b714346b3e4bb7921decb4d4d2f5ed70/projects/arrow
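As a footnote to option 3 earlier in the thread (having the bot generate a "how to reproduce" recipe): rendering such a reply from a builder's image and its shell steps is cheap to prototype. Everything in this sketch, the function name, the data shape, and the example values, is hypothetical and only mirrors the bash example quoted above.

```python
# Hypothetical sketch of option 3: render a reproduction recipe from a
# builder's image and the shell steps up to (and including) the failure.
# Function name, parameters, and example values are invented.

def render_recipe(image, steps, failed_index):
    """Render docker commands plus the in-container steps, stopping at
    the step that failed, in the style of the thread's bash example."""
    lines = [
        "$ docker pull {}".format(image),
        "$ docker run -it {} bash".format(image),
    ]
    for step in steps[: failed_index + 1]:
        lines.append("# {}".format(step))
    lines.append("# <- error occurs here ->")
    return "\n".join(lines)

if __name__ == "__main__":
    print(render_recipe(
        "example.org/arrow-cpp:latest",  # hypothetical image name
        ["cmake ..", "ninja", "ctest"],  # hypothetical build steps
        failed_index=2,                  # the third step is the one that failed
    ))
```

Since the buildbot master already knows each builder's image and step list, emitting this text in a comment (or from a `reproduce` subcommand) would not require any container changes at all.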