By the way, can you please disable the Buildbot builders that are
causing builds on master to fail? We haven't had a passing build in
over a week. Until we reconcile the build configurations, we shouldn't
be failing contributors' builds.

On Mon, Jul 29, 2019 at 8:23 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> On Mon, Jul 29, 2019 at 7:58 PM Krisztián Szűcs
> <szucs.kriszt...@gmail.com> wrote:
> >
> > On Tue, Jul 30, 2019 at 1:38 AM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > > hi Krisztian,
> > >
> > > Before talking about any code donations or where to run builds, I
> > > think we first need to discuss the worrisome situation where we have
> > > in some cases 3 (or more) CI configurations for different components
> > > in the project.
> > >
> > > Just taking into account our C++ build, we have:
> > >
> > > * A config for Travis CI
> > > * Multiple configurations in Dockerfiles under cpp/
> > > * A brand new (?) configuration in this third party ursa-labs/ursabot
> > > repository
> > >
> > > I note for example that the "AMD64 Conda C++" Buildbot build is
> > > failing while Travis CI is succeeding
> > >
> > > https://ci.ursalabs.org/#builders/66/builds/3196
> > >
> > > Starting from first principles, at least for Linux-based builds, what
> > > I would like to see is:
> > >
> > > * A single build configuration (which can be driven by yaml-based
> > > configuration files and environment variables), rather than 3 like we
> > > have now. This build configuration should be decoupled from any CI
> > > platform, including Travis CI and Buildbot
> > >
> > Yeah, this would be the ideal setup, but I'm afraid the situation is a bit
> > more complicated.
> >
> > TravisCI
> > --------
> >
> > Constructed from a bunch of scripts optimized for Travis, this setup is
> > slow and hardly compatible with any of the other setups.
> > I think we should ditch it.
> >
> > The "docker-compose setup"
> > --------------------------
> >
> > Most of the Dockerfiles are part of the docker-compose setup we've
> > developed. This might be a good candidate as the tool to centralize our
> > future setup around, mostly because docker-compose is widely used, and we
> > could set up buildbot builders (or any other CI's) to execute the sequence
> > of docker-compose build and docker-compose run commands.
> > However, docker-compose is not suitable for building and running
> > hierarchical images. This is why we have added a Makefile [1] to execute a
> > "build" with a single make command instead of manually executing multiple
> > commands involving multiple images (which is error prone). It can also
> > leave a lot of garbage behind, both containers and images.
> > Docker-compose shines when one needs to orchestrate multiple containers
> > and their networks / volumes on the same machine. We made it work (with a
> > couple of hacky workarounds) for Arrow though.
> > Despite that, I still consider the docker-compose setup a good solution,
> > mostly because of its biggest advantage: local reproducibility.
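To make the hierarchical-image problem concrete, here is a hypothetical compose fragment (service and image names are invented for illustration, not the project's actual configuration): one service's image is the base of another's, and plain `docker-compose build` will not build `base` before `cpp` for you, which is what the Makefile wrapper compensates for.

```yaml
# docker-compose.yml -- illustrative sketch only, not Arrow's real config
version: "3.5"
services:
  base:
    image: example/base:latest
    build:
      context: .
      dockerfile: docker/base.Dockerfile
  cpp:
    # This Dockerfile starts with "FROM example/base:latest", so the
    # "base" service must have been built first; docker-compose does not
    # resolve that ordering by itself.
    image: example/cpp:latest
    build:
      context: .
      dockerfile: docker/cpp.Dockerfile
```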
> >
>
> I think what is missing here is an orchestration tool (for example, a
> Python program) to invoke Docker-based development workflows involving
> multiple steps.
>
> > Ursabot
> > -------
> >
> > Ursabot uses low-level docker commands to spin the containers up and down,
> > and it also has a utility to nicely build the hierarchical images (with
> > much less maintainable code). The builders are reliable, fast (thanks to
> > docker), and it's been great so far.
> > Where it falls short compared to docker-compose is local reproducibility:
> > currently the docker worker cleans up everything after itself except the
> > mounted volumes used for caching. `docker-compose run` is a pretty nice
> > way to shell into the container.
> >
> > Use docker-compose from ursabot?
> > --------------------------------
> >
> > So assume that we were to use docker-compose commands in the buildbot
> > builders. Then:
> > - there would be a single build step for all builders [2] (which means a
> >   single chunk of unreadable log); it also complicates working with esoteric
>
> I think this is too much of a black-and-white way of looking at
> things. What I would like to see is a build orchestration tool, which
> can be used via command line interface, not unlike the current
> crossbow.py and archery command line scripts, that can invoke a build
> locally or in a CI setting.
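A rough sketch of what such an orchestration tool might look like, under invented assumptions (the image names and dependency graph below are illustrative only, not Arrow's actual compose services): topologically sort the hierarchical images, then emit the docker-compose commands to run, so the same entry point can drive a developer's shell or a CI worker.

```python
# Hypothetical sketch of the orchestration tool discussed above.
# The image names and dependency graph are made up for illustration.
from graphlib import TopologicalSorter  # stdlib since Python 3.9

# Maps image -> the images it builds FROM (its parents).
DEPENDENCIES = {
    "base": [],
    "cpp": ["base"],      # cpp builds FROM base
    "python": ["cpp"],    # python builds FROM cpp
}

def build_order(deps):
    """Order images so every parent is built before its children --
    the ordering plain docker-compose does not handle."""
    return list(TopologicalSorter(deps).static_order())

def compose_commands(target, deps):
    """The docker-compose invocations a CI worker (or a developer)
    would run for a given target image."""
    order = build_order(deps)
    needed = order[: order.index(target) + 1]
    cmds = [f"docker-compose build {image}" for image in needed]
    cmds.append(f"docker-compose run --rm {target}")
    return cmds

if __name__ == "__main__":
    for cmd in compose_commands("python", DEPENDENCIES):
        print(cmd)
```

In CI the same function could feed `subprocess.run` instead of `print`, which keeps the build logic decoupled from any particular CI platform.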
>
> >   builders like the on-demand crossbow trigger and the benchmark runner
> > - no possibility to customize the buildsteps (like aggregating the count of
> >   warnings)
> > - no time statistics for the steps, which would make it harder to optimize
> >   the build times
> > - to properly clean up the containers, some custom solution would be required
> > - if we needed to introduce additional parametrizations to the
> >   docker-compose.yaml (for example to add other architectures), then it
> >   might require full yaml duplication
>
> I think the tool would need to be higher level than docker-compose.
>
> In general I'm not very comfortable introducing a hard dependency on
> Buildbot (or any CI platform, for that matter) into the project. So we
> have to figure out a way to move forward without such a hard dependency
> or go back to the drawing board.
>
> > - exchanging data between the docker-compose container and buildbot would
> >   be more complicated; for example, the benchmark comment reporter reads
> >   the result from a file, and in order to do the same (reading structured
> >   output on stdout and stderr from scripts is more error prone), mounted
> >   volumes are required, which brings the usual permission problems on Linux.
> > - local reproducibility still requires manual intervention, because the
> >   scripts within the docker containers are not pausable: they exit, and the
> >   steps up to the failed one must be re-executed* after ssh-ing into the
> >   running container.
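As an aside, the Linux volume-permission problem mentioned above is typically worked around by running the container with the host user's UID/GID, so files written to the mounted volume are not root-owned. A sketch of the usual invocation (the image name and entry point are placeholders, not real ones):

```shell
# Map the host UID/GID into the container so artifacts written to the
# mounted volume stay owned by the invoking user rather than root.
# "example/arrow-dev" and "ci/run_tests.sh" are hypothetical names.
docker run --rm \
  -u "$(id -u):$(id -g)" \
  -v "$PWD:/build" \
  -w /build \
  example/arrow-dev \
  ./ci/run_tests.sh
```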
> >
> > Honestly, I see more issues than advantages here. Let's look at it the
> > other way around.
> >
> > Local reproducibility with ursabot?
> > -----------------------------------
> >
> > The most wanted feature that docker-compose has but ursabot doesn't is
> > local reproducibility. First of all, ursabot can be run locally, including
> > all of its builders, so local reproducibility is partially resolved. The
> > missing piece is an interactive shell into the running container, because
> > buildbot instantly stops and aggressively cleans up everything after the
> > container exits.
> >
> > I have three solutions / workarounds in mind:
> >
> > 1. We have all the power of docker and docker-compose from ursabot through
> >    docker-py, and we can easily keep the container running by simply not
> >    stopping it [3]. Configuring the locally running buildbot to keep the
> >    containers running after a failure seems quite easy. *It has the
> >    advantage that all of the buildsteps preceding the failed one are
> >    already executed, so it requires less manual intervention.
> >    This could be done from the web UI or even from the CLI, like
> >    `ursabot reproduce <builder-name>`
> > 2. Generate the docker-compose.yaml and the required shell scripts from
> >    the ursabot builder configurations.
> > 3. Generate a set of commands to reproduce the failure (without even
> >    asking the comment bot "how to reproduce the failing one"). The
> >    response would look similar to:
> >    ```bash
> >    $ docker pull <image>
> >    $ docker run -it <image> bash
> >    # cmd1
> >    # cmd2
> >    # <- error occurs here ->
> >    ```
> >
> > TL;DR
> > -----
> > In the first iteration I'd remove the Travis configurations.
> > In the second iteration I'd develop a feature for ursabot to make local
> > reproducibility possible.
> >
> > [1]: https://github.com/apache/arrow/blob/master/Makefile.docker
> > [2]: https://ci.ursalabs.org/#/builders/87/builds/929
> > [3]:
> > https://github.com/buildbot/buildbot/blob/e7ff2a3b959cff96c77c07891fa07a35a98e81cb/master/buildbot/worker/docker.py#L343
> >
> > > * A local tool to run any Linux-based builds locally using Docker at
> > >   the command line, so that CI behavior can be exactly reproduced
> > >   locally
> > >
> > > Does that seem achievable?
> > >
> > > Thanks,
> > > Wes
> > >
> > > On Mon, Jul 29, 2019 at 6:22 PM Krisztián Szűcs
> > > <szucs.kriszt...@gmail.com> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > Ursabot works pretty well so far, and the CI feedback times have become
> > > > even better* after enabling the docker volume caches, but the
> > > > development and maintenance of it is still not available to the whole
> > > > Arrow community.
> > > >
> > > > While it wasn't straightforward, I've managed to separate the source
> > > > code required to configure the Arrow builders into a separate
> > > > directory, which eventually can be donated to Arrow.
> > > > The README is under construction, but the code is available here [1].
> > > >
> > > > As long as this codebase is not governed by the Arrow community,
> > > > decommissioning the slow Travis builds is not possible, so the overall
> > > > CI times required to merge a PR will remain high.
> > > >
> > > > Regards, Krisztian
> > > >
> > > > * C++ builder times have dropped from ~6-7 minutes to ~3-4 minutes
> > > > * Python builder times have dropped from ~7-8 minutes to ~3-5 minutes
> > > > * ARM C++ builder times have dropped from ~19-20 minutes to ~9-12 minutes
> > > >
> > > > [1]:
> > > >
> > > https://github.com/ursa-labs/ursabot/tree/a46c6aa7b714346b3e4bb7921decb4d4d2f5ed70/projects/arrow
> > >
