On Mon, Jul 29, 2019 at 7:58 PM Krisztián Szűcs
<szucs.kriszt...@gmail.com> wrote:
>
> On Tue, Jul 30, 2019 at 1:38 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > hi Krisztian,
> >
> > Before talking about any code donations or where to run builds, I
> > think we first need to discuss the worrisome situation where we have
> > in some cases 3 (or more) CI configurations for different components
> > in the project.
> >
> > Just taking into account our C++ build, we have:
> >
> > * A config for Travis CI
> > * Multiple configurations in Dockerfiles under cpp/
> > * A brand new (?) configuration in this third party ursa-labs/ursabot
> > repository
> >
> > I note for example that the "AMD64 Conda C++" Buildbot build is
> > failing while Travis CI is succeeding
> >
> > https://ci.ursalabs.org/#builders/66/builds/3196
> >
> > Starting from first principles, at least for Linux-based builds, what
> > I would like to see is:
> >
> > * A single build configuration (which can be driven by yaml-based
> > configuration files and environment variables), rather than 3 like we
> > have now. This build configuration should be decoupled from any CI
> > platform, including Travis CI and Buildbot
> >
> Yeah, this would be the ideal setup, but I'm afraid the situation is a bit
> more complicated.
>
> TravisCI
> --------
>
> It is constructed from a bunch of scripts optimized for Travis; this
> setup is slow and hardly compatible with any of the remaining setups.
> I think we should ditch it.
>
> The "docker-compose setup"
> --------------------------
>
> Most of the Dockerfiles are part of the docker-compose setup we've
> developed. This might be a good candidate to centralize our future
> setup around, mostly because docker-compose is widely used, and we
> could set up buildbot builders (or any other CIs) to execute the
> sequence of docker-compose build and docker-compose run commands.
> However, docker-compose is not well suited to building and running
> hierarchical images. This is why we added a Makefile [1] to execute a
> "build" with a single make command instead of manually executing
> multiple commands involving multiple images (which is error prone).
> Docker-compose can also leave a lot of garbage behind, both containers
> and images.
> Docker-compose shines when one needs to orchestrate multiple
> containers and their networks / volumes on the same machine; still, we
> made it work for Arrow (with a couple of hacky workarounds).
> Despite that, I still consider the docker-compose setup a good
> solution, mostly because of its biggest advantage: local
> reproducibility.
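>
> As a rough illustration, the manual sequence the Makefile wraps looks
> something like this (service names here are illustrative):
>
> ```bash
> # hierarchical images must be built in dependency order, by hand:
> docker-compose build cpp        # base C++ image
> docker-compose build python     # image built FROM the cpp image
> # only then can the actual build/test job run in the leaf image:
> docker-compose run --rm python
> ```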
>

I think what is missing here is an orchestration tool (for example, a
Python program) to invoke Docker-based development workflows involving
multiple steps.
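
A strawman of what such a tool's surface could look like (the command
name and its subcommands are hypothetical, nothing like this exists
yet):

```bash
# a single entry point that knows the image hierarchy and the steps
arrow-ci build cpp    # build the C++ image and everything it depends on
arrow-ci test cpp     # run the C++ test steps inside the container
arrow-ci shell cpp    # open a shell in the container to debug a failure
```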

> Ursabot
> -------
>
> Ursabot uses low-level docker commands to spin the containers up and
> down, and it also has a utility to nicely build the hierarchical
> images (with much less code to maintain). The builders are reliable
> and fast (thanks to docker), and it's great so far.
> Where it falls short compared to docker-compose is local
> reproducibility: currently the docker worker cleans up everything
> after itself except the volumes mounted for caching, whereas
> `docker-compose run` is a pretty nice way to shell into the container.
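>
> For example (the service name is illustrative):
>
> ```bash
> # start a one-off container for the service and open a shell in it;
> # --rm removes the container on exit, but images and volumes are kept
> docker-compose run --rm cpp bash
> ```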
>
> Use docker-compose from ursabot?
> --------------------------------
>
> So assume that we use docker-compose commands in the buildbot
> builders. Then:
> - there would be a single build step for all builders [2] (which means
>   a single chunk of unreadable log) - it also complicates working with
>   esoteric

I think this is too much of a black-and-white way of looking at
things. What I would like to see is a build orchestration tool, usable
via a command line interface (not unlike the current crossbow.py and
archery command line scripts), that can invoke a build locally or in a
CI setting.
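
For instance, the CI configuration would reduce to a single call to
such a tool, and a developer could run the exact same thing locally
(the command name and flags are hypothetical):

```bash
# in .travis.yml or a Buildbot step:
arrow-ci run conda-cpp

# on a developer machine, reproducing the same build, then debugging:
arrow-ci run conda-cpp
arrow-ci shell conda-cpp
```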

>   builders like the on-demand crossbow trigger and the benchmark runner
> - no possibility to customize the buildsteps (like aggregating the
>   count of warnings)
> - no time statistics for the steps, which would make it harder to
>   optimize the build times
> - properly cleaning up the containers would require some custom
>   solution
> - if we need to introduce additional parametrizations to the
>   docker-compose.yaml (for example to add other architectures), then
>   it might require duplicating the whole yaml (see the sketch below)
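>
> To illustrate the duplication concern: adding an architecture
> dimension could multiply the service definitions instead of
> parametrizing a single one (service names are made up):
>
> ```bash
> # one near-identical service per architecture in docker-compose.yaml
> docker-compose build cpp-amd64
> docker-compose build cpp-arm64v8
> ```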

I think the tool would need to be higher level than docker-compose.

In general I'm not very comfortable introducing a hard dependency on
Buildbot (or any CI platform, for that matter) into the project. So we
have to figure out a way to move forward without such a hard
dependency or go back to the drawing board.

> - exchanging data between the docker-compose container and buildbot
>   would be more complicated; for example the benchmark comment
>   reporter reads the result from a file, and to do the same (reading
>   structured output from scripts' stdout and stderr is more error
>   prone), mounted volumes are required, which brings the usual
>   permission problems on linux (see the sketch below)
> - local reproducibility still requires manual intervention, because
>   the scripts within the docker containers are not pausable: they
>   exit, and the steps up to the failed one must be re-executed* after
>   ssh-ing into the running container.
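>
> The usual workaround for the permission problem is to run the
> container as the host user (the paths and image name are
> illustrative):
>
> ```bash
> # files written into the mounted volume stay owned by the host user
> # instead of root
> docker run --rm -u "$(id -u):$(id -g)" \
>   -v "$PWD/results:/build/results" <image> /build/run_benchmarks.sh
> ```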
>
> Honestly I see more issues than advantages here. Let's look at it the
> other way around.
>
> Local reproducibility with ursabot?
> -----------------------------------
>
> The most wanted feature that docker-compose has but ursabot doesn't is
> local reproducibility. First of all, ursabot can be run locally,
> including all of its builders, so local reproducibility is partially
> resolved. The missing piece is an interactive shell into the running
> container, because buildbot instantly stops the container and
> aggressively cleans up everything after it.
>
> I have three solutions / workarounds in mind:
>
> 1. We have all the power of docker and docker-compose from ursabot
>    through docker-py, and we can easily keep the container running by
>    simply not stopping it [3]. Configuring the locally running
>    buildbot to keep the containers running after a failure seems quite
>    easy. *This has the advantage that all of the buildsteps preceding
>    the failed one have already been executed, so it requires less
>    manual intervention (see the sketch after this list).
>    This could be done on the web UI or even from the CLI, like
>    `ursabot reproduce <builder-name>`
> 2. Generate the docker-compose.yaml and the required scripts from the
>    ursabot builder configurations, including the shell scripts.
> 3. Generate a set of commands to reproduce the failure (one could even
>    ask the comment bot "how to reproduce the failing one"). The
>    response would look similar to:
>    ```bash
>    $ docker pull <image>
>    $ docker run -it <image> bash
>    # cmd1
>    # cmd2
>    # <- error occurs here ->
>    ```
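>
> For option 1, once the failed build's container is kept running, the
> interactive part could be as simple as:
>
> ```bash
> $ docker ps                          # find the kept-alive container
> $ docker exec -it <container> bash   # shell into it
> # the workspace still holds the state left by the executed buildsteps
> ```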
>
> TL;DR
> -----
> In the first iteration I'd remove the Travis configurations.
> In the second iteration I'd develop a feature for ursabot to make local
> reproducibility possible.
>
> [1]: https://github.com/apache/arrow/blob/master/Makefile.docker
> [2]: https://ci.ursalabs.org/#/builders/87/builds/929
> [3]:
> https://github.com/buildbot/buildbot/blob/e7ff2a3b959cff96c77c07891fa07a35a98e81cb/master/buildbot/worker/docker.py#L343
>
> > * A local tool to run any Linux-based builds locally using Docker at
> > the command line, so that CI behavior can be exactly reproduced
> > locally
> >
> > Does that seem achievable?
> >
> > Thanks,
> > Wes
> >
> > On Mon, Jul 29, 2019 at 6:22 PM Krisztián Szűcs
> > <szucs.kriszt...@gmail.com> wrote:
> > >
> > > Hi All,
> > >
> > > Ursabot works pretty well so far, and the CI feedback times have
> > > become even better* after enabling the docker volume caches, but
> > > the development and maintenance of it is still not available to
> > > the whole Arrow community.
> > >
> > > While it wasn't straightforward, I've managed to separate the
> > > source code required to configure the Arrow builders into its own
> > > directory, which can eventually be donated to Arrow.
> > > The README is under construction, but the code is available here [1].
> > >
> > > Until this codebase is governed by the Arrow community,
> > > decommissioning the slow Travis builds is not possible, so the
> > > overall CI time required to merge a PR will remain high.
> > >
> > > Regards, Krisztian
> > >
> > > * C++ builder times have dropped from ~6-7 minutes to ~3-4 minutes
> > > * Python builder times have dropped from ~7-8 minutes to ~3-5 minutes
> > > * ARM C++ builder times have dropped from ~19-20 minutes to ~9-12 minutes
> > >
> > > [1]:
> > >
> > https://github.com/ursa-labs/ursabot/tree/a46c6aa7b714346b3e4bb7921decb4d4d2f5ed70/projects/arrow
> >
