Ok, but moving the configuration into Arrow is orthogonal to the local reproducibility feature. Could we proceed with that?
On Tue, Jul 30, 2019 at 4:38 PM Wes McKinney <wesmck...@gmail.com> wrote:
> I will defer to others to investigate this matter further but I would
> really like to see a concrete and practical path to local
> reproducibility before moving forward on any changes to our current
> CI.
>
> On Tue, Jul 30, 2019 at 7:38 AM Krisztián Szűcs
> <szucs.kriszt...@gmail.com> wrote:
> >
> > Fixed it and restarted a bunch of builds.
> >
> > On Tue, Jul 30, 2019 at 5:13 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > By the way, can you please disable the Buildbot builders that are
> > > causing builds on master to fail? We haven't had a passing build in
> > > over a week. Until we reconcile the build configurations we shouldn't
> > > be failing contributors' builds.
> > >
> > > On Mon, Jul 29, 2019 at 8:23 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > > >
> > > > On Mon, Jul 29, 2019 at 7:58 PM Krisztián Szűcs
> > > > <szucs.kriszt...@gmail.com> wrote:
> > > > >
> > > > > On Tue, Jul 30, 2019 at 1:38 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > > > > >
> > > > > > hi Krisztian,
> > > > > >
> > > > > > Before talking about any code donations or where to run builds, I
> > > > > > think we first need to discuss the worrisome situation where we
> > > > > > have in some cases 3 (or more) CI configurations for different
> > > > > > components in the project.
> > > > > >
> > > > > > Just taking into account our C++ build, we have:
> > > > > >
> > > > > > * A config for Travis CI
> > > > > > * Multiple configurations in Dockerfiles under cpp/
> > > > > > * A brand new (?)
> > > > > >   configuration in this third-party ursa-labs/ursabot repository
> > > > > >
> > > > > > I note for example that the "AMD64 Conda C++" Buildbot build is
> > > > > > failing while Travis CI is succeeding:
> > > > > >
> > > > > > https://ci.ursalabs.org/#builders/66/builds/3196
> > > > > >
> > > > > > Starting from first principles, at least for Linux-based builds,
> > > > > > what I would like to see is:
> > > > > >
> > > > > > * A single build configuration (which can be driven by yaml-based
> > > > > >   configuration files and environment variables), rather than the
> > > > > >   3 we have now. This build configuration should be decoupled from
> > > > > >   any CI platform, including Travis CI and Buildbot.
> > > > > >
> > > > > Yeah, this would be the ideal setup, but I'm afraid the situation is
> > > > > a bit more complicated.
> > > > >
> > > > > TravisCI
> > > > > --------
> > > > >
> > > > > Constructed from a bunch of scripts optimized for Travis, this setup
> > > > > is slow and hardly compatible with any of the remaining setups.
> > > > > I think we should ditch it.
> > > > >
> > > > > The "docker-compose setup"
> > > > > --------------------------
> > > > >
> > > > > Most of the Dockerfiles are part of the docker-compose setup we've
> > > > > developed. This might be a good candidate as the tool to centralize
> > > > > our future setup around, mostly because docker-compose is widely
> > > > > used, and we could set up buildbot builders (or any other CI's) to
> > > > > execute the sequence of docker-compose build and docker-compose run
> > > > > commands.
> > > > > However, docker-compose is not suitable for building and running
> > > > > hierarchical images. This is why we have added a Makefile [1] to
> > > > > execute a "build" with a single make command instead of manually
> > > > > executing multiple commands involving multiple images (which is
> > > > > error-prone).
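For illustration, the hierarchical build that the Makefile [1] drives boils down to resolving each image's ancestors and building them in dependency order; a small Python helper could emit the same command sequence that one would otherwise type by hand. This is only a sketch: the image names and the hierarchy below are invented for illustration and are not Arrow's actual docker-compose configuration.

```python
# Hypothetical sketch: resolve a hierarchical image's ancestors and emit
# the docker-compose commands to build them in order. The image names and
# hierarchy are invented for illustration only.

# child image -> list of parent images it builds on (hypothetical)
HIERARCHY = {
    "base": [],
    "cpp": ["base"],
    "python": ["cpp"],
}

def build_order(image, hierarchy=HIERARCHY):
    """Return `image` preceded by its ancestors, root first."""
    order = []
    def visit(name):
        for parent in hierarchy[name]:
            visit(parent)
        if name not in order:
            order.append(name)
    visit(image)
    return order

def compose_commands(image):
    """The docker-compose invocations a wrapper tool would execute."""
    cmds = ["docker-compose build {}".format(name) for name in build_order(image)]
    cmds.append("docker-compose run --rm {}".format(image))
    return cmds

if __name__ == "__main__":
    for cmd in compose_commands("python"):
        print(cmd)
```

Printing the commands instead of running them keeps the dependency-resolution logic testable without a Docker daemon; a real tool would hand each command to a subprocess.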
> > > > > It can also leave a lot of garbage behind, both containers and
> > > > > images.
> > > > > Docker-compose shines when one needs to orchestrate multiple
> > > > > containers and their networks / volumes on the same machine. We
> > > > > made it work for Arrow though (with a couple of hacky workarounds).
> > > > > Despite that, I still consider the docker-compose setup a good
> > > > > solution, mostly because of its biggest advantage: local
> > > > > reproducibility.
> > > >
> > > > I think what is missing here is an orchestration tool (for example, a
> > > > Python program) to invoke Docker-based development workflows involving
> > > > multiple steps.
> > > >
> > > > > Ursabot
> > > > > -------
> > > > >
> > > > > Ursabot uses low-level docker commands to spin the containers up
> > > > > and down, and it also has a utility to nicely build the hierarchical
> > > > > images (with much less maintainable code). The builders are
> > > > > reliable, fast (thanks to docker), and it's been great so far.
> > > > > Where it falls short compared to docker-compose is the lack of
> > > > > local reproducibility: currently the docker worker cleans up
> > > > > everything after itself except the volumes mounted for caching.
> > > > > `docker-compose run` is a pretty nice way to shell into the
> > > > > container.
> > > > >
> > > > > Use docker-compose from ursabot?
> > > > > --------------------------------
> > > > >
> > > > > So assume that we should use docker-compose commands in the
> > > > > buildbot builders. Then:
> > > > > - there would be a single build step for all builders [2] (which
> > > > >   means a single chunk of unreadable log) - it also complicates
> > > > >   working with esoteric
> > > >
> > > > I think this is too much of a black-and-white way of looking at
> > > > things.
> > > > What I would like to see is a build orchestration tool, which can be
> > > > used via a command line interface, not unlike the current crossbow.py
> > > > and archery command line scripts, that can invoke a build locally or
> > > > in a CI setting.
> > > >
> > > > >   builders like the on-demand crossbow trigger and the benchmark
> > > > >   runner
> > > > > - no possibility to customize the build steps (like aggregating the
> > > > >   count of warnings)
> > > > > - no time statistics for the steps, which would make it harder to
> > > > >   optimize the build times
> > > > > - to properly clean up the containers, some custom solution would
> > > > >   be required
> > > > > - if we needed to introduce additional parametrizations to the
> > > > >   docker-compose.yaml (for example to add other architectures),
> > > > >   then it might require full yaml duplication
> > > >
> > > > I think the tool would need to be higher level than docker-compose.
> > > >
> > > > In general I'm not very comfortable introducing a hard dependency on
> > > > Buildbot (or any CI platform, for that matter) into the project. So
> > > > we have to figure out a way to move forward without such a hard
> > > > dependency or go back to the drawing board.
> > > >
> > > > > - exchanging data between the docker-compose container and buildbot
> > > > >   would be more complicated; for example, the benchmark comment
> > > > >   reporter reads the result from a file, and in order to do the
> > > > >   same (reading structured output on stdout and stderr from scripts
> > > > >   is more error-prone), mounted volumes are required, which brings
> > > > >   the usual permission problems on linux.
> > > > > - local reproducibility still requires manual intervention, because
> > > > >   the scripts within the docker containers are not pausable; they
> > > > >   exit, and the steps up to the failed one must be re-executed*
> > > > >   after ssh-ing into the running container.
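To make the "build orchestration tool" idea above a bit more concrete: a thin CLI in the style of crossbow.py or archery could be the shared entry point for both local runs and CI jobs. The sketch below is purely illustrative; the `arrow-build` program name, the `run` subcommand, and the `--using` flag are all hypothetical, not an existing tool.

```python
# Hypothetical sketch of a CLI-driven build orchestrator: one entry point
# that can invoke a named build configuration locally or from a CI job.
# All names and flags here are invented for illustration.
import argparse

def make_parser():
    parser = argparse.ArgumentParser(prog="arrow-build")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run", help="run one build configuration")
    run.add_argument("builder", help="builder name, e.g. conda-cpp (hypothetical)")
    run.add_argument("--using", choices=["docker", "host"], default="docker",
                     help="where to execute the build steps")
    return parser

if __name__ == "__main__":
    args = make_parser().parse_args()
    # A real tool would load the yaml config for `args.builder` here and
    # execute its steps; this sketch only echoes the parsed decision.
    print("would run {} using {}".format(args.builder, args.using))
```

The point of the shared entry point is that a CI worker (Buildbot, Travis, or anything else) would shell out to the exact same command a developer types locally, which is what makes runs reproducible.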
> > > > >
> > > > > Honestly I see more issues than advantages here. Let's look at it
> > > > > the other way around.
> > > > >
> > > > > Local reproducibility with ursabot?
> > > > > -----------------------------------
> > > > >
> > > > > The most wanted feature that docker-compose has but ursabot doesn't
> > > > > is local reproducibility. First of all, ursabot can be run locally,
> > > > > including all of its builders, so local reproducibility is
> > > > > partially resolved. The missing piece is the interactive shell into
> > > > > the running container, because buildbot instantly stops the
> > > > > container and aggressively cleans up everything after it.
> > > > >
> > > > > I have three solutions / workarounds in mind:
> > > > >
> > > > > 1. We have all the power of docker and docker-compose from ursabot
> > > > >    through docker-py, and we can easily keep the container running
> > > > >    by simply not stopping it [3]. Configuring the locally running
> > > > >    buildbot to keep the containers running after a failure seems
> > > > >    quite easy. *It has the advantage that all of the build steps
> > > > >    preceding the failed one are already executed, so it requires
> > > > >    less manual intervention.
> > > > >    This could be done on the web UI or even from the CLI, like
> > > > >    `ursabot reproduce <builder-name>`
> > > > > 2. Generate the docker-compose.yaml and required scripts from the
> > > > >    ursabot builder configurations, including the shell scripts.
> > > > > 3. Generate a set of commands to reproduce the failure (without
> > > > >    even asking the comment bot "how to reproduce the failing one").
> > > > >    The response would look similar to:
> > > > >
> > > > >    ```bash
> > > > >    $ docker pull <image>
> > > > >    $ docker run -it <image> bash
> > > > >    # cmd1
> > > > >    # cmd2
> > > > >    # <- error occurs here ->
> > > > >    ```
> > > > >
> > > > > TL;DR
> > > > > -----
> > > > >
> > > > > In the first iteration I'd remove the travis configurations.
> > > > > In the second iteration I'd develop a feature for ursabot to make
> > > > > local reproducibility possible.
> > > > >
> > > > > [1]: https://github.com/apache/arrow/blob/master/Makefile.docker
> > > > > [2]: https://ci.ursalabs.org/#/builders/87/builds/929
> > > > > [3]: https://github.com/buildbot/buildbot/blob/e7ff2a3b959cff96c77c07891fa07a35a98e81cb/master/buildbot/worker/docker.py#L343
> > > > >
> > > > > > * A local tool to run any Linux-based builds locally using Docker
> > > > > >   at the command line, so that CI behavior can be exactly
> > > > > >   reproduced locally
> > > > > >
> > > > > > Does that seem achievable?
> > > > > >
> > > > > > Thanks,
> > > > > > Wes
> > > > > >
> > > > > > On Mon, Jul 29, 2019 at 6:22 PM Krisztián Szűcs
> > > > > > <szucs.kriszt...@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > Ursabot works pretty well so far, and the CI feedback times
> > > > > > > have become even better* after enabling the docker volume
> > > > > > > caches, but the development and maintenance of it is still not
> > > > > > > available to the whole Arrow community.
> > > > > > >
> > > > > > > While it wasn't straightforward, I've managed to separate the
> > > > > > > source code required to configure the Arrow builders into a
> > > > > > > separate directory, which eventually can be donated to Arrow.
> > > > > > > The README is under construction, but the code is available
> > > > > > > here [1].
> > > > > > >
> > > > > > > Until this codebase is governed by the Arrow community,
> > > > > > > decommissioning the slow travis builds is not possible, so the
> > > > > > > overall CI time required to merge a PR will remain high.
> > > > > > >
> > > > > > > Regards, Krisztian
> > > > > > >
> > > > > > > * C++ builder times have dropped from ~6-7 minutes to ~3-4
> > > > > > >   minutes
> > > > > > > * Python builder times have dropped from ~7-8 minutes to ~3-5
> > > > > > >   minutes
> > > > > > > * ARM C++ builder times have dropped from ~19-20 minutes to
> > > > > > >   ~9-12 minutes
> > > > > > >
> > > > > > > [1]: https://github.com/ursa-labs/ursabot/tree/a46c6aa7b714346b3e4bb7921decb4d4d2f5ed70/projects/arrow
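As a footnote to option 3 earlier in the thread (having the bot generate a "how to reproduce" recipe): rendering such a reply from a builder's image and its shell steps is cheap to prototype. Everything in this sketch, the function name, the data shape, and the example values, is hypothetical and only mirrors the bash example quoted above.

```python
# Hypothetical sketch of option 3: render a reproduction recipe from a
# builder's image and the shell steps up to (and including) the failure.
# Function name, parameters, and example values are invented.

def render_recipe(image, steps, failed_index):
    """Render docker commands plus the in-container steps, stopping at
    the step that failed, in the style of the thread's bash example."""
    lines = [
        "$ docker pull {}".format(image),
        "$ docker run -it {} bash".format(image),
    ]
    for step in steps[: failed_index + 1]:
        lines.append("# {}".format(step))
    lines.append("# <- error occurs here ->")
    return "\n".join(lines)

if __name__ == "__main__":
    print(render_recipe(
        "example.org/arrow-cpp:latest",  # hypothetical image name
        ["cmake ..", "ninja", "ctest"],  # hypothetical build steps
        failed_index=2,                  # the third step is the one that failed
    ))
```

Since the buildbot master already knows each builder's image and step list, emitting this text in a comment (or from a `reproduce` subcommand) would not require any container changes at all.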