Hi,

As someone who started contributing recently, I'd like to raise a few
points.  I hope this post doesn't come across as rambling or clueless;
if it does, feel free to ignore it or point it out :-)


What does a release require?
============================

I didn't find any official documentation answering that question.

Right now, the in-line CI in the arrow repository ensures we have the
following:
- the source base builds fine in *some* configurations on each of the three
  major platforms (Linux, macOS, Windows)
- the various test suites run fine on each of those platforms

Is it a requirement that binary packages can be produced reliably for
a number of platforms, and if so, which ones?  Is it a requirement that
binary packages are available from day one when a release is done, or
is that a best effort thing depending on the availability of specific
platform maintainers?  It would be useful to spell that out somewhere.


Who is responsible for producing packages?
==========================================

Right now it seems packages are all produced out of a single repository
"arrow-dist".  That repository handles production of binary artifacts:
Python wheels, Ubuntu / CentOS / Debian packages...

It's not obvious if specific people are responsible for each of the
package production chains.  It's common in open source projects to have
dedicated persons (or teams) responsible for each platform target.
This ensures that 1) the packages are produced by motivated people
who are familiar enough with their platforms of interest 2) producing
packages does not otherwise drain the stamina of the development team.


CI strategy
===========

We have two conflicting requirements:
1) Test as much as possible as part of continuous integration (including,
   possibly, the production of viable binary packages)
2) Keep CI times reasonable so development doesn't grind to a halt.  Some
   significant work was done recently to cut our build times on Travis-CI
   and AppVeyor, often by half (ARROW-2071, ARROW-2083, ARROW-2231).

To give a point of comparison, CPython has a two-pronged approach:

1) in-line CI using Travis-CI and AppVeyor, with simple build matrices
   (1 build on AppVeyor, 2 required + 2 optional on Travis-CI).  In-line
   CI must validate for a PR to be merged.
2) out-of-line CI using a farm of buildbots:
   http://buildbot.python.org/all/#/grid?branch=master

Each buildbot has a maintainer, interested in keeping that specific platform
and configuration running.  Some buildbots are marked stable and strongly
recommended to be green at all times (and especially when releasing).  Some
buildbots on the other hand are marked unstable and represent less mainstream
configurations which are just "nice to fix".

The take-aways here are:
* Mainline development isn't throttled by the production of binary artifacts
  or testing on a myriad of (possibly slow or busy) CI platforms.
* Each tested configuration has a maintainer willing to identify and diagnose
  problems (either propose a solution themselves or notify the developer
  responsible for a regression).
* Some things are release blockers (the "stable" platforms), some are not
  and just nice to have.
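
As an illustration, the stable/unstable split could be expressed as a
simple release-blocker policy check.  Everything below (configuration
names, the function) is a hypothetical sketch, not actual Arrow or
CPython tooling:

```python
# Hypothetical sketch: classify failing CI configurations as release
# blockers or not, mirroring CPython's stable/unstable buildbot split.

STABLE = {"linux-x86_64", "macos-x86_64", "windows-x86_64"}   # must be green
UNSTABLE = {"linux-armv7", "freebsd-x86_64"}                  # nice to fix

def release_blockers(failures):
    """Return the failing configurations that block a release."""
    return sorted(set(failures) & STABLE)

# A failing unstable config is tolerated; a failing stable one blocks.
print(release_blockers(["linux-armv7", "windows-x86_64"]))
# -> ['windows-x86_64']
```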

Two side notes:
* CPython is a much simpler project than Arrow, since it's C99 with minimal
  dependencies.
* I wouldn't necessarily recommend buildbot as a CI platform.


Build options
=============

It may be useful to look into reducing the number of build options, and/or
standardizing on supported settings, per platform.  For example, we should
decide whether Boost should be bundled or not, namespaced or not, on each
platform.  People with specific development requirements can try to override
that, but with no guarantee from us.
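
Concretely, a "supported settings" document could pin down one blessed
CMake invocation per platform.  The flag names below are illustrative
assumptions, not necessarily Arrow's actual CMake options:

```shell
# Hypothetical blessed configuration for, say, Linux packaging builds;
# anything deviating from this would be unsupported.
cmake -DARROW_BOOST_VENDORED=ON \
      -DARROW_BUILD_TESTS=ON \
      ..
```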

For example, on the llvmlite project we decided early on that we would always
link LLVM statically.  Third-party maintainers may decide to do things
differently, but they would have to maintain their own build scripts or patches.


Regards

Antoine.


On 23/03/2018 at 17:58, Wes McKinney wrote:
> hi folks,
> 
> So, I want to bring light to the problems we are having delivering
> binary artifacts after Arrow releases.
> 
> We have some amount of packaging automation implemented in
> https://github.com/apache/arrow-dist using Travis CI and Appveyor to
> upload packages to Bintray, a packaging hosting service.
> 
> Unfortunately, we discovered a bunch of problems with these packaging
> scripts after the release vote closed on Monday, and now 4 days later,
> we still have been unable to post binaries to
> https://pypi.python.org/pypi/pyarrow
> 
> This is no one's fault, but it highlights structural problems with our
> development process:
> 
> * Why does producing packages after a release require error-prone manual 
> labor?
> 
> * Why are we only finding out about packaging problem after a release
> vote closes?
> 
> * Why is setting up nightly binary builds a brittle and bespoke process?
> 
> I hope all agree that:
> 
> * Packaging should not be a hardship or require a lot of manual labor
> 
> * Packaging problems on the master branch should be made known within
> ~24 hours, so they can be remedied immediately
> 
> * It should be straightforward to produce binary artifacts for all
> supported platforms and programming languages
> 
> Eventually, we should include some binary artifacts in our release
> votes, but we are pretty far away from suitable automation to make
> this possible.
> 
> I don't know any easy solutions, but Apache Arrow has grown widely
> used enough that I think it's worth our taking the time to plan and
> execute some solutions to these problems, which I expect to pay
> dividends in our community's productivity over time.
> 
> Thanks,
> Wes
> 
