On Mon, Mar 26, 2018 at 1:37 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > Huge +1 on moving some of the packaging outside the scope of
> > responsibility of arrow dev, specifically I don't think we should be
> > responsible for anything except wheels and conda packages.

This is my ideal scenario, however unrealistic at the moment.

> In theory I agree, but until Apache Arrow grows popular enough and
> important enough for other communities to assume responsibility for
> timely packaging on standard platforms like major Linux distributions,
> we are going to have to do it, otherwise it will harm the growth of
> the community (since users will have a hard time installing the
> software on various platforms).
>
> Before changing the scope of what we are committing ourselves to do, I
> would like to see if we can develop suitable automation around the
> things we already have implemented. I don't want to write some things
> off as being "too hard" or "too much work for the Apache Arrow
> community" without giving a concerted automation effort a try.

Then I think we need acknowledgement from committers willing to take on
responsibility for packages when they fail to build during automated
packaging, and that needs to be documented in the Arrow repo. I don't
think it's reasonable for all committers to be responsible for every
package type on every platform when something goes wrong. Ideally
automation will alleviate most or all of the issues here, but things
will still fail and need specific owners.

For example, I am willing to own conda packaging for all platforms and
pip packaging for Windows. I am not willing to own Debian, yum, or pip
packaging for other platforms. When I say own, what I mean is that when
the automated package build fails and I get an email, I will respond
ASAP with either a fix or by contacting the appropriate person. Of
course, this isn't set in stone. If I'm able to help in other areas and
I have the time, then I will.

> Note that we have other things we should be automating, but we are not:
>
> * Nightly performance benchmarking (e.g. ASV for Python)
> * Nightly integration tests (Spark, HDFS, Dask, API docs, etc.)
> * Running GPU-enabled tests in CI
> * Building GPU-enabled binaries
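On the nightly benchmarking item: the automation here doesn't need to
be elaborate to be useful. Something like the following, run from cron,
would be a start. This is a sketch only -- the script name and the
python/benchmarks location of asv.conf.json are my assumptions, not
existing Arrow tooling:

    #!/usr/bin/env python
    # nightly_benchmarks.py -- hypothetical cron entry point for nightly
    # ASV runs; assumes asv.conf.json lives under python/benchmarks.
    import subprocess
    import sys

    BENCHMARK_DIR = "python/benchmarks"  # assumed location of asv.conf.json


    def run(*args):
        # Echo each command so the job log shows what was attempted.
        print("+", " ".join(args), flush=True)
        return subprocess.run(args, cwd=BENCHMARK_DIR).returncode


    def main():
        # "NEW" asks asv to benchmark only commits it hasn't measured yet;
        # "publish" then regenerates the static HTML results site.
        rc = run("asv", "run", "NEW") or run("asv", "publish")
        if rc:
            # A real job would notify the benchmark owner here; exiting
            # nonzero so the scheduler flags the run is the bare minimum.
            sys.exit(rc)


    if __name__ == "__main__":
        main()

The specific tool matters less than the property that a broken run
turns into a nonzero exit status that pages a specific owner.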
> > +1. Trying to satisfy everyone's downstream needs is an impossible
> > task.
>
> I think it's really easy to say "let's remove options" and "let's make
> the build system simpler" without assessing the consequences of this
> to community/project growth. The reason there are a lot of options and
> the build system has complexity is that we are trying to satisfy a
> pretty large matrix of requirements developed organically over the
> last 2+ years.

It's not clear to me what is not in scope. Where do we draw the line
for testing projects downstream of arrow? How do we decide whether to
test those projects or not? Do we test them on all platforms that they
support?

IMO testing dependents of Arrow should be the responsibility of those
particular pieces of software, not the responsibility of the Arrow
project. Testing Spark and Dask, for example, both seem like the
responsibility of their respective projects. Does asking these projects
to do this hurt community/project growth in some way? Are there other
large, successful projects that take on significant testing of their
dependents? If there are, we should look at how they have addressed
this issue.

> The central problem we are having is that our continuous integration
> and continuous delivery have not scaled to cover the diversity of use
> cases that we have accumulated. If we punt on addressing our
> automation problems and instead start removing build or packaging
> functionality to make things simpler, eventually the project will grow
> until we are dealing with a different kind of development workflow
> crisis.

I don't want to punt on automation; we need to do that regardless. What
do you think about having specific owners of packaging areas,
documented in the repo?
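To make that concrete, the owners file could be machine-readable so the
packaging automation can route failures without a human triaging them
first. A rough sketch of the shape I mean -- the file path, the
(kind, platform) keys, and every name and address below are made up:

    # dev/packaging/owners.py -- hypothetical owners registry; every
    # name, address, and key below is illustrative, not a commitment.
    import smtplib
    from email.message import EmailMessage

    # (package kind, platform) -> the committer who owns that failure.
    OWNERS = {
        ("conda", "linux"): "conda-owner@example.org",
        ("conda", "osx"): "conda-owner@example.org",
        ("conda", "windows"): "conda-owner@example.org",
        ("wheel", "windows"): "wheel-owner@example.org",
        # debian / yum / non-Windows wheels: unowned until someone
        # volunteers.
    }

    FALLBACK = "dev@arrow.apache.org"  # unowned failures go to the list


    def notify_owner(kind, platform, log_url):
        """Route a packaging failure to its documented owner, if any."""
        owner = OWNERS.get((kind, platform), FALLBACK)
        msg = EmailMessage()
        msg["From"] = "builds@example.org"  # placeholder sender
        msg["To"] = owner
        msg["Subject"] = f"[packaging] {kind}/{platform} build failed"
        msg.set_content(f"Automated package build failed.\nLog: {log_url}")
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)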
> On Mon, Mar 26, 2018 at 11:58 AM, Phillip Cloud <cpcl...@gmail.com> wrote:
> > Responses inline. This kind of information is extremely helpful and
> > informative.
> >
> > On Mon, Mar 26, 2018 at 11:26 AM Antoine Pitrou <anto...@python.org> wrote:
> >>
> >> Hi,
> >>
> >> As someone who started contributing recently, I'd like to raise a few
> >> points. I hope this post doesn't come across as rambling or clueless;
> >> otherwise feel free to ignore / point it out :-)
> >>
> >>
> >> What does a release require?
> >> ============================
> >>
> >> I didn't find any official documentation answering that question.
> >
> > https://github.com/apache/arrow/blob/master/dev/release/RELEASE_MANAGEMENT.md
> > is the one I used for the most recent release.
> >
> >> Right now, the in-line CI in the arrow repository ensures we have the
> >> following:
> >> - the source base builds fine in *some* configurations on each of the
> >>   three major platforms (Linux, macOS, Windows)
> >> - the various test suites run fine on each of those platforms
> >>
> >> Is it a requirement that binary packages can be produced reliably for
> >> a number of platforms, and if so, which ones? Is it a requirement that
> >> binary packages are available from day one when a release is done, or
> >> is that a best-effort thing depending on the availability of specific
> >> platform maintainers? It would be useful to spell that out somewhere.
> >
> > The release management doc doesn't spell out the specific nitty-gritty
> > of what exactly the artifacts do/don't and should/shouldn't contain,
> > though it does contain some information about how to produce some of
> > the artifacts. It's critical that we spell this out somewhere.
> >
> >> Who is responsible for producing packages?
> >> ==========================================
> >>
> >> Right now it seems packages are all produced out of a single
> >> repository, "arrow-dist". That repository handles production of
> >> binary artifacts: Python wheels, Ubuntu / CentOS / Debian packages...
> >>
> >> It's not obvious if specific people are responsible for each of the
> >> package production chains. It's common in open source projects to
> >> have dedicated persons (or teams) responsible for each platform
> >> target. This ensures that 1) the packages are produced by motivated
> >> people who are familiar enough with their platforms of interest, and
> >> 2) producing packages does not otherwise drain the stamina of the
> >> development team.
> >
> > Huge +1 on moving some of the packaging outside the scope of
> > responsibility of arrow dev, specifically I don't think we should be
> > responsible for anything except wheels and conda packages.
> >
> > One question I have here is: are the separate package type
> > scripts/software maintained in different repositories?
> >
> > Also +1 on having a person responsible for each platform. I wonder if
> > having a person responsible for a specific kind of artifact might
> > spread the workload more evenly, since there's likely a shortage of
> > Windows expertise.
> >
> >> CI strategy
> >> ===========
> >>
> >> We have two conflicting requirements:
> >> 1) Test as much as possible as part of continuous integration
> >>    (including, possibly, the production of viable binary packages)
> >> 2) Keep CI times reasonable to avoid grinding. Some significant work
> >>    was done recently to cut down our build times on Travis-CI and
> >>    AppVeyor, often by half (ARROW-2071, ARROW-2083, ARROW-2231).
> >>
> >> To give a point of comparison, CPython has a two-pronged approach:
> >>
> >> 1) in-line CI using Travis-CI and AppVeyor, with simple build matrices
> >>    (1 build on AppVeyor, 2 required + 2 optional on Travis-CI).
> >>    In-line CI must validate for a PR to be merged.
> >> 2) out-of-line CI using a farm of buildbots:
> >>    http://buildbot.python.org/all/#/grid?branch=master
> >
> > Buildbot looks *a lot* better than the last time I looked at it :)
> >
> >> Each buildbot has a maintainer, interested in keeping that specific
> >> platform and configuration running. Some buildbots are marked stable
> >> and strongly recommended to be green at all times (and especially
> >> when releasing). Some buildbots on the other hand are marked unstable
> >> and represent less mainstream configurations which are just "nice to
> >> fix".
> >>
> >> The take-aways here are:
> >> * Mainline development isn't throttled by the production of binary
> >>   artifacts or testing on a myriad of (possibly slow or busy) CI
> >>   platforms.
> >> * Each tested configuration has a maintainer willing to identify and
> >>   diagnose problems (either propose a solution themselves or notify
> >>   the developer responsible for a regression).
> >> * Some things are release blockers (the "stable" platforms), some are
> >>   not and are just nice to have.
> >
> > IMO the "stable" platforms should be conda packages for arrow/pyarrow
> > and pip wheels. We should discuss that more.
> >
> >> Two side notes:
> >> * CPython is a much simpler project than Arrow, since it's C99 with
> >>   minimal dependencies.
> >> * I wouldn't necessarily recommend buildbot as a CI platform.
> >>
> >>
> >> Build options
> >> =============
> >>
> >> It may be useful to look into reducing the number of build options,
> >> and/or standardizing on supported settings, per platform. For
> >> example, we should decide whether Boost should be bundled or not,
> >> namespaced or not, on each platform. People with specific development
> >> requirements can try to override that, but with no guarantee from us.
> >
> > +1. Trying to satisfy everyone's downstream needs is an impossible
> > task.
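One way to implement "standardized settings per platform" is to check
in exactly one blessed flag set per platform and make every packaging
script go through it; anything else is bring-your-own-flags with no
guarantees. A sketch -- the ARROW_BOOST_* option names should be
verified against cpp/CMakeLists.txt, and everything else here is
hypothetical:

    # build_arrow.py -- hypothetical wrapper pinning one "blessed" CMake
    # configuration per platform; verify the option names against
    # cpp/CMakeLists.txt before relying on them.
    import subprocess
    import sys

    BLESSED = {
        "linux": {
            "ARROW_BOOST_VENDORED": "OFF",  # link the distro's Boost
            "ARROW_BOOST_USE_SHARED": "ON",
        },
        "windows": {
            "ARROW_BOOST_VENDORED": "ON",   # bundle Boost; no system Boost
            "ARROW_BOOST_USE_SHARED": "OFF",
        },
    }


    def cmake_command(platform):
        # Every packaging script calls this instead of hand-rolling
        # flags, so there is one supported configuration per platform.
        flags = BLESSED[platform]
        return ["cmake", "../cpp", "-DCMAKE_BUILD_TYPE=Release"] + [
            f"-D{name}={value}" for name, value in sorted(flags.items())
        ]


    if __name__ == "__main__":
        sys.exit(subprocess.run(cmake_command(sys.argv[1])).returncode)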
> >> For example, on the llvmlite project we decided early on that we
> >> would always link LLVM statically. Third-party maintainers may
> >> decide to do things differently, but they would have to maintain
> >> their own build scripts or patches.
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >>
> >> On 23/03/2018 17:58, Wes McKinney wrote:
> >> > hi folks,
> >> >
> >> > So, I want to bring light to the problems we are having delivering
> >> > binary artifacts after Arrow releases.
> >> >
> >> > We have some amount of packaging automation implemented in
> >> > https://github.com/apache/arrow-dist using Travis CI and Appveyor
> >> > to upload packages to Bintray, a package hosting service.
> >> >
> >> > Unfortunately, we discovered a bunch of problems with these
> >> > packaging scripts after the release vote closed on Monday, and now
> >> > 4 days later, we still have been unable to post binaries to
> >> > https://pypi.python.org/pypi/pyarrow
> >> >
> >> > This is no one's fault, but it highlights structural problems with
> >> > our development process:
> >> >
> >> > * Why does producing packages after a release require error-prone
> >> >   manual labor?
> >> >
> >> > * Why are we only finding out about packaging problems after a
> >> >   release vote closes?
> >> >
> >> > * Why is setting up nightly binary builds a brittle and bespoke
> >> >   process?
> >> >
> >> > I hope all agree that:
> >> >
> >> > * Packaging should not be a hardship or require a lot of manual
> >> >   labor
> >> >
> >> > * Packaging problems on the master branch should be made known
> >> >   within ~24 hours, so they can be remedied immediately
> >> >
> >> > * It should be straightforward to produce binary artifacts for all
> >> >   supported platforms and programming languages
> >> >
> >> > Eventually, we should include some binary artifacts in our release
> >> > votes, but we are pretty far away from suitable automation to make
> >> > this possible.
> >> >
> >> > I don't know any easy solutions, but Apache Arrow has grown widely
> >> > used enough that I think it's worth our taking the time to plan
> >> > and execute some solutions to these problems, which I expect to
> >> > pay dividends in our community's productivity over time.
> >> >
> >> > Thanks,
> >> > Wes
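A final sketch, on the "~24 hours" point from Wes's original message:
the cheapest early-warning signal is a scheduled job that installs
whatever was published last night into a clean environment and
exercises it. Illustrative only -- POSIX paths, and a real job would
point pip at wherever the nightly artifacts actually get uploaded:

    # smoke_test.py -- hypothetical nightly check: install the most
    # recent published pyarrow into a throwaway virtualenv, import it.
    # POSIX paths below; Windows would use Scripts/ instead of bin/.
    import subprocess
    import sys
    import tempfile
    import venv


    def main():
        with tempfile.TemporaryDirectory() as env_dir:
            venv.create(env_dir, with_pip=True)
            pip = f"{env_dir}/bin/pip"
            python = f"{env_dir}/bin/python"
            # Placeholder: a nightly job would pass --index-url pointing
            # at wherever the nightly artifacts get uploaded.
            subprocess.check_call([pip, "install", "pyarrow"])
            # The whole test: does it import and do trivial work?
            subprocess.check_call(
                [python, "-c", "import pyarrow as pa; pa.array([1, 2, 3])"]
            )


    if __name__ == "__main__":
        sys.exit(main())

If this had been running against the release candidate artifacts, the
PyPI problem above would have surfaced before the vote closed rather
than four days after.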