Re: Confronting Arrow packaging problems

Uwe L. Korn Sun, 25 Mar 2018 03:48:45 -0700

Hello all,

I strongly support Wes' point that better automated feedback in our build
chain is essential, independent of the tool we use to make the builds. As it
sadly seems that our newly uploaded Arrow wheels for OSX are broken, I'm going
to start there and add them to the build matrix:
https://issues.apache.org/jira/browse/ARROW-2352. Looking at the build tools, I
think that while we could get a great benefit from using Bazel or something
similar, it probably would not solve the problems we currently face in their
full scope. The wheel builds in particular are very brittle, as they need other
packages to also be built in a special way so that they are redistributable.


Uwe

On Sun, Mar 25, 2018, at 2:13 AM, Krisztián Szűcs wrote:
> Just want to mention two other build systems:
> 
> Pants: https://github.com/pantsbuild/pants 
> Meson: https://github.com/mesonbuild/meson 
> Comparison http://mesonbuild.com/Comparisons.html 
> Krisztian
> On Mar 24 2018, at 9:13 pm, Phillip Cloud <cpcl...@gmail.com> wrote:
> >
> > I think we need to use a tool that can perform every single step of the
> > deployment process, end-to-end. Right now, CMake isn't cutting it IMO
> > because it lends itself quite heavily to copy-pasting and oodles of bash
> > scripts that are indecipherable by anyone except the original author.
> >
> > With that in mind, here's what I think the requirements are for the next
> > generation of arrow's build and deployment system for native code (C, C++,
> > Python):
> >
> > For every platform (Linux, OS X, and Windows):
> > 1. Build sources (these languages are currently the most cumbersome)
> > 2. Run all tests for each language (ideally integration tests as well, but
> > I'm not sure if that should be a hard requirement at the moment)
> > 3. Build the API documentation for each platform
> > 4. Build installable packages that we support:
> > * conda
> > * pip wheels
> > * deb
> > * rpm
> >
> > Additional requirements:
> > 1. Be able to run all of these steps on any platform with minimal
> > environment setup.
> > 2. The output of the doc build and package build steps **should be in a
> > release-ready state at all times**. If they are not, then we should fail the
> > build.
> > 3. Ideally, these run on every PR so that we can find out if a commit would
> > introduce a change that would break the release-ready status of arrow.
> >
> > These requirements indicate to me that a single, extensible, cross-platform
> > tool--as opposed to many tools that are tied together by a shell script--is
> > what we need.
> >
> > There are a few tools in this space that I'm aware of:
> > 1. Bazel (out of Google)
> > 2. Buck (out of Facebook)
> >
> > I'm not sure what others are out there, but I'm sure there must be some.
> > I don't really have a strong opinion on either Bazel or Buck, but I suspect
> > that since we follow Google's conventions in a few places, integrating Bazel
> > into the Arrow codebase would be less work.
> >
> > The main risk I see here is that Bazel may turn out not to be the right
> > tool. I'm not sure how to mitigate this risk other than to scour the Bazel
> > docs and make sure that it can meet our requirements.
> >
> > I do think that the fact that Bazel is extensible mitigates some risk here.
> > For example, we'd likely have to add rules for building conda packages and
> > pip wheels.
> >
> > I guess CMake is extensible too, but I don't think I've ever seen the
> > extensibility features of CMake as anything but a burden. Bazel's extension
> > language is a subset of Python and I would therefore expect it to be a lot
> > easier to use.
> >
> > I'm interested to hear others' experiences and opinions on similar
> > problems. Also, if I've missed anything in the requirements list, please
> > don't hesitate to respond.
> >
> > Let's fix our packaging!
> > -Phillip
> > On Fri, Mar 23, 2018 at 11:21 PM Holden Karau <hol...@pigscanfly.ca> wrote:
> > > I know in Spark we’ve benefited from having some of the different language
> > > devs act as RMs, and each time that language dev has ended up improving a
> > > bunch of how their component's packaging has been done. Not to suggest we
> > > should just do what other projects do, but maybe an idea to consider?
> > >
> > > On Fri, Mar 23, 2018 at 12:59 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > > > hi folks,
> > > > So, I want to bring light to the problems we are having delivering
> > > > binary artifacts after Arrow releases.
> > > >
> > > > We have some amount of packaging automation implemented in
> > > > https://github.com/apache/arrow-dist using Travis CI and Appveyor to
> > > > upload packages to Bintray, a package hosting service.
> > > >
> > > > Unfortunately, we discovered a bunch of problems with these packaging
> > > > scripts after the release vote closed on Monday, and now 4 days later,
> > > > we still have been unable to post binaries to
> > > > https://pypi.python.org/pypi/pyarrow
> > > >
> > > > This is no one's fault, but it highlights structural problems with our
> > > > development process:
> > > >
> > > > * Why does producing packages after a release require error-prone manual
> > > > labor?
> > > >
> > > > * Why are we only finding out about packaging problems after a release
> > > > vote closes?
> > > >
> > > > * Why is setting up nightly binary builds a brittle and bespoke process?
> > > >
> > > > I hope all agree that:
> > > > * Packaging should not be a hardship or require a lot of manual labor
> > > > * Packaging problems on the master branch should be made known within
> > > > ~24 hours, so they can be remedied immediately
> > > >
> > > > * It should be straightforward to produce binary artifacts for all
> > > > supported platforms and programming languages
> > > >
> > > > Eventually, we should include some binary artifacts in our release
> > > > votes, but we are pretty far away from suitable automation to make
> > > > this possible.
> > > >
> > > > I don't know any easy solutions, but Apache Arrow has grown widely
> > > > used enough that I think it's worth our taking the time to plan and
> > > > execute some solutions to these problems, which I expect to pay
> > > > dividends in our community's productivity over time.
> > > >
> > > > Thanks,
> > > > Wes
> > > >
> > > --
> > > Twitter: https://twitter.com/holdenkarau
> >
> >
>
