Just want to mention two other build systems:

Pants: https://github.com/pantsbuild/pants
Meson: https://github.com/mesonbuild/meson
Comparison: http://mesonbuild.com/Comparisons.html
Krisztian
On Mar 24 2018, at 9:13 pm, Phillip Cloud <cpcl...@gmail.com> wrote:
>
> I think we need to use a tool that can perform every single step of the
> deployment process, end-to-end. Right now, cmake isn't cutting it IMO
> because it lends itself quite heavily to copy pasting and oodles of bash
> scripts that are indecipherable by anyone except the original author.
>
> With that in mind, here's what I think the requirements are for the next
> generation of arrow's build and deployment system for native code (C, C++,
> Python):
>
> For every platform (Linux, OS X, and Windows):
> 1. Build sources (these languages are currently the most cumbersome)
> 2. Run all tests for each language (ideally integration tests as well, but
> I'm not sure if that should be a hard requirement at the moment)
> 3. Build the API documentation for each platform
> 4. Build installable packages that we support:
> * conda
> * pip wheels
> * deb
> * rpm
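The platform/step matrix implied by the list above can be sketched as plain data, e.g. (a minimal sketch; I'm assuming deb/rpm only apply on Linux, and none of these names come from an existing tool):

```python
# Minimal sketch of the build matrix described above: every platform runs
# build, test, and docs steps, plus the packaging formats that make sense
# on that platform. Step and platform names are illustrative.
COMMON_STEPS = ["build-sources", "run-tests", "build-docs"]

# Assumption: deb/rpm are Linux-only; conda and wheels exist everywhere.
PACKAGE_FORMATS = {
    "linux": ["conda", "wheel", "deb", "rpm"],
    "osx": ["conda", "wheel"],
    "windows": ["conda", "wheel"],
}

def build_matrix():
    """Return every (platform, step) job a release-ready build must cover."""
    jobs = []
    for platform, formats in PACKAGE_FORMATS.items():
        for step in COMMON_STEPS:
            jobs.append((platform, step))
        for fmt in formats:
            jobs.append((platform, "package-" + fmt))
    return jobs

matrix = build_matrix()
```

The point of writing it down as data is that any tool we pick should be able to enumerate and run exactly this matrix, with no per-platform shell glue.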
>
> Additional requirements:
> 1. Be able to run all of these steps on any platform with minimal
> environment setup.
> 2. The output of the doc build and package build steps **should be in a
> release-ready state at all times**. If they are not, then we should fail the
> build.
> 3. Ideally, these run on every PR so that we can find out if a commit would
> introduce a change that would break the release-ready status of arrow.
>
> These requirements indicate to me that a single, extensible, cross-platform
> tool--as opposed to many tools that are tied together by a shell script--is
> what we need.
>
> There are a few tools in this space that I'm aware of:
> 1. Bazel (out of Google)
> 2. Buck (out of Facebook)
>
> I'm not sure what others are out there, but I'm sure there must be some.
> I don't really have a strong opinion on either Bazel or Buck, but I suspect
> that since we follow Google's conventions in a few places integrating Bazel
> into the arrow codebase would be less work.
>
> The main risk I see here is that it's possible that bazel isn't the right
> tool. I'm not sure how to mitigate this risk other than to scour the Bazel
> docs to make sure it can meet our requirements.
>
> I do think that the fact that Bazel is extensible mitigates some risk here.
> For example, we'd likely have to add rules for building conda packages and
> pip wheels.
>
> I guess CMake is extensible too, but I don't think I've ever seen the
> extensibility features of CMake as anything but a burden. Bazel's extension
> language is a subset of Python and I would therefore expect it to be a lot
> easier to use.
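For illustration, a hypothetical Starlark rule for building a wheel might look something like this (the rule name, attributes, and the shell command are invented; only `rule()`, `attr`, and `ctx.actions` are standard Bazel machinery):

```python
# Hypothetical .bzl sketch -- not an existing Bazel rule. It just shows the
# shape such an extension would take: declare an output, run a command,
# return the file as the rule's output.
def _py_wheel_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".whl")
    ctx.actions.run_shell(
        inputs = ctx.files.srcs,
        outputs = [out],
        command = "python setup.py bdist_wheel",  # placeholder command
        progress_message = "Building wheel %s" % out.short_path,
    )
    return [DefaultInfo(files = depset([out]))]

py_wheel = rule(
    implementation = _py_wheel_impl,
    attrs = {"srcs": attr.label_list(allow_files = True)},
)
```

Because this is ordinary Starlark, the packaging logic would live in the build graph itself rather than in a shell script bolted on after the fact.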
>
> I'm interested to hear others' experiences and opinions on similar
> problems. Also, if I've missed anything in the requirements list, please
> don't hesitate to respond.
>
> Let's fix our packaging!
> -Phillip
> On Fri, Mar 23, 2018 at 11:21 PM Holden Karau <hol...@pigscanfly.ca> wrote:
> > I know in Spark we’ve benefited by having some of the different language
> > devs act as RMs and each time that language dev has ended up improving a
> > bunch of how their components packaging has been done. Not to suggest we
> > should just do what other projects do, but maybe an idea to consider?
> >
> > On Fri, Mar 23, 2018 at 12:59 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > > hi folks,
> > > So, I want to bring light to the problems we are having delivering
> > > binary artifacts after Arrow releases.
> > >
> > > We have some amount of packaging automation implemented in
> > > https://github.com/apache/arrow-dist using Travis CI and Appveyor to
> > > upload packages to Bintray, a packaging hosting service.
> > >
> > > Unfortunately, we discovered a bunch of problems with these packaging
> > > scripts after the release vote closed on Monday, and now 4 days later,
> > > we still have been unable to post binaries to
> > > https://pypi.python.org/pypi/pyarrow
> > >
> > > This is no one's fault, but it highlights structural problems with our
> > > development process:
> > >
> > > * Why does producing packages after a release require error-prone manual
> > > labor?
> > >
> > > * Why are we only finding out about packaging problems after a release
> > > vote closes?
> > >
> > > * Why is setting up nightly binary builds a brittle and bespoke process?
> > >
> > > I hope all agree that:
> > > * Packaging should not be a hardship or require a lot of manual labor
> > > * Packaging problems on the master branch should be made known within
> > > ~24 hours, so they can be remedied immediately
> > >
> > > * It should be straightforward to produce binary artifacts for all
> > > supported platforms and programming languages
> > >
> > > Eventually, we should include some binary artifacts in our release
> > > votes, but we are pretty far away from suitable automation to make
> > > this possible.
> > >
> > > I don't know any easy solutions, but Apache Arrow has grown widely
> > > used enough that I think it's worth our taking the time to plan and
> > > execute some solutions to these problems, which I expect to pay
> > > dividends in our community's productivity over time.
> > >
> > > Thanks,
> > > Wes
> > >
> > --
> > Twitter: https://twitter.com/holdenkarau
>
>
