Responses inline. This kind of information is extremely helpful and informative.
On Mon, Mar 26, 2018 at 11:26 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Hi,
>
> As someone who started contributing recently, I'd like to raise a few
> points. I hope this post doesn't come across as rambling or clueless,
> otherwise feel free to ignore / point it out :-)
>
>
> What does a release require?
> ============================
>
> I didn't find any official documentation answering that question.
>

https://github.com/apache/arrow/blob/master/dev/release/RELEASE_MANAGEMENT.md
is the one I used for the most recent release.

>
> Right now, the in-line CI in the arrow repository ensures we have the
> following:
> - the source base builds fine in *some* configurations on each of the three
>   major platforms (Linux, macOS, Windows)
> - the various test suites run fine on each of those platforms
>
> Is it a requirement that binary packages can be produced reliably for
> a number of platforms, and if so, which ones? Is it a requirement that
> binary packages are available from day one when a release is done, or
> is that a best effort thing depending on the availability of specific
> platform maintainers? It would be useful to spell that out somewhere.
>

The release management doc doesn't spell out the specific nitty-gritty of
what exactly the artifacts should and shouldn't contain, though it does
contain some information about how to produce some of the artifacts. It's
critical that we spell this out somewhere.

>
>
> Who is responsible for producing packages?
> ==========================================
>
> Right now it seems packages are all produced out of a single repository
> "arrow-dist". That repository handles production of binary artifacts:
> Python wheels, Ubuntu / CentOS / Debian packages...
>
> It's not obvious if specific people are responsible for each of the
> package production chains. It's common in open source projects to have
> dedicated persons (or teams) responsible for each platform target.
> This ensures that 1) the packages are produced by motivated people
> who are familiar enough with their platforms of interest 2) producing
> packages does not otherwise drain the stamina of the development team.
>

Huge +1 on moving some of the packaging outside the scope of responsibility
of arrow dev. Specifically, I don't think we should be responsible for
anything except wheels and conda packages. One question I have here: are
the scripts/software for the separate package types maintained in different
repositories?

Also +1 on having a person responsible for each platform. I wonder if
having a person responsible for a specific kind of artifact might spread
the workload more evenly, since there's likely a shortage of Windows
expertise.

>
> CI strategy
> ===========
>
> We have two conflicting requirements:
> 1) Test as much as possible as part of continuous integration (including,
>    possibly, the production of viable binary packages)
> 2) Keep CI times reasonable to avoid grinding. Some significant work
>    was done recently to cut down our build times on Travis-CI
>    and AppVeyor, often by half (ARROW-2071, ARROW-2083, ARROW-2231).
>
> To give a point of comparison, CPython has a two-pronged approach:
>
> 1) in-line CI using Travis-CI and AppVeyor, with simple build matrices
>    (1 build on AppVeyor, 2 required + 2 optional on Travis-CI). In-line
>    CI must validate for a PR to be merged.
> 2) out-of-line CI using a farm of buildbots:
>    http://buildbot.python.org/all/#/grid?branch=master

Buildbot looks *a lot* better than the last time I looked at it :)

>
>
> Each buildbot has a maintainer, interested in keeping that specific
> platform and configuration running. Some buildbots are marked stable and
> strongly recommended to be green at all times (and especially when
> releasing). Some buildbots on the other hand are marked unstable and
> represent less mainstream configurations which are just "nice to fix".
>
> The take-aways here are:
> * Mainline development isn't throttled by the production of binary
>   artifacts or testing on a myriad of (possibly slow or busy) CI platforms.
> * Each tested configuration has a maintainer willing to identify and
>   diagnose problems (either propose a solution themselves or notify the
>   developer responsible for a regression).
> * Some things are release blockers (the "stable" platforms), some are not
>   and just nice to have.
>

IMO the "stable" platforms should be conda packages for arrow/pyarrow and
pip wheels. We should discuss that more.

>
> Two side notes:
> * CPython is a much simpler project than Arrow, since it's C99 with minimal
>   dependencies.
> * I wouldn't necessarily recommend buildbot as a CI platform.
>
>
> Build options
> =============
>
> It may be useful to look into reducing the number of build options, and/or
> standardize on supported settings, per platform. For example, we should
> decide whether boost should be bundled or not, namespaced or not, on each
> platform. People with specific development requirements can try to
> override that, but with no guarantee from us.
>

+1. Trying to satisfy everyone's downstream needs is an impossible task.

>
> For example, on the llvmlite project we decided early on that we would
> always link LLVM statically. Third-party maintainers may decide to do
> things differently, but they would have to maintain their own build
> scripts or patches.
>
>
> Regards
>
> Antoine.
>
>
> On 23/03/2018 at 17:58, Wes McKinney wrote:
> > hi folks,
> >
> > So, I want to bring light to the problems we are having delivering
> > binary artifacts after Arrow releases.
> >
> > We have some amount of packaging automation implemented in
> > https://github.com/apache/arrow-dist using Travis CI and Appveyor to
> > upload packages to Bintray, a packaging hosting service.
> >
> > Unfortunately, we discovered a bunch of problems with these packaging
> > scripts after the release vote closed on Monday, and now 4 days later,
> > we still have been unable to post binaries to
> > https://pypi.python.org/pypi/pyarrow
> >
> > This is no one's fault, but it highlights structural problems with our
> > development process:
> >
> > * Why does producing packages after a release require error-prone manual
> >   labor?
> >
> > * Why are we only finding out about packaging problems after a release
> >   vote closes?
> >
> > * Why is setting up nightly binary builds a brittle and bespoke process?
> >
> > I hope all agree that:
> >
> > * Packaging should not be a hardship or require a lot of manual labor
> >
> > * Packaging problems on the master branch should be made known within
> >   ~24 hours, so they can be remedied immediately
> >
> > * It should be straightforward to produce binary artifacts for all
> >   supported platforms and programming languages
> >
> > Eventually, we should include some binary artifacts in our release
> > votes, but we are pretty far away from suitable automation to make
> > this possible.
> >
> > I don't know any easy solutions, but Apache Arrow has grown widely
> > used enough that I think it's worth our taking the time to plan and
> > execute some solutions to these problems, which I expect to pay
> > dividends in our community's productivity over time.
> >
> > Thanks,
> > Wes
> >
>
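
To make the "stable" vs. "unstable" distinction from the CI strategy
discussion concrete, here is a minimal sketch of how a release gate could
separate release blockers from "nice to fix" failures. Everything in it is
an assumption for illustration: the `BuildResult` type, the function names,
and the configuration labels are hypothetical and do not correspond to
anything in arrow-dist or in CPython's buildbot setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildResult:
    """Outcome of one out-of-line packaging/test configuration (hypothetical model)."""
    name: str      # illustrative label only, e.g. "wheel-linux"
    stable: bool   # True if this configuration is treated as a release blocker
    passed: bool   # True if the latest nightly run succeeded


def release_blockers(results):
    """Return the names of failing configurations that are marked stable."""
    return [r.name for r in results if r.stable and not r.passed]


def nice_to_fix(results):
    """Return the names of failing configurations that do not block a release."""
    return [r.name for r in results if not r.stable and not r.passed]


# Illustrative data only; these are not real Arrow build results.
nightly = [
    BuildResult("wheel-linux", stable=True, passed=True),
    BuildResult("conda-windows", stable=True, passed=False),
    BuildResult("deb-ubuntu", stable=False, passed=False),
]

print("release blockers:", release_blockers(nightly))
print("nice to fix:", nice_to_fix(nightly))
```

Under this sketch a release would proceed only when `release_blockers`
returns an empty list, while anything reported by `nice_to_fix` would go to
the maintainer of that configuration without holding up the vote.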