Hello Wes, I'm OK with option 2 if we use the yet-unfinished manylinux2010 image as the base. This way, we will still be able to produce wheels that, in the near future, are actually based on an architecture tag supported by a PEP. Also, as I have been having some packaging nightmares, I would feel much better if we first get a release out that features parquet-cpp merged into the main Arrow tree before we switch the manylinux* base image.
Uwe

On Wed, Sep 5, 2018, at 1:22 AM, Ted Dunning wrote:
> Just as a point of reference, I don't think that we get any pushback at MapR
> for not supporting RHEL 5, and that has been our policy for a few years now.
>
> That experience should be pretty similar for Arrow, except that I would
> expect that new adoptions might be even more canted towards current
> versions.
>
> On Tue, Sep 4, 2018 at 3:24 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > hi folks,
> >
> > Surfacing a JIRA discussion ([4]) to the mailing list.
> >
> > The manylinux1 ABI was developed to provide a mechanism for portable
> > Python packages with pre-compiled binary extensions supporting C and
> > C++, including C++11, on a wide variety of Linux distributions without
> > the need for distribution-specific packages. This is accomplished using
> > Red Hat's devtoolset-2, which performs selective static linking of
> > symbols from libstdc++ that cause ABI conflicts when used on systems
> > with older standard libraries.
> >
> > The base image for producing these binaries is specified in a Dockerfile
> > [1].
> >
> > The problem we are having is that some C++ libraries, notably
> > Google's Abseil C++ library, require a version of glibc that is too
> > new for RHEL5. By building with CentOS6 / RHEL6 as the base image, we
> > would get a new enough glibc (version 2.12). But building against
> > glibc 2.12 would leave behind the RHEL5 folks.
> >
> > The in-discussion manylinux2010 standard uses RHEL6 as its base,
> > but it is not yet finalized or in production.
> >
> > Some modern C++ projects shipping to Python have already left behind
> > the manylinux1 standard even though their Python binaries claim to
> > implement it. Both PyTorch and TensorFlow are tagged as manylinux1
> > although they have a different ABI. See [2] and [3] for examples.
> >
> > In my view there are two paths forward, neither perfect:
> >
> > 1) Stick with the manylinux1 ABI and do not use third-party libraries
> > requiring a newer glibc.
> > 2) "Cheat" on manylinux1 by using centos6 instead of centos5 as the
> > base image for the wheel builds. This is what PyTorch is doing.
> >
> > Since centos5 / RHEL5 are already past EOL, those would be the primary
> > casualties, but I'm not sure how many users would be affected. My
> > guess is that they represent a small minority of our users at this
> > point. Red Hat is offering extended support for RHEL5 through the end of
> > 2020, but those are probably fairly exceptional cases and unlikely
> > (IMHO) to be working on the bleeding edge of Python data engineering.
> >
> > Personally, I would like to go with option 2 and hope that this
> > particular Python packaging issue gets sorted out in the next 12-24 months,
> > as we've already suffered problems due to TensorFlow's and PyTorch's
> > non-conformity with the manylinux1 ABI.
> >
> > Interested in the opinions of others.
> >
> > - Wes
> >
> > [1]: https://github.com/pypa/manylinux/blob/master/docker/Dockerfile-x86_64
> > [2]: https://github.com/NVIDIA/nvidia-docker/issues/348#issuecomment-288875848
> > [3]: https://github.com/pypa/manylinux/issues/96
> > [4]: https://issues.apache.org/jira/browse/ARROW-2461
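To make the glibc trade-off concrete, here is a minimal Python sketch, assuming glibc floors of 2.5 for manylinux1 (PEP 513, CentOS 5) and 2.12 for manylinux2010 (PEP 571, CentOS 6). It reads the running process's glibc version through `ctypes`, similar to the runtime check PEP 513 describes for installers. The helper names `glibc_version`, `is_compatible`, and `MIN_GLIBC` are hypothetical and not part of pip or any packaging library, and only the glibc floor is checked, not the policies' full library and symbol-version whitelists.

```python
import ctypes
import re
from typing import Optional, Tuple

# Assumed glibc floors: manylinux1 (PEP 513, CentOS 5) and
# manylinux2010 (PEP 571, CentOS 6). Other policy rules are not checked here.
MIN_GLIBC = {
    "manylinux1": (2, 5),
    "manylinux2010": (2, 12),
}


def glibc_version() -> Optional[Tuple[int, int]]:
    """Return the running process's glibc version as (major, minor), or None
    if the process is not linked against glibc (e.g. macOS, musl, Windows)."""
    try:
        # CDLL(None) exposes the symbols already loaded into this process.
        process = ctypes.CDLL(None)
        gnu_get_libc_version = process.gnu_get_libc_version
    except (OSError, AttributeError, TypeError):
        return None
    gnu_get_libc_version.restype = ctypes.c_char_p
    version_str = gnu_get_libc_version().decode("ascii")
    match = re.match(r"(\d+)\.(\d+)", version_str)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_compatible(policy: str, version: Optional[Tuple[int, int]]) -> bool:
    """Hypothetical helper: does a system with this glibc meet the policy's floor?"""
    return version is not None and version >= MIN_GLIBC[policy]


if __name__ == "__main__":
    current = glibc_version()
    print("glibc:", current)
    for policy in MIN_GLIBC:
        print(policy, is_compatible(policy, current))
```

Under these assumptions, a RHEL5 machine with glibc 2.5 passes the manylinux1 check but fails the manylinux2010 one, which is exactly the group that option 2 would leave behind.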