In response to the non-conforming ABI in the TensorFlow and PyTorch wheels, we have attempted to hack around the issue with some elaborate workarounds [1] [2] that have ultimately proved not to work universally. The bottom line is that this is burdening other projects in the Python ecosystem and causing confusing application crashes.

First, to state what should hopefully be obvious to many of you, Python wheels are not a robust way to deploy complex C++ projects, even setting aside the compiler toolchain issue. If a project has non-trivial third party dependencies, you either have to statically link them or bundle shared libraries with the wheel (we do a bit of both in Apache Arrow). Neither solution is foolproof in all cases. There are other downsides to wheels when it comes to numerical computing -- it is difficult to utilize things like the Intel MKL which may be used by multiple projects. If two projects have the same third party C++ dependency (e.g. let's use gRPC or libprotobuf as a straw man example), it's hard to guarantee that versions or ABI will not conflict with each other.

In packaging with conda, we pin all dependencies when building projects that depend on them, then package and deploy the dependencies as separate shared libraries instead of bundling. To resolve the need for newer compilers or a newer C++ standard library, libstdc++.so and other system shared libraries are packaged and installed as dependencies. In manylinux1, the RedHat devtoolset compiler toolchain is used, as it performs selective static linking of symbols to enable C++11 libraries to be deployed on older Linuxes like RHEL5/6. A conda environment functions as a sort of portable miniature Linux distribution.

Given the current state of things, where using the TensorFlow and PyTorch wheels in the same process as other conforming manylinux1 wheels is unsafe, it's hard to see how one can continue to recommend pip as a preferred installation path until the ABI problems are resolved. For example, "pip" is what is recommended for installing TensorFlow on Linux [3]. It's unclear that non-compliant wheels should be allowed in the package manager at all (I'm aware that it was deemed not to be the responsibility of PyPI to verify policy compliance [4]).
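
To make the libstdc++ concern a bit more concrete, here is a rough Python sketch that lists the shared libstdc++ files mapped into a Linux process by reading /proc/self/maps. It is not taken from any of the projects mentioned here, and the function names are made up for illustration. It only reports shared libraries that are actually mapped; symbols that were statically linked into an extension module (as devtoolset does) will not show up, so treat the output as a hint rather than a compliance check.

```python
import os
import re


def loaded_libraries(maps_text, pattern=r"libstdc\+\+\.so"):
    """Return the distinct file paths in /proc/<pid>/maps text that match pattern."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, device, inode, and (if file-backed) path.
        fields = line.split(None, 5)
        if len(fields) == 6 and re.search(pattern, fields[5]):
            paths.add(fields[5].strip())
    return sorted(paths)


def current_process_libstdcxx():
    """Linux only: call this after importing the wheels you want to inspect."""
    if not os.path.exists("/proc/self/maps"):
        return []  # not Linux, or /proc is unavailable
    with open("/proc/self/maps") as f:
        return loaded_libraries(f.read())


if __name__ == "__main__":
    for path in current_process_libstdcxx():
        print(path)
```

Running this after importing, say, both a manylinux1 wheel and one of the non-conforming wheels shows which libstdc++.so the process actually ended up with, which is often the first thing worth knowing when debugging one of these crashes.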

A couple possible paths forward (there may be others):

* Collaborate with the Python packaging authority to evolve the manylinux ABI to be able to produce compliant wheels that support the build and deployment requirements of these projects
* Create a new ABI tag for CUDA/C++11-enabled Python wheels so that projects can ship packages that can be guaranteed to work properly with TF/PyTorch. This might require vendoring libstdc++ in some kind of "toolchain" wheel that projects using this new ABI can depend on

Note that these toolchain and deployment issues are absent when building and deploying with conda packages, since build- and run-time dependencies can be pinned and shared across all the projects that depend on them, ensuring ABI cross-compatibility. It's great to have the convenience of "pip install $PROJECT", but I believe that these projects have outgrown the intended use for pip and wheel distributions.

Until the ABI incompatibilities are resolved, I would encourage more prominent user documentation about the non-portability and potential for crashes with these Linux wheels.

Thanks,
Wes

[1]: https://github.com/apache/arrow/commit/537e7f7fd503dd920c0b9f0cef8a2de86bc69e3b
[2]: https://github.com/apache/arrow/commit/e7aaf7bf3d3e326b5fe58d20f8fc45b5cec01cac
[3]: https://www.tensorflow.org/install/
[4]: https://www.python.org/dev/peps/pep-0513/#id50

On Sat, Dec 15, 2018 at 11:25 PM Robert Nishihara <robertnishih...@gmail.com> wrote:
>
> On Sat, Dec 15, 2018 at 8:43 PM Philipp Moritz <pcmor...@gmail.com> wrote:
>
> > Dear all,
> >
> > As some of you know, there is a standard in Python called manylinux (
> > https://www.python.org/dev/peps/pep-0513/) to package binary executables
> > and libraries into a “wheel” in a way that allows the code to be run on a
> > wide variety of Linux distributions. This is very convenient for Python
> > users, since such libraries can be easily installed via pip.
> >
> > This standard is also important for a second reason: If many different
> > wheels are used together in a single Python process, adhering to manylinux
> > ensures that these libraries work together well and don’t trip on each
> > other’s toes (this could easily happen if different versions of libstdc++
> > are used for example). Therefore *even if support for only a single
> > distribution like Ubuntu is desired*, it is important to be manylinux
> > compatible to make sure everybody’s wheels work together well.
> >
> > TensorFlow and PyTorch unfortunately don’t produce manylinux compatible
> > wheels. The challenge is due, at least in part, to the need to use
> > nvidia-docker to build GPU binaries [10]. This causes various levels of
> > pain for the rest of the Python community, see for example [1] [2] [3] [4]
> > [5] [6] [7] [8].
> >
> > The purpose of the e-mail is to get a discussion started on how we can
> > make TensorFlow and PyTorch manylinux compliant. There is a new standard in
> > the works [9] so hopefully we can discuss what would be necessary to make
> > sure TensorFlow and PyTorch can adhere to this standard in the future.
> >
> > It would make everybody’s lives just a little bit better! Any ideas are
> > appreciated.
> >
> > @soumith: Could you cc the relevant list? I couldn't find a pytorch dev
> > mailing list.
> >
> > Best,
> > Philipp.
> >
> > [1] https://github.com/tensorflow/tensorflow/issues/5033
> > [2] https://github.com/tensorflow/tensorflow/issues/8802
> > [3] https://github.com/primitiv/primitiv-python/issues/28
> > [4] https://github.com/zarr-developers/numcodecs/issues/70
> > [5] https://github.com/apache/arrow/pull/3177
> > [6] https://github.com/tensorflow/tensorflow/issues/13615
> > [7] https://github.com/pytorch/pytorch/issues/8358
> > [8] https://github.com/ray-project/ray/issues/2159
> > [9] https://www.python.org/dev/peps/pep-0571/
> > [10] https://github.com/tensorflow/tensorflow/issues/8802#issuecomment-291935940