Can PyTorch provide and maintain a conda-forge recipe? This would allow the large and growing conda-forge ecosystem to easily install PyTorch in a community-supported way.
Are there problems with using conda or another general package manager? I agree that the machine learning packages are trying to make a language-specific package manager do more than it was intended to do, and other open source solutions already exist.

Thanks,
Travis

On Mon, Dec 17, 2018, 12:32 AM soumith <soum...@gmail.com> wrote:

I'm reposting my original reply below the current reply (below a dotted line). It was filtered out because I wasn't subscribed to the relevant mailing lists.

tl;dr: manylinux2010 looks pretty promising, because CUDA supports CentOS6 (for now).

In the meanwhile, I dug into what pyarrow does, and it looks like it links with `static-libstdc++` along with a linker version script [1].

PyTorch did exactly that until January of this year [2], except that our linker version script didn't cover the subtleties of statically linking stdc++ as well as Arrow's does. Because we weren't covering all of those subtleties, we faced huge issues that amplified wheel incompatibility (`import X; import torch` crashing under various X). Since then, we have moved to linking against the system-shipped libstdc++, doing no static stdc++ linking at all.

I'll revisit this in light of manylinux2010 and go down the path of statically linking stdc++ again, though I'm wary of the subtleties around the handling of weak symbols, std::string destruction across library boundaries [3], and std::string's ABI incompatibility issues.

I've opened a tracking issue here: https://github.com/pytorch/pytorch/issues/15294

I'm looking forward to hearing from the TensorFlow devs whether manylinux2010 is sufficient for them, or what additional constraints they have.

As a personal thought, I find multiple libraries in the same process each statically linking stdc++ gross, but without a package manager like Anaconda that is actually willing to deal with the C++-side dependencies, there aren't many options on the table.

References:

[1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/symbols.map
[2] https://github.com/pytorch/pytorch/blob/v0.3.1/tools/pytorch.version
[3] https://github.com/pytorch/pytorch/issues/5400#issuecomment-369428125

............................................................

Hi Philipp,

Thanks a lot for getting a discussion started. I've sunk 100+ hours over the last 2 years making PyTorch wheels play well with OpenCV, TensorFlow, and other wheels, so I'm glad to see this discussion started.

For the PyTorch wheels, we have been shipping with the minimum glibc and libstdc++ versions we can possibly work with, while keeping two hard constraints:

1. CUDA support
2. C++11 support

1. CUDA support

manylinux1 is not an option, considering CUDA doesn't work on CentOS5. I explored this option [1] with no success.

manylinux2010 is an option at the moment with respect to CUDA, but it's unclear when NVIDIA will drop CentOS6 support out from under us. Additionally, CuDNN 7.0 (if I remember correctly) was compiled against Ubuntu 12.04 (meaning its glibc requirement was newer than what CentOS6 provides), and binaries linked against CuDNN refused to run on CentOS6. I requested that this constraint be lifted, and the next dot release fixed it.
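To make the glibc constraint concrete, here is a minimal sketch of how one can check what a host actually provides. This is illustrative only: `platform.libc_ver()` is a stdlib heuristic, and the confstr name is glibc-specific.

    # glibc_check.py -- report the host's glibc version (Linux/glibc only).
    import os
    import platform

    # Heuristic: inspects the running interpreter binary; returns e.g.
    # ('glibc', '2.12') on CentOS6.
    libc, version = platform.libc_ver()
    print("libc_ver:", libc or "unknown", version or "?")

    # On glibc systems this confstr is authoritative, e.g. 'glibc 2.12'.
    try:
        print("confstr:", os.confstr("CS_GNU_LIBC_VERSION"))
    except (AttributeError, ValueError, OSError):
        print("CS_GNU_LIBC_VERSION not available on this platform")

A binary built against a newer glibc than the one reported here fails to load with a `GLIBC_x.y not found` error, which is exactly the CuDNN-on-CentOS6 failure described above.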
The reason PyTorch binaries are not manylinux2010 compatible at the moment is the next constraint: C++11.

2. C++11

We picked C++11 as the minimum supported dialect for PyTorch, primarily to serve the default compilers of older machines, i.e. Ubuntu 14.04 and CentOS7. The newer options were C++14 / C++17, but we decided to polyfill what we needed in order to support older distros better.

A fully fleshed-out C++11 implementation landed in gcc in stages, with gradual ABI changes [2]. Unfortunately, the libstdc++ that ships with CentOS6 (and hence manylinux2010) isn't sufficient to cover all of C++11. For example, the binaries we built with devtoolset3 (gcc 4.9.2) on CentOS6 didn't run with the default libstdc++ on CentOS6 either, due to ABI changes or because the minimum GLIBCXX version required by some of the symbols was unavailable.

We tried our best to support our binaries running on CentOS6 and above with various static-linking hacks until 0.3.1 (January 2018), but at some point hacks upon hacks were only getting more fragile. Hence we moved to a CentOS7-based image in April 2018 [3] and now rely only on dynamic linking against the system-shipped libstdc++.

As Wes mentions [4], one option is to host a modern C++ standard library via PyPI, which would put manylinux2010 on the table. There are, however, subtle consequences: if such a package gets installed into a conda environment, it'll clobber the Anaconda-shipped libstdc++, possibly corrupting the environments of thousands of Anaconda users (this is similar to the issues with `mkl` shipped via PyPI and conda clobbering each other).

References:

[1] https://github.com/NVIDIA/nvidia-docker/issues/348
[2] https://gcc.gnu.org/wiki/Cxx11AbiCompatibility
[3] https://github.com/pytorch/builder/commit/44d9bfa607a7616c66fe6492fadd8f05f3578b93
[4] https://github.com/apache/arrow/pull/3177#issuecomment-447515982

............................................................

On Sun, Dec 16, 2018 at 2:57 PM Wes McKinney <wesmck...@gmail.com> wrote:

Reposting since I wasn't subscribed to develop...@tensorflow.org. I also didn't see Soumith's response, since it didn't come through to dev@arrow.apache.org.

In response to the non-conforming ABI in the TF and PyTorch wheels, we have attempted to hack around the issue with some elaborate workarounds [1] [2] that have ultimately proved not to work universally. The bottom line is that this is burdening other projects in the Python ecosystem and causing confusing application crashes.

First, to state what should hopefully be obvious to many of you: Python wheels are not a robust way to deploy complex C++ projects, even setting aside the compiler toolchain issue. If a project has non-trivial third-party dependencies, you either have to statically link them or bundle shared libraries with the wheel (we do a bit of both in Apache Arrow). Neither solution is foolproof in all cases. There are other downsides to wheels when it comes to numerical computing -- it is difficult to utilize things like the Intel MKL, which may be used by multiple projects. And if two projects share the same third-party C++ dependency (using gRPC or libprotobuf as a straw-man example), it's hard to guarantee that their versions or ABIs will not conflict with each other.
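To make the bundling point concrete, here is a minimal sketch that shows what a given wheel claims and carries. It uses only the standard library (a wheel is just a zip archive); the script name and the wheel you point it at are placeholders.

    # wheel_report.py -- a wheel's claimed tags and bundled shared libraries.
    import sys
    import zipfile

    def wheel_report(path):
        with zipfile.ZipFile(path) as wf:
            names = wf.namelist()
            # The .dist-info/WHEEL metadata records the compatibility tags
            # the wheel claims, e.g. 'cp27-cp27mu-manylinux1_x86_64'.
            meta = next(n for n in names if n.endswith(".dist-info/WHEEL"))
            tags = [line.split(":", 1)[1].strip()
                    for line in wf.read(meta).decode().splitlines()
                    if line.startswith("Tag:")]
            # Shared libraries vendored inside the wheel itself.
            libs = [n for n in names if n.endswith(".so") or ".so." in n]
        return tags, libs

    if __name__ == "__main__":
        tags, libs = wheel_report(sys.argv[1])
        print("claimed tags:", ", ".join(tags))
        print("bundled shared libraries:")
        print("\n".join("  " + lib for lib in libs))

Note that the tag is only a claim: auditwheel is the tool that actually verifies a wheel's contents against the manylinux policy, which is exactly the gap being discussed in this thread.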
In packaging with conda, we pin all dependencies when building the projects that depend on them, then package and deploy the dependencies as separate shared libraries instead of bundling them. To resolve the need for newer compilers or a newer C++ standard library, libstdc++.so and other system shared libraries are packaged and installed as dependencies. In manylinux1, the Red Hat devtoolset compiler toolchain is used, as it performs selective static linking of symbols to enable C++11 libraries to be deployed on older Linuxes like RHEL5/6. A conda environment functions as a sort of portable miniature Linux distribution.

Given the current state of things -- using the TensorFlow and PyTorch wheels in the same process as other conforming manylinux1 wheels is unsafe -- it's hard to see how one can continue to recommend pip as a preferred installation path until the ABI problems are resolved. For example, "pip" is what is recommended for installing TensorFlow on Linux [3]. It's also unclear whether non-compliant wheels should be allowed in the package index at all (I'm aware that verifying policy compliance was deemed not to be the responsibility of PyPI [4]).

A couple of possible paths forward (there may be others):

* Collaborate with the Python packaging authority to evolve the manylinux ABI so that compliant wheels can be produced that satisfy the build and deployment requirements of these projects
* Create a new ABI tag for CUDA/C++11-enabled Python wheels, so that projects can ship packages that are guaranteed to work properly with TF/PyTorch. This might require vendoring libstdc++ in some kind of "toolchain" wheel that projects using this new ABI can depend on

Note that these toolchain and deployment issues are absent when building and deploying with conda packages, since build- and run-time dependencies can be pinned and shared across all the projects that depend on them, ensuring ABI cross-compatibility. It's great to have the convenience of "pip install $PROJECT", but I believe these projects have outgrown the intended use of pip and wheel distributions.

Until the ABI incompatibilities are resolved, I would encourage more prominent user documentation about the non-portability of these Linux wheels and their potential for crashes.

Thanks,
Wes

[1]: https://github.com/apache/arrow/commit/537e7f7fd503dd920c0b9f0cef8a2de86bc69e3b
[2]: https://github.com/apache/arrow/commit/e7aaf7bf3d3e326b5fe58d20f8fc45b5cec01cac
[3]: https://www.tensorflow.org/install/
[4]: https://www.python.org/dev/peps/pep-0513/#id50
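The clobbering hazard described above is easy to observe directly. Here is a Linux-only sketch (it relies on /proc, and "libstdc++.so.6" is the standard soname) that shows which libstdc++ the dynamic loader actually maps into the current process -- inside a conda environment you would expect the environment's own copy, and anything else signals exactly the kind of mixup discussed in this thread:

    # which_stdcxx.py -- which libstdc++ does this process actually load?
    # Linux-only: relies on the /proc filesystem.
    import ctypes

    # dlopen via the default search path (RPATH, LD_LIBRARY_PATH, ldconfig).
    ctypes.CDLL("libstdc++.so.6")

    # Every file-backed mapping appears in /proc/self/maps with its path.
    with open("/proc/self/maps") as maps:
        paths = {line.split()[-1] for line in maps if "libstdc++" in line}
    for path in sorted(paths):
        print(path)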
On Sat, Dec 15, 2018 at 11:25 PM Robert Nishihara <robertnishih...@gmail.com> wrote:

On Sat, Dec 15, 2018 at 8:43 PM Philipp Moritz <pcmor...@gmail.com> wrote:

Dear all,

As some of you know, there is a standard in Python called manylinux (https://www.python.org/dev/peps/pep-0513/) for packaging binary executables and libraries into a "wheel" in a way that allows the code to run on a wide variety of Linux distributions. This is very convenient for Python users, since such libraries can be easily installed via pip.

This standard is also important for a second reason: if many different wheels are used together in a single Python process, adhering to manylinux ensures that these libraries work well together and don't step on each other's toes (which could easily happen if, for example, different versions of libstdc++ are used). Therefore, *even if support for only a single distribution like Ubuntu is desired*, it is important to be manylinux compatible to make sure everybody's wheels work well together.

TensorFlow and PyTorch unfortunately don't produce manylinux-compatible wheels. The challenge is due, at least in part, to the need to use nvidia-docker to build GPU binaries [10]. This causes various levels of pain for the rest of the Python community; see for example [1] [2] [3] [4] [5] [6] [7] [8].

The purpose of this e-mail is to get a discussion started on how we can make TensorFlow and PyTorch manylinux compliant. There is a new standard in the works [9], so hopefully we can discuss what would be necessary to make sure TensorFlow and PyTorch can adhere to it in the future.

It would make everybody's lives just a little bit better! Any ideas are appreciated.

@soumith: Could you cc the relevant list? I couldn't find a PyTorch dev mailing list.

Best,
Philipp

[1] https://github.com/tensorflow/tensorflow/issues/5033
[2] https://github.com/tensorflow/tensorflow/issues/8802
[3] https://github.com/primitiv/primitiv-python/issues/28
[4] https://github.com/zarr-developers/numcodecs/issues/70
[5] https://github.com/apache/arrow/pull/3177
[6] https://github.com/tensorflow/tensorflow/issues/13615
[7] https://github.com/pytorch/pytorch/issues/8358
[8] https://github.com/ray-project/ray/issues/2159
[9] https://www.python.org/dev/peps/pep-0571/
[10] https://github.com/tensorflow/tensorflow/issues/8802#issuecomment-291935940
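As an illustrative footnote on the libstdc++ conflicts raised throughout this thread: the crux of the manylinux policy is which versioned symbols a binary demands of the system libraries. Here is a minimal sketch of that check (it assumes binutils' `nm` is on PATH; auditwheel performs the full policy verification), which can be pointed at any shared object, such as an extension module unpacked from one of these wheels:

    # stdcxx_demands.py -- which GLIBCXX/CXXABI versions does a binary need?
    # Assumes binutils' `nm` is installed; pass the path to an ELF object.
    import re
    import subprocess
    import sys

    def required_versions(path):
        # `nm -D` dumps the dynamic symbol table; symbols carry suffixes
        # like '@GLIBCXX_3.4.21' naming the symbol version they require.
        out = subprocess.run(["nm", "-D", path], check=True,
                             capture_output=True, text=True).stdout
        found = re.findall(r"@+((?:GLIBCXX|CXXABI)_[0-9.]+)", out)
        return sorted(set(found))

    if __name__ == "__main__":
        print("\n".join(required_versions(sys.argv[1])))

If any version printed is newer than what the oldest target distro's libstdc++ exports (CentOS6 stops around GLIBCXX_3.4.13), the binary will not load there -- which is the failure mode described earlier in the thread.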