hi folks,

TL;DR I can't afford for me or my colleagues to continue spending time maintaining the Python binary wheel builds. They have sucked a completely unreasonable amount of time the last few months for reasons that are difficult to completely articulate in an e-mail, so I'm going to lay out the particular issues we need to discuss and decide what happens next.
A lot of people depend on "pip install pyarrow" working (with about 20 million downloads and counting) so I hope that new people will step up to maintain these packages.

# WHY ARE WHEELS HARD

The Python wheel binary standard was optimized for easy installation of packages with C extensions, but only where those C extensions are simple to build. The best case scenario is that the C code is completely self-contained. If things are more complicated, things get messy:

* Third party C libraries
* Third party C++ libraries
* Differing C++ ABI versions

The constraint of wheels is that a package must generally be entirely self-contained, with all C/C++ symbols included either via static linking or by bundling the shared libraries in the wheel -- which style of bundling works best may be different for Linux, macOS, and Windows. What's happened over the last 3 years is that we have developed a pretty large set of scripts solely dedicated to producing packages that comply with the spec.

# WHAT CHANGED

Prior to approximately April, Uwe Korn was the main maintainer of the wheel builds -- he was maintaining them as part of his day job at Blue Yonder. Around that time, two things happened:

* Uwe transitioned to a new role where he no longer carries the operational burden of maintaining wheels
* Our wheels became much more complex due to Flight (requiring gRPC, OpenSSL, and other dependencies) and Gandiva (requiring LLVM and more)

Note we had already been suffering due to design flaws with the wheel spec making it impossible to safely use pyarrow together with TensorFlow and PyTorch.

# WHY NOW

I started some recent Twitter discussions around the difficulties we had maintaining the packages, some of which are causing us to have to make a patch 0.14.1 release, taking up yet more maintainer time and keeping us from moving on to new feature work.
This instigated more discussion on https://github.com/pypa/packaging-problems/issues/25#issuecomment-511460738

It seems clear to me that the self-appointed Python Packaging "Authority" is not acting in our best interests, and seems to have adopted the position that it's acceptable to have a language-specific binary packaging system that works well for 95% of use cases but causes unbounded punishment for a small percentage of packages. Note that pyarrow is one of the more complex packages in the Python Package Index -- it's about as complex to build as TensorFlow or PyTorch or other packages with deep C++ build toolchains.

There's a lot of Python community politics going on here so if you aren't an insider it might be confusing to look at the discussion. Suffice to say we've been quarreling with each other over packaging issues for the last decade, complicated greatly by the emergence of a startup, Anaconda (fka Continuum Analytics), which produced a packaging tool "conda" which addressed many of the exact problems that we're having (for the record, maintaining conda packages for Apache Arrow causes us little stress at all).

I feel that continuing to participate in wheel maintenance as it stands now is unethical if there is no concrete plan in place (and developer resources dedicated to seeing it through) to fix the wheel specification and packaging tools to alleviate the kind of suffering we are experiencing. There are various ideas about what to do, but it seems the viable route is to make pip and wheels work more like "conda", a general purpose cross-platform packaging tool that's been widely adopted among the scientific / data Python community.

# WHAT NOW

Apache Arrow is a community of volunteers. I hope that other volunteers will step up to take ownership of the Python wheels. In the meantime, I am going to stop spending any time on them and encourage my colleagues to do the same.
I know that Krisztian is working on the 0.14.1 patch release, so after that's done I think that we'll most likely completely disengage from the wheels. If no volunteers take initiative to maintain the package builds, as soon as any problem breaks builds or release scripts, I will be advocating to disable and ignore the builds until they can be maintained in the future. I do not think it is reasonable for us to continue to be burdened by these issues.

Thanks,
Wes