hi folks,

TL;DR: I can't afford for me or my colleagues to continue spending time
maintaining the Python binary wheel builds. They have sucked up a
completely unreasonable amount of time over the last few months for reasons
that are difficult to completely articulate in an e-mail, so I'm going
to lay out the particular issues we need to discuss and decide what
happens next.

A lot of people depend on "pip install pyarrow" working (with about 20
million downloads and counting) so I hope that new people will step up
to maintain these packages.

# WHY ARE WHEELS HARD

The Python wheel binary standard was optimized for easy installation
of packages with C extensions, but where those C extensions are simple
to build. The best case scenario is that the C code is completely
self-contained.

If things are more complicated, things get messy:

* Third party C libraries
* Third party C++ libraries
* Differing C++ ABI versions

The constraint of wheels is that a package must generally be entirely
self-contained, including all C/C++ symbols included via static
linking or by including the shared library bundled in the wheel --
which style of bundling works best may be different for Linux, macOS,
and Windows.
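
To make the "self-contained" constraint concrete, here is a minimal
sketch (not part of our actual build scripts) that lists the shared
libraries bundled inside a wheel. It relies only on the fact that a
wheel is a zip archive; the function names and the suffix heuristic
are my own assumptions for illustration.

```python
import zipfile
from pathlib import PurePosixPath

# Suffixes that typically mark shared libraries / extension modules
# on Linux, macOS, and Windows (a heuristic, not an exhaustive list).
SHARED_LIB_SUFFIXES = (".so", ".dylib", ".dll", ".pyd")


def is_shared_library(member_name):
    """Return True if an archive member name looks like a shared library."""
    filename = PurePosixPath(member_name).name
    # Linux libraries often carry a version suffix, e.g. libfoo.so.14
    return filename.endswith(SHARED_LIB_SUFFIXES) or ".so." in filename


def bundled_shared_libraries(wheel):
    """Return the sorted shared-library members of a wheel.

    `wheel` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(wheel) as archive:
        return sorted(n for n in archive.namelist() if is_shared_library(n))
```

Calling bundled_shared_libraries() on a downloaded .whl file shows how
many native libraries had to be shipped inside the package; everything
those libraries need at runtime must also be in that list or statically
linked into it.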

What's happened over the last 3 years is that we have developed a
pretty large set of scripts solely dedicated to producing packages
that comply with the spec.

# WHAT CHANGED

Prior to approximately April, Uwe Korn was the main maintainer of the
wheel builds -- he was maintaining them as part of his day job at Blue
Yonder. Around that time, two things happened:

* Uwe transitioned to a new role where he no longer carries the
operational burden of maintaining wheels
* Our wheels became much more complex due to Flight (requiring gRPC,
OpenSSL, and other dependencies) and Gandiva (requiring LLVM and more)

Note we already had been suffering due to design flaws with the wheel
spec making it impossible to safely use pyarrow together with
TensorFlow and PyTorch.

# WHY NOW

I started some recent Twitter discussions around the difficulties we
had maintaining the packages, some of which are causing us to have to
make a patch 0.14.1 release, taking up yet more maintainer time and
keeping us from moving on to new feature work. This instigated more
discussion on

https://github.com/pypa/packaging-problems/issues/25#issuecomment-511460738

It seems clear to me that the self-appointed Python Packaging
"Authority" is not acting in our best interests, and seems to have
adopted the position that it's acceptable to have a language-specific
binary packaging system that works well for 95% of use cases but
causes unbounded punishment for a small percentage of packages. Note
that pyarrow is one of the more complex packages in the Python package
index -- it's about as complex to build as TensorFlow or PyTorch or
other packages with deep C++ build toolchains.

There's a lot of Python community politics going on here so if you
aren't an insider it might be confusing to look at the discussion.
Suffice to say we've been quarreling with each other over packaging
issues for the last decade, complicated greatly by the emergence of a
startup, Anaconda (fka Continuum Analytics), which produced a
packaging tool "conda" which addressed many of the exact problems that
we're having (for the record, maintaining conda packages for Apache
Arrow causes us little stress at all).

I feel that continuing to participate in wheel maintenance as it
stands now is unethical if there is no concrete plan in place (and
developer resources dedicated to seeing it through) to fix the wheel
specification and packaging tools to alleviate the kind of suffering
we are experiencing. There are various ideas about what to do, but it
seems the viable route is to make pip and wheels work more like
"conda", a general purpose cross-platform packaging tool that's been
widely adopted among the scientific / data Python community.

# WHAT NOW

Apache Arrow is a community of volunteers. I hope that other
volunteers will step up to take ownership of the Python wheels. In the
meantime, I am going to stop spending any time on them and encourage
my colleagues to do the same. I know that Krisztian is working on the
0.14.1 patch release, so after that's done I think that we'll most
likely completely disengage from the wheels.

If no volunteers take initiative to maintain the package builds, as
soon as any problem breaks builds or release scripts, I will be
advocating to disable and ignore the builds until they can be
maintained in the future. I do not think it is reasonable for us to
continue to be burdened by these issues.

Thanks,
Wes
