I definitely think this is desirable.

There's probably going to be a bit of work getting it to pass on all CI (including the various nightly builds).

Regards

Antoine.


On 16/08/2021 at 17:08, Alessandro Molina wrote:
PyArrow is currently a fully Cython codebase, but in reality it relies on
some classes and functions that are implemented in C++ within the
src/arrow/python directory
( https://github.com/apache/arrow/tree/master/cpp/src/arrow/python ).
This is especially true for the numpy/pandas conversion code, which has to
interface with NumPy array data at a low level.

When working on PyArrow, it's not uncommon to end up jumping back and forth
between the Arrow C++ codebase for Python and PyArrow. You can also end up
with integration issues, sometimes hard to catch, if you forget to
recompile libarrow, even when you are working on a Python-only change.

I'm wondering if it wouldn't make life easier for contributors if the
src/arrow/python directory was moved into pyarrow and we made PyArrow able
to build it.

That would probably reduce the risk of integration issues, as rebuilding
pyarrow alone would be enough for most Python-specific changes (it would
also rebuild the Python-specific C++).

I think that moving src/arrow/python into pyarrow would also make the
codebase more cohesive, which would lower the barrier for new contributors
looking for how to fix a pyarrow-specific issue.

Unless there is a major side effect that I'm missing (apart from needing
somewhat more complex build scripts for pyarrow, but it's already CMake
based, so building some C++ shouldn't be a big deal), it seems that the
benefits of having all Python-related code in a single place would
outweigh the drawbacks.

Also, I'm not sure how widespread the requirement for Python from C++ is,
but it seems to me that if we moved all Python-specific code into pyarrow,
we could decouple libarrow from Python. That might make it easier to deal
with virtualenvs or debug builds of Python, as you wouldn't have to deal
with Python3_EXECUTABLE etc. when building libarrow.

Any thoughts?
