I agree with this proposal: the Arrow C++ library does not need to depend on Python or PyArrow code. AFAIU, this will eliminate the -DARROW_PYTHON build flag for Arrow C++, since Python-related code will be compiled as part of PyArrow builds. Besides the uses of the "ARROW_PYTHON" CMake option in CMakeLists.txt, the "dbi/hiveserver2" build also makes use of "ARROW_PYTHON_SHARED_LINK_LIBS" [1].
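
In case it helps with auditing what else would need updating, here is a rough sketch that lists every identifier starting with ARROW_PYTHON in the CMake files under a directory. The helper name find_arrow_python_refs is hypothetical and is not part of Arrow, and which directory to scan is left to the caller; I have not run it against the current tree.

```python
import re
from pathlib import Path

# Matches ARROW_PYTHON itself and any identifier that starts with it,
# e.g. ARROW_PYTHON_SHARED_LINK_LIBS.
PATTERN = re.compile(r"\bARROW_PYTHON\w*")


def find_arrow_python_refs(root):
    """Return sorted (path, line number, identifier) tuples for CMake files under root.

    Hypothetical helper for illustration only; not part of the Arrow codebase.
    """
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Only look at CMakeLists.txt and *.cmake modules.
        if path.name != "CMakeLists.txt" and path.suffix != ".cmake":
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in PATTERN.finditer(line):
                hits.append((str(path), lineno, match.group(0)))
    return sorted(hits)
```

Calling find_arrow_python_refs on a local checkout's C++ source directory and printing each tuple should surface the hiveserver2 reference mentioned above along with any others.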

[1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/dbi/hiveserver2/CMakeLists.txt#L90

~Eduardo

On Mon, Aug 16, 2021 at 11:24 AM Antoine Pitrou <anto...@python.org> wrote:
>
> I definitely think this is desirable.
>
> There's probably going to be a bit of work getting it to pass on all CI
> (including the various nightly builds).
>
> Regards
>
> Antoine.
>
>
> On 16/08/2021 at 17:08, Alessandro Molina wrote:
> > PyArrow is currently full Cython codebase, but in reality it relies on
> > some classes and functions that are implemented in C++ within the
> > src/python directory (
> > https://github.com/apache/arrow/tree/master/cpp/src/arrow/python
> > ). Especially for numpy/pandas conversion code that has to interface
> > with Numpy arrays data at low level.
> >
> > When working in the area of PyArrow it's not uncommon that you end up
> > jumping back and forth between the Arrow C++ codebase for Python and
> > PyArrow and you can also end up with, sometimes hard to catch,
> > integration issues if you forgot to recompile libarrow even if you are
> > working on a Python only change.
> >
> > I'm wondering if it wouldn't make life easier for contributors if the
> > src/arrow/python directory was moved into pyarrow and we made PyArrow
> > able to build it.
> >
> > That would probably reduce risk of integration issues as rebuilding
> > pyarrow alone would probably be enough for most python specific changes
> > (as it would also rebuild the Python specific C++).
> >
> > I think that moving src/arrow/python into pyarrow would also make the
> > codebase more cohesive which would lower the barrier for new
> > contributors looking for how to fix a pyarrow specific issue.
> >
> > Unless there is any major side effect (outside of having to build a bit
> > more complex build scripts for pyarrow, but it's already CMake based, so
> > building some C++ shouldn't be a big deal) that I'm missing, it seems
> > that the benefits of having all Python related code into a single place
> > would surpass the side effects.
> >
> > Also I'm not sure how widespread it is the requirement of Python from
> > C++, but it seems to me that if we moved all Python specific code into
> > pyarrow we could make libarrow decoupled from Python. Which might make
> > it easier to deal with Virtualenvs or debug versions of python as you
> > wouldn't have to deal with Python3_EXECUTABLE etc when building
> > libarrow.
> >
> > Any thoughts?
> >
>