I agree with this proposal: the Arrow C++ library does not need to depend
on Python or PyArrow code.
AFAIU this will eliminate the use of the -DARROW_PYTHON build flag for
Arrow C++, given that Python-related code will be compiled with PyArrow
builds. Besides the use of the "ARROW_PYTHON" option in CMakeLists.txt, the
"dbi/hiveserver2" build makes use of "ARROW_PYTHON_SHARED_LINK_LIBS" [1].

[1]
https://github.com/apache/arrow/blob/master/cpp/src/arrow/dbi/hiveserver2/CMakeLists.txt#L90

~Eduardo

On Mon, Aug 16, 2021 at 11:24 AM Antoine Pitrou <anto...@python.org> wrote:

>
> I definitely think this is desirable.
>
> There's probably going to be a bit of work getting it to pass on all CI
> (including the various nightly builds).
>
> Regards
>
> Antoine.
>
>
> Le 16/08/2021 à 17:08, Alessandro Molina a écrit :
> > PyArrow is currently a full Cython codebase, but in reality it relies on
> > some classes and functions that are implemented in C++ within the
> > src/arrow/python directory (
> > https://github.com/apache/arrow/tree/master/cpp/src/arrow/python
> > ), especially the NumPy/pandas conversion code, which has to interface
> > with NumPy array data at a low level.
> >
> > When working on PyArrow, it's not uncommon to end up jumping back and
> > forth between the Arrow C++ codebase for Python and PyArrow. You can
> > also end up with integration issues that are sometimes hard to catch
> > if you forget to recompile libarrow, even when you are working on a
> > Python-only change.
> >
> > I'm wondering if it wouldn't make life easier for contributors if the
> > src/arrow/python directory was moved into pyarrow and we made PyArrow
> > able to build it.
> >
> > That would probably reduce the risk of integration issues, as rebuilding
> > pyarrow alone would probably be enough for most Python-specific changes
> > (as it would also rebuild the Python-specific C++).
> >
> > I think that moving src/arrow/python into pyarrow would also make the
> > codebase more cohesive, which would lower the barrier for new
> > contributors looking for how to fix a pyarrow-specific issue.
> >
> > Unless there is a major side effect that I'm missing (other than having
> > to write somewhat more complex build scripts for pyarrow, but it's
> > already CMake based, so building some C++ shouldn't be a big deal), it
> > seems that the benefits of having all Python-related code in a single
> > place would outweigh the side effects.
> >
> > Also, I'm not sure how widespread the requirement of Python from C++ is,
> > but it seems to me that if we moved all Python-specific code into
> > pyarrow we could decouple libarrow from Python. That might make it
> > easier to deal with virtualenvs or debug versions of Python, as you
> > wouldn't have to deal with Python3_EXECUTABLE etc. when building
> > libarrow.
> >
> > Any thoughts?
> >
>
