This seems reasonable as long as it is actually feasible (the dependencies
are cleanly separable).

A while ago I had a proof-of-concept Bazel build working that was able to
automatically build the changes together.

On Monday, August 16, 2021, David Li <lidav...@apache.org> wrote:

> I support this. In the past I had to effectively do this manually to build
> Arrow/PyArrow in a monorepo (to build for multiple Python versions
> simultaneously without having conflicting copies of Arrow for each Python
> version). From what I remember, there's some usage of Arrow-internal
> headers that need to be replaced, but fortunately they were all very simple
> to replace.
>
> Though in my personal experience, it wasn't often that I needed to touch
> src/arrow/python.
>
> -David
>
> On Mon, Aug 16, 2021, at 11:08, Alessandro Molina wrote:
> > PyArrow is currently a full Cython codebase, but in reality it relies on
> > some classes and functions that are implemented in C++ within the
> > src/arrow/python directory
> > ( https://github.com/apache/arrow/tree/master/cpp/src/arrow/python ),
> > especially for the numpy/pandas conversion code that has to interface
> > with NumPy array data at a low level.
> >
> > When working on PyArrow, it's not uncommon to end up jumping back and
> > forth between the Arrow C++ codebase for Python and PyArrow. You can also
> > end up with integration issues, sometimes hard to catch, if you forget to
> > recompile libarrow, even when you are working on a Python-only change.
> >
> > I'm wondering if it wouldn't make life easier for contributors if the
> > src/arrow/python directory was moved into pyarrow and we made PyArrow
> > able to build it.
> >
> > That would probably reduce the risk of integration issues, as rebuilding
> > pyarrow alone would probably be enough for most Python-specific changes
> > (it would also rebuild the Python-specific C++).
> >
> > I think that moving src/arrow/python into pyarrow would also make the
> > codebase more cohesive, which would lower the barrier for new
> > contributors looking for how to fix a pyarrow-specific issue.
> >
> > Unless there is a major side effect that I'm missing (other than having
> > to write slightly more complex build scripts for pyarrow, but it's
> > already CMake based, so building some C++ shouldn't be a big deal), it
> > seems that the benefits of having all Python-related code in a single
> > place would outweigh the side effects.
> >
> > Also, I'm not sure how widespread the requirement of Python from C++ is,
> > but it seems to me that if we moved all Python-specific code into pyarrow
> > we could decouple libarrow from Python. That might make it easier to deal
> > with virtualenvs or debug versions of Python, as you wouldn't have to
> > deal with Python3_EXECUTABLE etc. when building libarrow.
> >
> > Any thoughts?
> >
>
