On Tue, 17 Aug 2021 at 16:20, Alessandro Molina
<alessan...@ursacomputing.com> wrote:
> ...
> By the way, there are some interesting points, like the fact that the mask
> for a pyarrow array can only be a numpy array. How could I create a masked
> array without numpy? I guess that accepting arrow arrays as masks is
> actually something we should allow anyway.

A somewhat related JIRA about how to create a new array with a given
validity bitmap: https://issues.apache.org/jira/browse/ARROW-7071
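
To make the question concrete, here is a rough sketch of building a
validity bitmap by hand, without numpy. The helper name
pack_validity_bitmap is made up for illustration, and the pyarrow part is
guarded so the snippet still runs when pyarrow is not installed. It
assumes a little-endian machine, a primitive int64 array, and Arrow's
bitmap layout (least-significant bit first, 1 meaning valid); it is not
how pyarrow handles masks internally.

```python
from array import array


def pack_validity_bitmap(valid):
    """Pack booleans into Arrow's validity bitmap layout.

    Bit i lives in byte i // 8 at position i % 8 (least-significant bit
    first); a set bit means the slot is valid (non-null).
    pack_validity_bitmap is a hypothetical helper, not a pyarrow function.
    """
    bitmap = bytearray((len(valid) + 7) // 8)
    for i, is_valid in enumerate(valid):
        if is_valid:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


values = [1, 2, 3, 4]
valid = [True, False, True, True]
bitmap = pack_validity_bitmap(valid)

try:
    import pyarrow as pa
except ImportError:
    pa = None  # the bitmap packing above does not need pyarrow

if pa is not None:
    # Raw int64 data buffer built with the stdlib only (native byte order,
    # assumed little-endian here).
    data = array("q", values).tobytes()
    arr = pa.Array.from_buffers(
        pa.int64(),
        len(values),
        [pa.py_buffer(bitmap), pa.py_buffer(data)],
    )
    print(arr.to_pylist())
```

Going through from_buffers is fairly low level, which is part of why a
friendlier way to pass a mask or bitmap seems worth having.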

>
> On Mon, Aug 16, 2021 at 6:53 PM Antoine Pitrou <anto...@python.org> wrote:
>
> >
> > I agree that "what happens when Numpy is not available at runtime" is a
> > rather annoying problem.  I'm not sure what happens when you call one
> > of the Numpy C API functions and Numpy is not found (crash? error
> > return?).  It can probably be detected, but needs to be done
> > consistently at the start of each PyArrow core function, which requires
> > some care.
> >
> > At the end of the day, it looks like this would be a significant amount
> > of work for a relatively minor benefit (did people complain about
> > this?), so I'm not sure it's worth spending some time on it.

Personally, I share Antoine's sentiment: yes, in an ideal world
we wouldn't have a hard dependency on numpy. But as long as there is
no clear demand or use case for it, I am not sure it's worth the
effort.

Joris

> >
> > Regards
> >
> > Antoine.
> >
> >
> >
> > On Mon, 16 Aug 2021 18:09:54 +0200
> > Wes McKinney <wesmck...@gmail.com> wrote:
> > > I've thought about this in the past, and I would like to make NumPy an
> > > optional dependency, but one of the things that kept me from trying
> > > was the extent to which NumPy arrays are supported as inputs (or
> > > elements of inputs) to pyarrow.array. The implementation in
> > > python_to_arrow.cc is significantly intertwined with NumPy's C API. It
> > > might require maintaining two altogether different internal
> > > implementations of pyarrow.array, a complicated one which deals with
> > > all the NumPy oddities (including NumPy array scalars) and a much
> > > simpler one that does not. pyarrow may have to detect at runtime
> > > whether numpy is in sys.modules to decide whether to import and invoke
> > > the more complicated function.
> > >
> > > On Mon, Aug 16, 2021 at 5:59 PM Alessandro Molina
> > > <alessan...@ursacomputing.com> wrote:
> > > >
> > > > > As Arrow/PyArrow grows more compute functions and features, we
> > > > > might move toward a world where the number of users relying on
> > > > > PyArrow without going through Pandas or NumPy might grow.
> > > > >
> > > > > NumPy is a compile-time dependency for PyArrow, as it's required
> > > > > to compile the C++ code needed to implement the pandas/numpy
> > > > > integration, but there has been some discussion regarding the
> > > > > fact that we could make NumPy optional at runtime (remove it from
> > > > > the required dependencies in the Python distribution). You would
> > > > > have to install numpy only if you need to invoke the to_numpy or
> > > > > to_pandas methods or similar integration features. For all the
> > > > > other use cases that rely on Arrow alone, you would be able to
> > > > > pip install pyarrow without involving any other dependency and be
> > > > > ready to go.
> > > > >
> > > > > Technically it seems a bit complicated: Python/Cython can always
> > > > > work around missing libraries, but we would have to find ways to
> > > > > deal with lazy involvement of numpy from C++. I don't know if
> > > > > this is something that was already discussed in the past and thus
> > > > > someone already has solutions for this part of the problem, but
> > > > > before investing time and effort in research I think it makes
> > > > > sense to make sure it's a goal that the development team agrees
> > > > > with.
> > >
> >
> >
> >
> >
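
For what it's worth, the runtime check Wes describes (looking for numpy in
sys.modules before choosing a conversion path) could look roughly like the
sketch below on the Python side. All function names here are hypothetical
and the two "paths" are toy stand-ins; the real logic lives in C++ in
python_to_arrow.cc and is much more involved.

```python
import sys


def numpy_already_imported():
    # If the user never imported numpy, no numpy object can be among the
    # inputs, so the simple path is safe.
    return "numpy" in sys.modules


def convert_plain(values):
    # Hypothetical simple path: only plain Python objects to handle.
    return list(values)


def convert_numpy_aware(values):
    # Hypothetical complicated path: unwrap NumPy array scalars into
    # plain Python objects. numpy is taken from sys.modules, so this
    # module never imports it on its own.
    np = sys.modules["numpy"]
    return [v.item() if isinstance(v, np.generic) else v for v in values]


def to_python_values(values):
    # Dispatch at call time, since numpy may be imported later on.
    if numpy_already_imported():
        return convert_numpy_aware(values)
    return convert_plain(values)


print(to_python_values([1, 2, None]))
```

The check has to happen on every call rather than once at import time,
because the user can import numpy at any point after pyarrow.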
