Hi Antoine,

This is exciting work. I am generally in favor of putting this inside PyArrow,
for ease of use and for the ABI reasons you give above. Can you explain a bit
more what the downsides of putting it in PyArrow vs a separate package would be?

Li

On Thu, Mar 26, 2026 at 11:08 AM Antoine Pitrou <[email protected]> wrote:

>
> Hello,
>
> Numba (https://numba.pydata.org/) is a Just-in-Time compiler for Python
> that can speed up scientific calculations written in Python. Out of the
> box, Numba supports NumPy arrays (which were the primary target of its
> design).
>
> We (at QuantStack) have been investigating the feasibility of supporting
> a subset of PyArrow in Numba, so that the fast computation abilities of
> Numba can extend to data in the Arrow format.
>
> We have come to the conclusion that supporting a small subset of PyArrow
> is definitely doable, at a competitive performance level (between "as
> fast as C++" and "4x slower" on a couple of preliminary micro-benchmarks).
>
> (by "small subset" we mostly mean: primitive data types, reading and
> building arrays)
>
> The Numba integration layer would ideally have to be maintained and
> distributed within PyArrow, because of the need to access a number of
> Arrow C++ APIs, which don't have a stable ABI (it *might* be possible to
> work around this by exporting a dedicated C-like ABI from PyArrow, though).
>
> What we would like to know is how the community feels about putting this
> code inside PyArrow, rather than a separate package, for the reason
> given above.
>
> This would *not* add a dependency on Numba, since this can be exposed as
> a dynamically-loaded extension point:
> https://numba.readthedocs.io/en/stable/extending/entrypoints.html
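>
> For reference, such an extension point would look roughly like the sketch
> below. The package name `pyarrow_numba` and the function `_init_extension`
> are hypothetical placeholders; the only assumption is the setuptools
> `numba_extensions` entry-point group described in the linked docs:

```python
# In the packaging metadata (e.g. pyproject.toml) of the hypothetical
# integration package:
#
#   [project.entry-points.numba_extensions]
#   init = "pyarrow_numba:_init_extension"
#
# Numba calls every "init" function registered in the "numba_extensions"
# group the first time it is used, so PyArrow itself never needs to
# import (or depend on) Numba.

def _init_extension():
    # This body only runs when Numba is installed and actually used.
    from numba.extending import typeof_impl

    import pyarrow as pa

    @typeof_impl.register(pa.Array)
    def _typeof_arrow_array(val, c):
        # Map the Arrow array to a (hypothetical) Numba type here.
        ...
```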
>
> (note: this preliminary investigation was supported by one of our fine
> customers)
>
> Regards
>
> Antoine.
>
>
