We've discussed this in the past, I think. In addition to having many
optional components enabled, the pyarrow wheel also includes the unit
tests directory, which keeps growing in size. I think if we made a
pyarrow-slim wheel with support only for core Arrow (IPC, etc.) and
Parquet file reading, it might be possible to trim the wheel size by a
significant percentage.

Rusty -- if you would like to push this forward, I would suggest
creating an alternative to the wheel build script that we use,
modifying flags and adding other customizations (e.g. trimming unit
tests) to produce a wheel that we could build and possibly upload as
"pyarrow-slim" on PyPI.
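
Before changing any build flags, it may help to measure where the bytes
in an existing wheel actually go. The following is only a rough sketch
using the Python standard library: the function name size_by_component
and the wheel filename in the usage comment are made up for
illustration and are not part of any Arrow tooling.

```python
import zipfile
from collections import Counter


def size_by_component(wheel_path, depth=2):
    """Sum compressed bytes per path prefix inside a wheel (a zip archive).

    `depth` is the number of leading path segments used to group entries,
    so depth=2 groups "pyarrow/tests/test_x.py" under "pyarrow/tests".
    """
    totals = Counter()
    with zipfile.ZipFile(wheel_path) as wheel:
        for info in wheel.infolist():
            if info.is_dir():
                continue
            prefix = "/".join(info.filename.split("/")[:depth])
            totals[prefix] += info.compress_size
    return totals


# Hypothetical usage (the filename is a placeholder):
#
#   totals = size_by_component("pyarrow-X.Y.Z-cp310-manylinux.whl")
#   for prefix, nbytes in totals.most_common(15):
#       print(f"{nbytes / 1e6:8.2f} MB  {prefix}")
```

Sorting by compressed size should show how much the tests directory
contributes compared to the bundled shared libraries, which would tell
us whether trimming tests alone is worth a separate wheel.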

On Mon, Oct 3, 2022 at 8:55 AM Antoine Pitrou <anto...@python.org> wrote:
>
>
> Hi Rusty,
>
> On 02/10/2022 at 22:51, Rusty Conover wrote:
> > Hi Arrow Team,
> >
> > I'm using Apache Arrow with AWS Lambda Functions.
> >
> > The primary motivation is AWS Athena's user-defined functions[1].  Those
> > functions process and return Arrow IPC segments.
> >
> > * The published Python wheels for Apache Arrow include almost every feature
> > of Arrow (Gandiva, Plasma, Flight).
>
> Gandiva isn't compiled in the Python wheels. Plasma is reasonably small
> (but is also being deprecated soon). Flight is more sizable. However,
> most of the size seems to be in Arrow itself and Parquet. A large part
> of the size is probably attributable to the Arrow compute engine and
> functions, and also perhaps to filesystem implementations such as S3 and
> GCS (due to the large third-party dependencies that they bundle).
>
> > Would it be possible to create a new Python package (i.e., "pyarrow-slim")
> > that would disable some of the functionality but result in smaller python
> > wheels?
>
> Perhaps. The first step would be to allow disabling more components in
> PyArrow, though. Otherwise I'm afraid the size reduction wouldn't be
> terrific.
>
> Regards
>
> Antoine.
