Given that we didn't get many opinions on this one, I propose we move forward with merging the open PR that moves the ipc Cython implementation, and see whether any issues get opened by projects that were relying on it. ipc seems to be a low-risk module from that point of view, and moving it will at least reduce the surface of `pyarrow.lib`, making it easier to reason about what should be public or internal in the future.
If we get users complaining that they were using ipc from Cython, we can think about how to expose it properly instead of exposing it by chance as a side effect of using includes in Cython.

On Fri, Aug 20, 2021 at 12:24 PM Alessandro Molina <
alessan...@ursacomputing.com> wrote:

> While working on https://github.com/apache/arrow/pull/10162 it was raised
> the concern that it's hard to change Cython code because it might break
> third party libraries and projects relying on pyarrow through Cython.
>
> Mostly the problem comes from the fact that the documentation suggests
> pyarrow.lib.* (
> https://arrow.apache.org/docs/python/extending.html#example ) as what
> should be used to import features from pyarrow in Cython.
> Given most of pyarrow is implemented including pxi files into the lib.pyx
> module (
> https://github.com/apache/arrow/blob/master/python/pyarrow/lib.pyx#L118-L163
> ) it means that we are exposing the majority of the internals as our public
> api.
>
> The consequence is that we in practice are preventing ourselves from
> touching anything that exists in those included files as they might have
> been used by another project and thus they can't be moved or change their
> signature.
>
> We could argue that only what was documented explicitly should be
> considered "public" and everything else can be changed, but our
> documentation seems to be unclear on this point. It lists some functions
> that should be considered our explicit api (
> https://arrow.apache.org/docs/python/extending.html#cython-api ) but then
> uses CArray in the example (
> https://arrow.apache.org/docs/python/extending.html#example ) which
> wasn't listed as public.
>
> I think it would be helpful to come to an agreement about what we should
> consider publicly exposed from Cython so that we can properly update
> documentation and unblock possible refactoring.
>
> Personally, even at risk of breaking third parties code, I think it would
> be wise to aim for the minimum exposed surface. I'd consider Cython mostly
> an implementation detail and promote usage of libarrow from C/C++ directly
> if you need to work on high performance Python extensions.
>