Hi all, I'd like to discuss a packaging change for Arrow.

AFAIU, there are two broad categories of frameworks that use Arrow.

1. Projects that only use Arrow core (ex: cudf, ray): they follow the Arrow format, but internally they use their own Arrow implementation. So they mostly need to read/write through the Arrow core public APIs to convert to/from their internal implementation.

2. Projects that use Arrow intimately (ex: cylon): they use Arrow subcomponents directly (ex: compute, flight, etc). These may also depend on or support type 1 projects (ex: GCylon with cudf).

Now, as a member of the latter category, a major challenge we face is managing dependencies. We currently depend on Arrow v5 and cudf 21.10, but cannot upgrade to Arrow v6 because cudf has yet to upgrade its Arrow dependency. But when we look at the version upgrade PR [1], there are hardly any API changes.

So, I would like to see if it is possible to separate the Arrow format and core from the other subcomponents such as flight, compute, datasets, etc, so that outside projects can depend on these components independently.

AFAIU the Arrow format and core API are more or less stable, while subcomponents like flight, compute, datasets, etc have major API changes. So projects like cudf would not have to upgrade their arrow-core dependency as often, while the others can enjoy new features of the subcomponents.

Ultimately we'd see dependencies as follows.

libarrow_core.so.100 (independent)
libarrow.so.600 (other subcomponents) <- libarrow_core.so.100
libcudf.so.2110 <- libarrow_core.so.100
libcylon.so.050 <- libarrow_core.so.100, libarrow.so.600, libcudf.so.2110

I understand this may require a lot of changes in the release process. But I just wanted to float the idea to the community and see if this is doable.

Best

[1] https://github.com/rapidsai/cudf/pull/9686

--
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>