Hi all,

I'd like to discuss a packaging change for Arrow.

AFAIU, there are two broad categories of frameworks that use Arrow.
1. Projects that only use Arrow core (ex: cudf, ray) - they follow the
Arrow format, but internally they use their own Arrow implementation. So
they mostly need the Arrow core public APIs to read/write and convert
to/from their internal implementation.
2. Projects that use Arrow intimately (ex: cylon) - they depend heavily on
Arrow sub-components (ex: compute, flight, etc). These may also depend on
or support Type 1 projects (ex: GCylon with cudf).

Now, as members of the latter category, we find that a major challenge is
managing dependencies. We currently depend on Arrow v5 and cudf 21.10, but
we cannot upgrade to v6 because cudf has yet to upgrade its Arrow
dependency. Yet when we look at the version upgrade PR [1], there are
hardly any API changes.

So, I would like to see whether it is possible to separate the Arrow format
and core from the other sub-components such as flight, compute, datasets,
etc., so that outside projects can depend on these components
independently.
AFAIU, the Arrow format and core API are more or less stable, while
sub-components like flight, compute, datasets, etc. undergo major API
changes. So projects like cudf would not have to upgrade their arrow-core
dependency as often, while the others could enjoy the new features of the
sub-components.
Ultimately we'd see dependencies as follows.

libarrow_core.so.100 (independent)
libarrow.so.600 (other subcomponents) <- libarrow_core.so.100

libcudf.so.2110 <- libarrow_core.so.100

libcylon.so.050 <- libarrow_core.so.100, libarrow.so.600, libcudf.so.2110

I understand this may require a lot of changes in the release process. But
I just wanted to float the idea to the community and see if this is doable.

Best
[1] https://github.com/rapidsai/cudf/pull/9686

-- 
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>
