Andy Grove <andygrov...@gmail.com> writes:

> Looking at this purely from the DataFusion/Ballista point of view, what I
> would be interested in would be having a branch of DF that uses arrow2 and
> once that branch has all tests passing and can run queries with performance
> that is at least as good as the original arrow crate, then cut over.
>
> However, for developers using the arrow APIs directly, I don't see an easy
> path. We either try and gradually PR the changes in (which seems really
> hard given that there are significant changes to APIs and internal data
> structures) or we port some portion of the existing tests over to arrow2
> and then make that the official crate once all test pass.

How feasible would it be to make a legacy module in arrow2 that would enable (some large subset of) existing arrow users to try arrow2 after adjusting their use statements? (That is, implement the public-facing legacy interfaces in terms of arrow2's new, safe interface.) This would make it easier to test with DataFusion/Ballista and external users of the current arrow crate, then cut over and let those packages update incrementally from legacy to modern arrow2.

I think it would be okay to tolerate some performance degradation when working through these legacy interfaces, so long as there was confidence that modernizing the callers would recover the performance (as tests have been showing).
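
To make the idea concrete, here is a rough sketch of the shape I have in mind. It deliberately does not use the real arrow or arrow2 types: `modern`, `legacy`, `from_vec`, `get`, and `into_modern` are all hypothetical names invented for illustration, and the real legacy surface is of course far larger. The point is only that the old-style type is a thin wrapper over the new one, so a caller changes a `use` line and nothing else.

```rust
// Hypothetical stand-in for a "modern" array type. NOT the real arrow2 API.
mod modern {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Int32Array {
        values: Vec<i32>,
    }

    impl Int32Array {
        pub fn from_vec(values: Vec<i32>) -> Self {
            Self { values }
        }

        pub fn len(&self) -> usize {
            self.values.len()
        }

        // Checked access: returns None when out of bounds.
        pub fn get(&self, i: usize) -> Option<i32> {
            self.values.get(i).copied()
        }
    }
}

// Hypothetical legacy facade: keeps an old-style interface, but every
// method is implemented purely in terms of the modern type above.
mod legacy {
    use super::modern;

    pub struct Int32Array {
        inner: modern::Int32Array,
    }

    impl From<Vec<i32>> for Int32Array {
        fn from(values: Vec<i32>) -> Self {
            Self { inner: modern::Int32Array::from_vec(values) }
        }
    }

    impl Int32Array {
        pub fn len(&self) -> usize {
            self.inner.len()
        }

        // Old-style accessor: panics on an out-of-bounds index. The extra
        // check is where some performance may be lost.
        pub fn value(&self, i: usize) -> i32 {
            self.inner.get(i).expect("index out of bounds")
        }

        // Escape hatch so callers can migrate to the modern type piecemeal.
        pub fn into_modern(self) -> modern::Int32Array {
            self.inner
        }
    }
}

// Existing caller code: only this `use` line would need to change.
use legacy::Int32Array;

fn sum(arr: &Int32Array) -> i32 {
    (0..arr.len()).map(|i| arr.value(i)).sum()
}
```

A caller like `sum` keeps compiling against the facade, and `into_modern` gives each package a way to move one call site at a time to the new interface, which is where the performance would be recovered.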