Hi!

I've been using arrow/arrow-rs for a while now; my use case is parsing
Arrow streaming files and converting them into CSV.

Rust has been an absolutely fantastic tool for this; the performance is
outstanding and I have had no issues using it for my use case.

I would be happy to test out the branch and let you know what the
performance is like. I was planning to improve my current CSV writer
implementation anyway, since it takes a while for bigger datasets
(multi-GB).
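
For reference, a minimal sketch of that kind of pipeline, assuming the
arrow crate's ipc::reader::StreamReader and csv::Writer (the paths are
placeholders, and the exact signatures vary a bit between arrow
versions):

    use std::fs::File;

    use arrow::csv;
    use arrow::error::Result;
    use arrow::ipc::reader::StreamReader;

    fn stream_to_csv(input: &str, output: &str) -> Result<()> {
        // Read record batches from an Arrow IPC stream file.
        let reader = StreamReader::try_new(File::open(input)?)?;
        // Write each batch out as CSV as soon as it is decoded, so the
        // whole dataset never has to sit in memory at once.
        let mut writer = csv::Writer::new(File::create(output)?);
        for batch in reader {
            writer.write(&batch?)?;
        }
        Ok(())
    }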

Josh


On Thu, 27 May 2021 at 22:49, Jed Brown <j...@jedbrown.org> wrote:

> Andy Grove <andygrov...@gmail.com> writes:
> >
> > Looking at this purely from the DataFusion/Ballista point of view,
> > what I would be interested in would be having a branch of DF that
> > uses arrow2 and, once that branch has all tests passing and can run
> > queries with performance that is at least as good as the original
> > arrow crate, then cut over.
> >
> > However, for developers using the arrow APIs directly, I don't see an
> > easy path. We either try and gradually PR the changes in (which seems
> > really hard given that there are significant changes to APIs and
> > internal data structures) or we port some portion of the existing
> > tests over to arrow2 and then make that the official crate once all
> > tests pass.
>
> How feasible would it be to make a legacy module in arrow2 that would
> enable (some large subset of) existing arrow users to try arrow2 after
> adjusting their use statements? (That is, implement the public-facing
> legacy interfaces in terms of arrow2's new, safe interface.) This would
> make it easier to test with DataFusion/Ballista and external users of the
> current arrow crate, then cut over and let those packages update
> incrementally from legacy to modern arrow2.
>
> I think it would be okay to tolerate some performance degradation when
> working through these legacy interfaces, so long as there was confidence
> that modernizing the callers would recover the performance (as tests have
> been showing).
>
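
To make the legacy-module idea concrete, the shim could look roughly
like the sketch below; the type and function names are simplified
stand-ins for illustration, not the actual arrow or arrow2 APIs:

    // Stand-in for an arrow2-style immutable, safe array type.
    pub struct Int32Array {
        values: Vec<i32>,
    }

    impl Int32Array {
        pub fn from_vec(values: Vec<i32>) -> Self {
            Self { values }
        }

        pub fn values(&self) -> &[i32] {
            &self.values
        }
    }

    // A `legacy` module exposing the old builder-style interface,
    // implemented purely in terms of the new safe type, so existing
    // users would only need to adjust their `use` statements.
    pub mod legacy {
        use super::Int32Array;

        pub struct Int32Builder {
            values: Vec<i32>,
        }

        impl Int32Builder {
            pub fn new(capacity: usize) -> Self {
                Self { values: Vec::with_capacity(capacity) }
            }

            pub fn append_value(&mut self, v: i32) {
                self.values.push(v);
            }

            pub fn finish(self) -> Int32Array {
                Int32Array::from_vec(self.values)
            }
        }
    }

The idea being that old builder-style calls would compile unchanged
apart from the import path, while all of the buffer handling goes
through the new safe code.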
